[wp-hackers] Canonical integration into core

Tue Feb 17 10:12:28 GMT 2009

2009/2/17 Otto <otto at ottodestruct.com>

> On Mon, Feb 16, 2009 at 8:51 PM, Lynne Pope <lynne.pope at gmail.com> wrote:
> > Amazing too that the reaction to opposition to the proposal should be
> > comments that I don't know what I am talking about.
>
> Well, until you actually explain what you're talking about, you seem
> to be spouting gibberish.
>
> I do want to give you the benefit of the doubt, but you really need to
> explain things like this further:
>
> > The inclusion of the canonical URL link in the core will cause problems for
> > a number of users. For example, where I have ported content from other apps
> > I have retained the URL through the HTTP/1.1 standard for content location.
> > I know of a number of other sites who also use this specification - which
> > will conflict with the Google canonical URL implementation.
>
> That needs a better explanation. Really. Because it doesn't make any sense.

Sorry if you don't understand.  But there's a wealth of information already
out there about content-location and how to keep a URI that does not change
regardless of the underlying software platform.

> What, exactly, will having a link rel=canonical do to your site that
> will harm SEO? What will it do to a normal WordPress site? Please
> explain to us what the heck you're talking about, so that we can
> understand what you're getting at.

It will do nothing to my site if it can be removed through an action or
filter ;)
It will also do little or nothing to negatively impact a new site either.
However, older sites need to maintain control over what gets presented as a
canonical URL or they will run into issues with indexing. An automated link
in the header will not give sites the control they need to direct page rank
and link juice the way they want.

For example, one site I developed was using categories as a key part of the
site structure. Posts were accessed by category and were not listed in an
index or home page. A standard blog would normally have category -> post as
duplicates to index -> post but, in this case, all posts were accessed only
through a category listing.

A generic header tag that assumes sites are blogs that present either full
posts or excerpts on a frontpage is going to be unusable in situations like
this. As more sites trend towards using WP as a CMS we will see more sites
deviating from the norm.

> I submit that every WordPress based site, which has not been heavily
> customized, can benefit from this tag. Mainly, the new comments paging
> creates duplicate content pages in the singular post/page sections
> which were not there before.

I submit that this is a problem then with the new comments paging, which
should be fixed if possible. Not every search engine or indexing service
supports the canonical tag and relying on the tag alone to mitigate
duplicate content issues caused by code is probably not the best way to
handle it.

> > The new Google tag is designed for duplicate content. WordPress already
> > provides many ways of ensuring duplicate content is not indexed.
>
> No. The canonical tag is indeed designed for duplicate content, but it
> is not designed to prevent duplicate content from being indexed. It's
> to keep it indexed under a single URL.

Google have, themselves, stated that the duplicate content which directs the
robot to a canonical URL will be removed from the results.

> > The tag is NOT designed to be in the header of every page.
>
> Yes, in point of fact, it is. It is designed exactly for that. Or, at
> least, to be in the header of pages that can have multiple URLs which
> contain the same content. And with 2.7 and paged comments, that's
> every single post page.

Please check your facts. I don't want to find myself under personal attack
for simply restating what Google and Yahoo! have already said. They have
made it clear that the tag is designed to go into the headers of the
duplicate content, to point the search engines to the canonical content.

While Matt Cutts has stated that he believes there should be no problem if
the canonical page has a link pointing to itself, he has also warned that
people should be very careful. Yahoo, on the other hand, is advising that
the link should go only into pages that are duplicates.

Remember that so far they have only done tests and the mass roll-out is only
just happening. Because of the removal of duplicate content from search
results I stand by my assertion that introducing it into the core - as a
default - has the potential to hurt a site's SEO.

I simply do not see how WP code (or any code really) can discern which
content is duplicate and subordinate to other content. (With the exception
of paged comments, where clearly the comment pages are subordinate to the
parent comments page). Search engines do a fairly good job but are not
fool-proof.
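
To make the paged-comments exception concrete, here is a rough sketch of the
one case where code can tell which page is subordinate. It is in Python rather
than WordPress PHP, purely as an illustration. The function names are
hypothetical, and it assumes comment pages look like /some-post/comment-page-2/
or carry a cpage query variable. It emits the link only on the duplicate pages
and nothing on the parent page:

```python
import html
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed URL shapes for paged comments (illustrative, not exhaustive):
#   pretty permalinks: /some-post/comment-page-2/
#   query-string form: /?p=123&cpage=2
_COMMENT_PAGE = re.compile(r"/comment-page-\d+/?$")


def is_comment_page(url):
    """True if the URL looks like a subordinate paged-comments URL."""
    parts = urlsplit(url)
    has_cpage = any(key == "cpage" for key, _ in parse_qsl(parts.query))
    return bool(_COMMENT_PAGE.search(parts.path)) or has_cpage


def canonical_for_comment_page(url):
    """Strip the comment-page marker to get the parent post URL."""
    parts = urlsplit(url)
    path = _COMMENT_PAGE.sub("/", parts.path)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != "cpage"]
    # The fragment is dropped: it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Return a link tag for duplicate pages only, else an empty string."""
    if not is_comment_page(url):
        return ""
    href = html.escape(canonical_for_comment_page(url), quote=True)
    return '<link rel="canonical" href="%s" />' % href
```

Anything beyond this case (category versus index listings, ported URLs) is a
site-level decision that a pattern match like this cannot make.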

However, as I said, I have no problem with it going into the core as long as
it is able to be easily disabled. After all, I don't get any flak if sites
lose a bunch of content from search engine results ;)

Lynne