Posts Tagged ‘noindex’

Canonicals and noindex results

Sunday, November 1st, 2009

The results of the third experiment are in. This was quite a simple one: to see whether Google would respect a canonical link element on a page that had the noindex robots metatag.

No surprises here, happily. You’d expect Google to read the noindexed page, including the canonical link element, make the adjustment accordingly and index the destination page. That’s exactly what it did. Happy days.

Actually, it did it so quickly (compared with some other canonicals that I’ve implemented elsewhere) that I’m left wondering whether Google might actually be more inclined to pay swift attention to the canonical instruction if the page on which it is found is set not to be indexed. Just speculation, of course.

Canonical link element and noindex robots metatag

Tuesday, October 20th, 2009

I’ve explained what I’m doing in this experiment on the experiment page itself. The set-up is as follows:

  • Create two almost identical pages
  • Link to the first one
  • Set the first page to “noindex,follow”
  • Give the first page a canonical link element in the head section, pointing to the second page
  • Set the second page to “index, follow”
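
As a rough illustration of the set-up above, the head sections of the two pages might look something like the output of this Python sketch. The URL is a placeholder of mine, not the real experiment page, and the small parser is only there to check that each head carries the directives described in the list.

```python
from html.parser import HTMLParser

# Hypothetical URL, for illustration only; not the real experiment page.
SECOND_PAGE_URL = "http://www.example.com/canonical-test-2.html"


def build_head(robots, canonical=None):
    """Return a minimal <head> section with a robots metatag and,
    optionally, a canonical link element."""
    lines = ["<head>", f'  <meta name="robots" content="{robots}">']
    if canonical is not None:
        lines.append(f'  <link rel="canonical" href="{canonical}">')
    lines.append("</head>")
    return "\n".join(lines)


class HeadDirectives(HTMLParser):
    """Collect the robots metatag and canonical link element from markup."""

    def __init__(self):
        super().__init__()
        self.robots = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = attrs.get("content")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def read_directives(markup):
    """Return (robots content, canonical href) found in the markup."""
    parser = HeadDirectives()
    parser.feed(markup)
    return parser.robots, parser.canonical


# First page: noindex,follow, with a canonical pointing at the second page.
first_head = build_head("noindex,follow", canonical=SECOND_PAGE_URL)
# Second page: index, follow, with no canonical link element.
second_head = build_head("index, follow")

print(first_head)
print(second_head)
```

The sketch only describes the markup. Whether Google honours the canonical on a noindexed page is what the experiment itself is testing.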

Then, sit back and wait for Googlebot to work its magic – and see whether the second page makes it into the index. Really, provided that Google respects the noindex tag, and there’s no good reason why it should not, there should be no chance of the first page making it into the index. So the sole question is whether the second page will make it into the index or not.

My expectation, and hope, is that it will, despite being unlinked from anywhere else. Further variations on this theme will follow if it does not, and may in any case.

These terms only appear in links pointing to this page

Tuesday, August 19th, 2008

OK, so we have an initial result for our second search experiment. This was a five-page experiment, with a home page that linked to two further pages, one with meta robots set to index, the other set to noindex. Each of these pages linked to a destination page, the links having the same anchor text (which was a unique, or at least unusual, portmanteau word). The anchor text did not appear anywhere else on any pages.
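
To make the structure easier to picture, here is a small Python sketch of the five pages. The page names and the anchor word are made up for illustration, and I am assuming the home page and the destination pages were left at the default (indexable) setting, which the description above does not state explicitly.

```python
# Hypothetical anchor word and page names; the real ones are not given here.
ANCHOR_WORD = "exampleword"

# Each page has a robots setting and a list of (target page, anchor text) links.
pages = {
    "home": {
        "robots": "index",  # assumption: default setting
        "links": [("linking-a", "page A"), ("linking-b", "page B")],
    },
    "linking-a": {"robots": "index", "links": [("destination-a", ANCHOR_WORD)]},
    "linking-b": {"robots": "noindex", "links": [("destination-b", ANCHOR_WORD)]},
    "destination-a": {"robots": "index", "links": []},  # assumption: default
    "destination-b": {"robots": "index", "links": []},  # assumption: default
}


def indexable(site):
    """Names of pages that are not set to noindex."""
    return sorted(name for name, page in site.items() if page["robots"] != "noindex")


def linked_with(site, word):
    """Names of pages that some other page links to using `word` as anchor text."""
    return sorted(
        target
        for page in site.values()
        for target, anchor in page["links"]
        if anchor == word
    )


print(indexable(pages))
print(linked_with(pages, ANCHOR_WORD))
```

This models only the set-up: which pages may be indexed and which receive the anchor text. It says nothing about how Google will rank them.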

The expectation was that both destination pages would be indexed. I also expected both pages to appear in a search for the anchor text word, but I wasn’t absolutely sure about this. Then, if both destination pages did indeed appear for that word, I was interested to see which ranked better.

It took longer than I expected for all the pages to be indexed, but both destination pages made it in there eventually. The linking page that was set to noindex, of course, is not there.

The indexed linking page is the first result from the site for the anchor text term. This page contains the term, and is further up the site hierarchy. The second result from the site is the second destination page (i.e. the one linked from the noindex linking page). Google’s cache of that page contains the familiar phrase: “these terms only appear in links pointing to this page”, followed by the anchor text.

The first destination page does not appear in the results. It may do in future, and if it does I will report on its relative performance. But the page linked from the noindex parent was first to show…

This result demonstrates that pages set to noindex are passing link anchor text. This should not be too much of a surprise. From the initial result, it might appear that a noindex page passes anchor text more efficiently than an indexed page. I think that conclusion would not be correct. However, it might be reasonable to assume that it is passing anchor text at least as well as an indexed page.

Some further questions arise:

  • Does Google consider the non-linked textual content of a non-indexed page when determining the relevance of the links from that page?
  • Indeed, does Google treat “noindex” pages exactly the same as other pages in its index – assessing the content, placing them in the link graph etc – and the only difference is that pages are not returned in SERPs?
  • What difference would there be if the page had been excluded using robots.txt, rather than set to noindex locally?
  • Is it a given that the page containing the anchor text link would rank higher for the phrase than the page linked to, if the page linked to did not itself contain the word? Or in other words, does textual content outrank anchor text?

I don’t think that last one can be true, and I feel another experiment coming on…
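
On the robots.txt question above, one mechanical difference is worth keeping in mind: a crawler that obeys a robots.txt disallow should not fetch the page at all, so it never gets to read a robots metatag or the links on that page, whereas a noindex page has to be fetched for the tag to be seen. A minimal sketch using Python's standard `urllib.robotparser`, with a made-up robots.txt and made-up URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /linking-b.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked_url = "http://www.example.com/linking-b.html"
allowed_url = "http://www.example.com/linking-a.html"

# A compliant crawler checks robots.txt before requesting a page.
blocked_ok = parser.can_fetch("Googlebot", blocked_url)
allowed_ok = parser.can_fetch("Googlebot", allowed_url)

print(blocked_url, "may be fetched:", blocked_ok)
print(allowed_url, "may be fetched:", allowed_ok)
```

This shows only what the robots.txt rules permit. How Google would actually treat the links on such a page is a matter for the experiment, not this sketch.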