Archive for August, 2008

Search experiments at Google

Wednesday, August 27th, 2008

As if owning one zeitgeisty domain wasn’t sufficient, it seems that this one, or the phrase from which it is formed, is now in fashion, following a Google blog post about search experiments there.

As the SEO blogs link to and comment about it, the phrase “search experiments” becomes more popular and more competitive – at time of writing, this site has been pushed down on to page 2 (position #11).

The Google post is both interesting and funny. It kicks off with two versions of part of a results set so similar that it is impossible to tell them apart without placing them side by side, and even then it is a struggle. It reminded me obscurely of the Fast Show’s Animation Now sketch, where he moves things “just a tiny bit”.

The difference between the two is an extra half-millimetre of white space around one of the results. I suppose that they didn’t get where they are today by saying at any point “ah, that’s good enough”, but the degree of attention to detail seems beyond obsessive. The poster, Ben Gomes, even refers to the changes as “barely visible”.

It’s interesting to see, however, that as well as experimenting with new features and products, they are always tweaking the main model. If only they paid as much attention to their algo! (Joke, but I know a few webmasters who would be laughing bitterly…)

Type-in and optimisation

Wednesday, August 27th, 2008

The other night, I found myself idly pursuing one of my low-energy hobbies, that of checking domain names for availability. Following the release from a Vietnamese prison of disgraced former glam rock star Gary Glitter, who was a big star in the UK in my youth, I wondered who owned the domain for his name.

Well, the non-hyphenated version has been sat on by Internetters, one of the big domain services, but the hyphenated version – www.gary-glitter.com – was available.

I bought it and pointed it initially on to a page on this site, without linking to it from anywhere, and put a header and holding text on it.

In three days it had 38 page views, 28 uniques and 27 entrances. Three or four of those would have come from my showing people that I owned the domain, but the rest would have been type-ins by third parties. To be honest, I’m not sure whether I think this is a high number or not. I don’t know anyone who just types domains to see what is there, and I have a suspicion that most of the money that is made from type-in is from misspelled domains.

However, the man wasn’t out of the news for those days and remains all over the redtops at time of writing, which might lead you to expect more.

The domain is no longer redirected to this site, and has a holding page with ads, and a linked blog.

These terms only appear in links pointing to this page

Tuesday, August 19th, 2008

OK, so we have an initial result for our second search experiment. This was a five-page experiment, with a home page that linked to two further pages, one with meta robots set to index, the other set to noindex. Each of these pages linked to a destination page, the links having the same anchor text (which was a unique, or at least unusual, portmanteau word). The anchor text did not appear anywhere else on any pages.
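The only difference between the two linking pages is their meta robots setting, which can be confirmed before the crawl starts. Here is a minimal Python sketch of such a check. The page markup, the file names and the anchor word “zebradile” are invented stand-ins, not the real test pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow"
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of meta robots directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical stand-ins for the two linking pages (assumed markup)
LINKING_PAGE_INDEXED = (
    '<html><head><meta name="robots" content="index"></head>'
    '<body><a href="/destination-1.html">zebradile</a></body></html>'
)
LINKING_PAGE_NOINDEX = (
    '<html><head><meta name="robots" content="noindex"></head>'
    '<body><a href="/destination-2.html">zebradile</a></body></html>'
)

for label, page in (("indexed", LINKING_PAGE_INDEXED), ("noindex", LINKING_PAGE_NOINDEX)):
    print(label, sorted(robots_directives(page)))
```

A page with no robots meta tag at all returns an empty set, which is worth distinguishing from an explicit “index”.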

The expectation was that both destination pages would be indexed. I also expected both pages to appear in a search for the anchor text word, but I wasn’t absolutely sure about this. Then, if both destination pages did indeed appear for that word, I was interested to see which ranked better.

It took longer than I expected for all the pages to be indexed, but both destination pages made it in there eventually. The linking page that was set to noindex, of course, is not there.

The indexed linking page is the first result from the site for the anchor text term. This page contains the term, and is further up the site hierarchy. The second result from the site is the second destination page (ie the one linked from the noindex linking page). Google’s cache of that page contains the familiar phrase: “these terms only appear in links pointing to this page”, followed by the anchor text.

The first destination page does not appear in the results. It may do in future, and if it does I will report on its relative performance. But the page linked from the noindex parent was first to show…

This result demonstrates that pages set to noindex are passing link anchor text. This should not be too much of a surprise. From the initial result, it might appear that it is doing so more efficiently than an indexed page. I think that conclusion would not be correct. However, it might be reasonable to assume that it is passing anchor text at least as well as an indexed page.

Some further questions arise:

  • Does Google consider the non-linked textual content of a non-indexed page when determining the relevance of the links from that page?
  • Indeed, does Google treat “noindex” pages exactly the same as other pages in its index – assessing the content, placing them in the link graph etc – and the only difference is that pages are not returned in SERPs?
  • What difference would there be if the page, rather than being set locally to noindex, had been excluded using robots.txt?
  • Is it a given that the page containing the anchor text link would rank higher for the phrase than the page linked to, if the page linked to did not itself contain the word? Or in other words, does textual content outrank anchor text?
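On the robots.txt question, one mechanical difference is worth keeping in mind. A crawler that honours robots.txt never fetches an excluded page, so it never reads the links on it, whereas a noindex page has to be fetched for the directive to be seen at all. The Python sketch below uses the standard library’s urllib.robotparser to show the first half of that. The robots.txt content and the URLs are invented for illustration, and it says nothing about how Google actually treats anchor text in either case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes one linking page outright
ROBOTS_TXT = """\
User-agent: *
Disallow: /linking-b.html
"""


def may_crawl(robots_txt, url, user_agent="Googlebot"):
    """Return True if a compliant crawler is allowed to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed URLs on a placeholder domain
for path in ("/linking-a.html", "/linking-b.html"):
    url = "http://www.example.com" + path
    print(path, "crawlable" if may_crawl(ROBOTS_TXT, url) else "blocked")
```

If the excluded page is never fetched, its outbound links cannot be discovered from that page, which would make a robots.txt version of the experiment a rather different test.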

I don’t think that last one can be true, and I feel another experiment coming on…

Ranking prediction: result

Sunday, August 17th, 2008

As George Costanza used to say, “I was wrong”. This week I’m wrong about the effect of the so-called powerful external link that I mentioned before.

It turned up in Webmaster Tools as a credited link to the site. Did it make any difference as to the order of the blog home and site home in the Google rankings for the “search experiments” query? As I predicted? No, it did not. The blog home still sits there as the first result, with the home page indented.

I guess that with all the cross-linking, those two pages may well have similar rank, and the blog home page is more relevant in terms of its content.

Effects of taggregation, plus status updates

Friday, August 15th, 2008

I am a little surprised to find that the blog home page was briefly #2 (now #3) in Google for the phrase “search experiments”, and that the site home page is #2 in Yahoo (in each case, the UK varieties). Despite this apparent “success” (I don’t think that the term has driven any search visitors to the site), there remain pages of the site resolutely unindexed.

Google

The preference that Google is showing for the blog home page is also interesting, and it is worth looking into why this might be, particularly because the links that I have created are all to the website home page. Although all the pages on the site link to the blog home, all the pages/posts on the blog link to the site home. 

So what is going on with Google here? A link: operator search returns no results, but Webmaster Tools credits the site overall with 39 external links. Eight of these are to the home page, the rest to blog pages. The eight, which I set up, are from a couple of other blogs, one of which is totally weak and the other fairly weak.

The links to blog pages are mostly from Technorati, and all Technorati links are from pages aggregating all blogs with particular tags. The other links look as if they are doing something similar, probably with material taken or scraped from Technorati.

There’s good cross-linking between the blog and the other site pages: all links on blog pages to the main site home page use the phrase; conversely, all links from the non-blog pages to the blog include the phrase.

So, crosslinking should pretty much cancel itself out in relation to relative ranking. Which leads to an interesting tentative hypothesis: that simply blogging and using tags can garner external links – from aggregator pages – that are as powerful as hand-edited links from existing sites.

I do have one reasonably powerful incoming link set up (from the home page of a five-year-old site with thousands of organic links), but this is not yet showing up as an external link in Webmaster Tools. (This link is to the home page, not the blog.)

OK, it could of course be passing PR without showing up in Webmaster Tools. I shall keep an eye out to see whether the relative ranking changes, and when the link shows up in Webmaster Tools.

Yahoo

In Yahoo, it’s the home page that is showing up in the rankings. The blog home page is nowhere to be seen in the rankings; indeed, Site Explorer doesn’t recognise the page among the six that it currently lists. 

However, Site Explorer is giving credit for the one relatively powerful link to the site.

Observations and predictions

1) The blog home page being “ahead” of the home page in Google rankings seems to suggest that the links garnered by tag aggregation – I am disappointed but not wholly surprised to discover that the word “taggregation” has already been coined – may have a significant role to play in getting content indexed and ranked. I will not put it more strongly than that at present. It may be worth experimenting with a new blog, unlinked elsewhere, to test this hypothesis – by watching how it performs up to the point that someone manually links to it.

2) Having a top 3 result for a plausible if specialised phrase does not necessarily generate traffic.

3) Google is more interested in blog content than Yahoo is (?)

Prediction: when Webmaster Tools shows the strong site in the external links, the home page for the site will outperform the blog home page in Google. 

Thinking about it, the other possible reason that the blog home page may be outperforming the home page is content – there’s typically a lot more content on the blog page and (obviously enough) the phrase “search experiments” gets mentioned all the time on it.

Bad assumptions cause incorrect conclusions…

Monday, August 11th, 2008

Hmm. In my last post I suggested that I had reached a conclusion about the CSS and div-related image indexing test. I might have done, but I think that I jumped there. 

The original motivation for the test was to work out why certain images were not being indexed. Two hypotheses presented themselves – some slightly sloppy nesting of divisions, and a clear CSS hack.

Sure enough, when the various relevant pages were indexed and cached in Google, I found what I was looking for – that some of the pages didn’t appear to show the images in the cache. This would also tend, I thought, to support the hypothesis that the pictures not appearing here would be excluded from the image search results – because Google, having “refused” to cache the images on the page, would surely “refuse” again to include them in the index.

A nice enough hypothesis – and having been pleased with myself for devising it, of course I wanted it to be true, so started looking for results that would confirm it.

Those pages showed up with no images, and I published my immediate conclusions. However, I was looking at the cached pages in Firefox and Safari. Today I took a look using another browser that I rarely use, Internet Explorer. Using this browser, the images were visible. Google hasn’t “refused” to cache them. The hypothesis seems much weakened. 

I’ll continue to track what happens to these pages and the images on them, and report back. However, an important lesson has been learned from the experiment in any case: do not allow your desire to be correct to skew your interpretation of the results that are returned.

Bad CSS to blame for non-caching of images

Sunday, August 10th, 2008

The first SEO experiment on the main site was intended to determine which of two possible code faux pas was more likely to be the cause of images not showing up in Google’s Image search results, a problem that had occurred on another site – which is why the test was a little specific in nature, and not very generic.

On examining Google’s cache of the pages in question, it was clear that the main images on those pages were not appearing. Looking at the code, two possible culprits suggested themselves. 

Firstly, in a rather messy way, classes and ids were being used interchangeably as style selectors for divisions (“divs”), and although there were not any repeated ids, there was a div with a particular id, which was then referenced as a class in another, nested div.

<div id="blah">
	<div class="blah">
		[picture and other content]
	</div>
</div>

Not invalid HTML, but messy.

The other candidate was some strange-looking CSS code, apparently designed to get over some problem with rendering in IE6 (which may itself have been caused by the messy HTML…)

.hack {
	color: blue;
	font-size: 18px;
	height: 1%;        /* suspect declaration */
	overflow: hidden;  /* suspect declaration */
}

It’s the last two lines, obviously, that are the candidates for causing issues. This CSS validates, and the pages render as expected in all browsers that I have tried. Browsers are very forgiving, however…
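To find other rules on a site that combine the same two declarations, a rough scan of the stylesheet is enough. The following Python sketch is a quick regex-based check and not a real CSS parser. It assumes simple, un-nested rules with no comments, and the “.plain” rule is invented as a control.

```python
import re

# The two declarations under suspicion, taken from the .hack rule
SUSPECT = {"height": "1%", "overflow": "hidden"}


def parse_rules(css):
    """Map each selector to a dict of its declarations (simple rules only)."""
    rules = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip().lower()] = value.strip().lower()
        rules[selector.strip()] = declarations
    return rules


def suspect_selectors(css):
    """Return selectors whose rules contain every suspect declaration."""
    return [
        selector
        for selector, declarations in parse_rules(css).items()
        if all(declarations.get(prop) == value for prop, value in SUSPECT.items())
    ]


SAMPLE_CSS = """
.hack {
    color: blue;
    font-size: 18px;
    height: 1%;
    overflow: hidden;
}
.plain {
    color: red;
}
"""

print(suspect_selectors(SAMPLE_CSS))
```

Only rules carrying both declarations are reported; a rule with just one of them is left alone, since the post does not say whether either is a problem on its own.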

So, I recreated pages with these problems, including controls and permutations with the different errors.
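With two suspected faults, “controls and permutations” comes to a small grid of pages. The Python sketch below enumerates that grid. It assumes a full two-factor design (neither fault, each fault alone, both together), which may not match exactly the set of pages that was built, and the fault labels are my own.

```python
from itertools import product

# Labels for the two suspected faults (names invented for illustration)
FAULTS = ("id_reused_as_class", "css_hack")


def test_page_variants():
    """List every on/off combination of the faults, control page first."""
    variants = []
    for flags in product((False, True), repeat=len(FAULTS)):
        variants.append(dict(zip(FAULTS, flags)))
    return variants


for number, variant in enumerate(test_page_variants(), start=1):
    present = [name for name, on in variant.items() if on] or ["control"]
    print(f"page {number}: {', '.join(present)}")
```

The control page and the two single-fault pages are the ones that matter for attributing blame; the combined page mainly confirms that the original problem has been reproduced.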

The conclusion is that it is the CSS hack that is causing the images not to render in Google’s cache. It’s too early to tell whether this is also having an effect on the indexing of these images, because none of the images is yet indexed.

The cache for the badly nested divs page shows the picture, whereas the cache for the CSS-hacked test page does not render the picture.

Does this mean that Google is excluding certain types of “hidden” content, or does it mean that its internal “browser” for rendering its cached pages is a bit more strict about rendering pages accurately? Only when the pages have settled in the index and the images on the test pages have made it (or not) into the image search results will we be able to speculate more intelligently on this.

Ranking versus indexing – update

Friday, August 8th, 2008

Back after a week or so’s total inattention, and an interesting pattern is emerging. With the power of external links beginning to kick in, the home page for the site is now ranking at #5 in the big G for the familiar phrase. However, all of the non-blog pages appear to have disappeared from the index – at least, those few that were there already. None are currently indexed.

I suspect that this is part of a general fluctuation common with new sites, but I’d make the following observations. 

  1. The blog pages aren’t affected by this. Those that are set to be indexed are there in the big G index.
  2. Since the last update I’ve introduced an XML sitemap with all URLs in there (which updates with any new blog posts). So far no beneficial effect for non-blog pages.
  3. Some of the inbound links are showing in G webmaster tools now, but I suspect that this ranking means that all have been taken into account. The most powerful link is on a page that has now been crawled since the link was introduced.
  4. None of the links is from a page relevant to the subject matter. 
  5. Yahoo is not indexing anything but the homepage at present. Site Explorer is however showing links to pages other than the home page. I’m not quite sure how Y can count links to pages that it doesn’t recognise as indexed.

No firm conclusions as yet. Tentative conclusions are:

  • Using a blog platform is better for getting your pages indexed than hand-crafting HTML
  • Possibly this effect is helped by tagging, as other sites collate, aggregate and link to posts based on tagging
  • It’s easier to get one page to rank for a search term than it is to get a suite of pages into the index

However, at present the lack of indexed non-blog pages is also making it difficult to draw conclusions about either the CSS/picture experiment or the anchor text/noindex experiment.

Sitemap generator and update

Saturday, August 2nd, 2008

Finally worked out how to use the excellent XML (Google) Sitemap Generator plug-in. What I mean is, I had some problems adding non-blog pages, mainly because I’m an idiot – I was adding pages and then updating the sitemap, when I should have been adding pages, updating the options and then updating the sitemap.

Hoping that this will help to get some more non-blog pages indexed, so that the experiments can proceed.
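For anyone without the plug-in, the core of a sitemap is small. The following is a minimal Python sketch of a sitemap in the sitemaps.org format, listing blog and non-blog URLs alike. It is not what the plug-in itself produces (that output isn’t shown here and carries extra fields), and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap, one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs: one home page, one hand-coded page, one blog post
PAGES = [
    "http://www.example.com/",
    "http://www.example.com/experiment-1.html",
    "http://www.example.com/blog/2008/08/some-post/",
]

print(build_sitemap(PAGES))
```

A sitemap only tells the search engine that the URLs exist; it does not oblige it to index them.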

In the meantime, the home page is now ranking #13 in Yahoo’s UK search for the term “search experiments”. It’s the only page from the site or blog that Yahoo currently has indexed.

The purpose of this site and blog is not to rank for the phrase, which I don’t think will generate any significant traffic. However, it isn’t a nonsense phrase. It is intriguing to see how much easier it is to rank for a single, albeit non-competitive, phrase, than it is to get all pages in a site into the index.

It’s particularly interesting to see how much more quickly blog posts are indexed when compared with hand-coded pages.

Taking a few days out now. The sitemap is set, the external links are in, the comment spam-buster is installed. We’ll see how things fare when I return from holiday…

Unlucky for some

Friday, August 1st, 2008

Status

  • Main home page now #13 for “search experiments” on Google. 
  • Most sub-pages on main site not yet indexed.
  • New links from third party sites not in cache now – not sure if they are helping yet or not.
  • Blog pages in and out of Google index – still waiting for the preferences set in the All in One SEO plugin to settle down I think.

Plans

  • Waiting on existing pages to be indexed before I can conclude the two experiments.
  • New experiments will need site to be indexed too.