Posts Tagged ‘image search’

Update on experiment 1

Wednesday, September 3rd, 2008

It transpires that the whole experiment was somewhat misconceived.

To recap: we were having trouble getting images indexed on a certain part of another live site, and on examining the cache of the pages in Google we noted that the images were not appearing. We then identified a couple of candidate reasons why this might be, isolated them and set up some pages here to test which of the reasons might be causing it. 

We successfully identified the cause of the phenomenon.

However, the underlying assumption – that the absence of the image from Google’s cached version was somehow an indication that Google had not indexed the image – was incorrect, as I discovered when looking again at an offending cached page using another browser (in this case IE), which rendered the image.

I suppose that the experiment worked, but the hypothesis unfortunately died.

Bad CSS to blame for non-caching of images

Sunday, August 10th, 2008

The first SEO experiment on the main site was intended to determine which of two possible code faux pas was more likely to be the cause of images not showing up in Google’s Image search results, a problem that had occurred on another site – which is why the test was a little specific in nature, and not very generic.

On examining Google’s cache of the pages in question, it was clear that the main images on those pages were not appearing. Looking at the code, two possible culprits suggested themselves. 

Firstly, in a rather messy way, classes and ids were being used interchangeably as style selectors for divisions (”divs”), and although there were not any repeated ids, there was a div with a particular id, which was then referenced as a class in another, nested div.

<div id=”blah”>

<div class=”blah”>

[picture and other content]

</div>

</div>

Not invalid HTML, but messy.

The other candidate was some strange-looking CSS code, apparently designed to get over some problem with rendering in IE6 (which may itself have been caused by the messy HTML…)

.hack {
	color: blue;
	font-size: 18px;
	height: 1%;
	overflow: hidden;
	}

It’s the last two lines, obviously, that are the candidates for causing issues. This CSS validates, and the pages render as expected in all browsers that I have tried. Browsers are very forgiving, however…

So, I recreated pages with these problems, including controls and permutations with the different errors.

The conclusion is that it is the CSS hack that is causing the images not to render in Google’s cache. It’s too early to tell whether this is also having an effect on the indexing of these images, because none of the images is yet indexed.

The cache for the badly nested divs page shows the picture, whereas the cache for the CSS-hacked test page does not render the picture.

Does this mean that Google is excluding certain types of “hidden” content, or does it mean that its internal “browser” for rendering its cached pages is a bit more strict about rendering pages accurately? Only when the pages have settled in the index and the images on the test pages have made it (or not) into the image search results will we be able to speculate more intelligently on this.