Posts Tagged ‘CSS’

Bad assumptions cause incorrect conclusions…

Monday, August 11th, 2008

Hmm. In my last post I suggested that I had reached a conclusion about the CSS and div-related image indexing test. I might have done, but I think that I jumped to it.

The original motivation for the test was to work out why certain images were not being indexed. Two hypotheses presented themselves – some slightly sloppy nesting of divisions, and a clear CSS hack.

Sure enough, when the various relevant pages were indexed and cached in Google, I found what I was looking for – that some of the pages didn’t appear to show the images in the cache. This would also tend, I thought, to support the hypothesis that the pictures not appearing here would be excluded from the image search results – because Google, having “refused” to cache the images on the page, would surely “refuse” again to include them in the index.

A nice enough hypothesis – and having been pleased with myself for devising it, of course I wanted it to be true, so I started looking for results that would confirm it.

Those pages showed up with no images, and I published my immediate conclusions. However, I was looking at the cached pages in Firefox and Safari. Today I took a look using another browser that I rarely use, Internet Explorer. Using this browser, the images were visible. Google hasn’t “refused” to cache them. The hypothesis seems much weakened. 
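One way to take the browser out of the equation is to inspect the markup of the cached copy directly, rather than relying on how any one browser renders it. A minimal sketch in Python, assuming the cached page's HTML has already been saved into a string; the sample markup and the image path are made up for illustration and are not taken from the real test pages:

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return the image URLs present in the markup, whatever a browser displays."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Hypothetical cached markup: the image sits inside a div styled by the hack.
cached_html = '<div class="hack"><img src="/images/test.jpg" alt="test"></div>'
print(image_sources(cached_html))
```

If the image tag is present in the cached markup, its absence on screen points to a rendering difference between browsers rather than to the image having been left out of the cache.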

I’ll continue to track what happens to these pages and the images on them, and report back. However, an important lesson has been learned from the experiment in any case: do not allow your desire to be correct to skew your interpretation of the results that are returned.

Bad CSS to blame for non-caching of images

Sunday, August 10th, 2008

The first SEO experiment on the main site was intended to determine which of two possible code faux pas was more likely to be the cause of images not showing up in Google’s Image search results, a problem that had occurred on another site – which is why the test was a little specific in nature, and not very generic.

On examining Google’s cache of the pages in question, it was clear that the main images on those pages were not appearing. Looking at the code, two possible culprits suggested themselves. 

Firstly, in a rather messy way, classes and ids were being used interchangeably as style selectors for divisions (“divs”). Although there were no repeated ids, there was a div with a particular id, and the same name was then used as a class on another div nested inside it.

<div id="blah">

<div class="blah">

[picture and other content]

</div>

</div>

Not invalid HTML, but messy.

The other candidate was some strange-looking CSS code, apparently designed to work around a rendering problem in IE6 (which may itself have been caused by the messy HTML…):

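.hack {
	color: blue;
	font-size: 18px;
	/* The two suspect declarations: height: 1% is a common IE6 layout
	   workaround, and overflow: hidden clips content that does not fit. */
	height: 1%;
	overflow: hidden;
}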

It’s the last two lines, obviously, that are the candidates for causing issues. This CSS validates, and the pages render as expected in all browsers that I have tried. Browsers are very forgiving, however…

So, I recreated pages with these problems, including controls and permutations with the different errors.
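The set of test pages can be thought of as every combination of the two suspected faults, with the page that has neither acting as the control. A sketch in Python of how such pages might be generated; the file names, the image path, the "inner" class name, and the choice to put the hack class on the inner div are all illustrative assumptions, not the markup of the actual test pages:

```python
from itertools import product

# The hack as quoted above, and a version without the two suspect declarations.
HACK_CSS = ".hack { color: blue; font-size: 18px; height: 1%; overflow: hidden; }"
CLEAN_CSS = ".hack { color: blue; font-size: 18px; }"


def build_page(messy_nesting, css_hack):
    """Build one test page for a given combination of the two suspected faults."""
    # Messy variant: the outer div's id is reused as the inner div's class.
    inner_class = "blah" if messy_nesting else "inner"
    css = HACK_CSS if css_hack else CLEAN_CSS
    return (
        "<html><head><style>" + css + "</style></head><body>\n"
        '<div id="blah">\n'
        f'<div class="{inner_class} hack">\n'
        '<img src="/images/test.jpg" alt="test image">\n'
        "</div>\n</div>\n</body></html>\n"
    )


def build_all_pages():
    """Return a dict of file name -> HTML covering all four combinations."""
    pages = {}
    for messy_nesting, css_hack in product([False, True], repeat=2):
        name = f"test-nesting-{int(messy_nesting)}-hack-{int(css_hack)}.html"
        pages[name] = build_page(messy_nesting, css_hack)
    return pages


for name in build_all_pages():
    print(name)
```

Varying one fault at a time against the control is what allows a difference in the cached result to be attributed to a single cause.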

The conclusion is that it is the CSS hack that is causing the images not to render in Google’s cache. It’s too early to tell whether this is also having an effect on the indexing of these images, because none of the images is yet indexed.

The cache for the badly nested divs page shows the picture, whereas the cache for the CSS-hacked test page does not render the picture.

Does this mean that Google is excluding certain types of “hidden” content, or does it mean that its internal “browser” for rendering its cached pages is a bit more strict about rendering pages accurately? Only when the pages have settled in the index and the images on the test pages have made it (or not) into the image search results will we be able to speculate more intelligently on this.