WP-o-Matic and “linkbuilding”

October 6th, 2008

Following yesterday’s experiment (downloading RSS feeds about US politics, then replacing and autolinking certain words), it seems that a few repetitive entries on an obscure blog can get picked up.

I used a stupid, made-up phrase, scaryspiceus, to replace the name Sarah Palin, and linked every instance of it to Palin’s Wikipedia page. 

Unsurprisingly, yesterday there were no results for that phrase. Today, the Wikipedia page comes up as the sole result – that is to say, the blog posts themselves don’t show up for it.

I’ll keep an eye on this and report any changes.

WP-o-Matic – early experiences

October 5th, 2008

I’ve been having some more fun playing around with this plugin today. The first thing I should say is thank you, Guillermo, as it really has been enjoyable. 

I’ve successfully managed to do what I suggested earlier would be possible. I created a BBC news feed based on the phrase “Barack Obama”, and set the plugin to find the full names of Obama, McCain and Palin in each news item as it arrived, change them into some stupid words/phrases, and make each one a link to another site. 

You can see an example blog post here. As you can see, this could very easily be abused. The beauty of swapping out names and nouns is that with minimal effort you can ensure that you are sufficiently different to avoid duplicate content penalties, but also, because the original content is likely to make good semantic sense, the variant version will also look like proper English to a machine, even if it might look odd to a real person.
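To put a rough number on “sufficiently different”, here is a minimal sketch using one generic near-duplicate measure: overlapping three-word shingles compared with Jaccard similarity. This is an assumption for illustration only; nothing here reflects how Google actually detects duplicate content, and the sample sentence and function names are invented.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """The set of overlapping k-word windows in text (lower-cased words only)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (1.0 means identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = "Sarah Palin spoke in Ohio today, and Sarah Palin said the campaign was on track."
swapped = original.replace("Sarah Palin", "scaryspiceus")
print(jaccard(original, original))           # 1.0
print(round(jaccard(original, swapped), 2))  # 0.41
```

Swapping a single name that occurs twice drops the score from 1.0 to 7/17, because every shingle that touches the swapped name changes while the rest of the sentence still reads as ordinary English.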

What could you do with this?

Here’s an example. Say you decided, on the basis of this old data, that you wanted to pick up some of the traffic that might be attributable to the most popular misspelling of “Britney Spears”.

Well, you could mash up whatever feeds you could find about Britney Spears and change every instance of Britney to Brittney (or Brittany). To try to avoid being seen as a duplicate, you could also pick some other words that are likely to occur and change them as well, so “chanteuse” replaces “singer”, “fling” replaces “marriage” and so on… You could also link every instance of the full misspelling back to your main target page. The possibilities are endless, although one might argue that you would not be adding much to the richness of human experience.

I have had a few issues. Whatever setting I used to control dates (ie whether or not I gave precedence to the feed’s dates), importing my delicious feed put the older posts at the top on the initial load. Once the feed was in, and I added a new bookmark, the new story appeared in the correct place at the top, as I would have expected.

Feeds from Yahoo Pipes seem not to work yet, which someone has already reported. I’ve also had some problems sorting out automatic updating using Cron, but that’s nothing to do with the plugin.

It’s worth noting that this plugin is currently a release candidate, which means that it still has a few issues to be ironed out. Good luck with getting to a full first release.

Splogging and search

October 5th, 2008

I’ve been experimenting with the WordPress plugin WP-o-Matic on another blog of late. In combination with the SimplePie plugin, it allows you to automatically post to blogs using RSS feeds. 

The plugin allows you to create campaigns, into which you can place multiple RSS feeds – or just a single one if you prefer. For each campaign, you allocate a category, and the plug-in will post items from the feed as individual blog posts categorised accordingly. 
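As a rough model of what a campaign does, here is a minimal sketch. The names (`Post`, `run_campaign`, `seen_links`) are hypothetical rather than WP-o-Matic’s own code, and feeds are assumed to be already parsed into (title, link) pairs; the point is simply that several feeds share one category, and an item already posted (even via another feed) is skipped.

```python
from dataclasses import dataclass

# Hypothetical model: a feed is an already-parsed list of (title, link) pairs.
Feed = list[tuple[str, str]]

@dataclass
class Post:
    title: str
    link: str
    category: str

def run_campaign(feeds: list[Feed], category: str, seen_links: set[str]) -> list[Post]:
    """Create one post per new item across all of a campaign's feeds.

    seen_links stands in for whatever record is kept of items already posted;
    it is updated in place so that later runs skip them.
    """
    posts = []
    for feed in feeds:
        for title, link in feed:
            if link in seen_links:
                continue  # already posted, possibly via another feed in the campaign
            seen_links.add(link)
            posts.append(Post(title, link, category))
    return posts

bbc = [("Obama in Ohio", "http://example.com/a"), ("Debate preview", "http://example.com/b")]
other = [("Debate preview", "http://example.com/b"), ("Palin interview", "http://example.com/c")]
seen: set[str] = set()
first_run = run_campaign([bbc, other], "US politics", seen)   # three posts; shared item once
second_run = run_campaign([bbc, other], "US politics", seen)  # nothing new, so no posts
```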

You can control how often each campaign checks the feed for new items, although I’ve had some teething problems getting this to work exactly as I would like. Ideally, you would want to organise this so that it published stories on a drip-feed basis, pretty close to their publication dates, so you would set the check frequency to roughly match the rate at which new items are published.

Incidentally, I’ve also had some difficulty getting the campaigns to refresh. I think it is something to do with being a bit new to cron jobs. More on that later.

So, why would you want to republish someone else’s RSS feeds as if they were your own blog posts? Isn’t this (a) a rather unethical theft of content and (b) unlikely to do you any good for search optimisation, as it will all be duplicate content?

I’ll leave the ethical questions for another time – for now, let’s just remember that the second S in RSS stands for “syndication”.

So, what possible benefits, including SEO benefits, could flow from republishing this material? The idea of each item in an RSS feed being reproduced as a new, individual post is definitely just dupe content spam, right?

Not really. There are all kinds of possible legitimate uses for this. For example, you might want to do some judicious selection of RSS feeds, perhaps filtered automatically as well, and combine them so that your particular blog carried every story that you thought was going to be of interest to your audience. Provided that the posts link to the original story, your users could read the truncated RSS summary on your blog and then decide whether to go to the full post.
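Automatic filtering can be as simple as a keyword match on each item’s title and description. Here is a minimal sketch using Python’s standard XML parser on an RSS 2.0 document; the function name, keywords and sample feed are hypothetical, and WP-o-Matic’s own filtering options may work differently.

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Obama rally in Ohio</title><description>Crowds gather.</description></item>
<item><title>Football results</title><description>Weekend scores.</description></item>
<item><title>Debate preview</title><description>McCain and Obama prepare.</description></item>
</channel></rss>"""

def filter_items(rss_xml: str, keywords: list[str]) -> list[str]:
    """Titles of items whose title or description contains any keyword (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    wanted = [k.lower() for k in keywords]
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", "")
        text = (title + " " + item.findtext("description", "")).lower()
        if any(k in text for k in wanted):
            matches.append(title)
    return matches

print(filter_items(SAMPLE_FEED, ["obama", "palin"]))
```

This is plain substring matching, so a keyword like “bush” would also match “ambush”; whole-word matching is safer for short keywords.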

Another possibility is that you effectively own the RSS feed – for example, it could be your del.icio.us feed, which you want to turn into a linkblog, with a post for each bookmark, without doing any more work.

However, from an SEO point of view there are some further uses.

First, although the posts themselves will not be unique, the permutation of them may well be, so that your main page – and in particular your category pages – can contain themed content in a combination that is not to be found elsewhere on the web. If reasonably well-linked, these pages could have a chance of ranking for those terms.

Second, there is a very nice feature in the plug-in that allows you to process the feeds as they come in using a search and replace function.

This is separated into two functions for ease of use: the first is a simple word-swap. The example that the author gives is that you could have the plugin search for “ass” and replace it with “butt”. Incidentally, this kind of auto-bowdlerisation is a risky business – witness the embarrassment of the right-wing Christian site that decided that “gay” was too euphemistic (and happy-sounding) for them, and then ended up publishing a number of stories about the Olympic sprinter “Tyson Homosexual”.
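The risk is easy to reproduce. This minimal sketch (hypothetical function names, Python’s `re` module) contrasts plain substring replacement with whole-word matching for the author’s “ass”/“butt” example.

```python
import re

def naive_swap(text: str, old: str, new: str) -> str:
    """Plain substring replacement: also hits the word inside longer words."""
    return text.replace(old, new)

def whole_word_swap(text: str, old: str, new: str) -> str:
    """Replace old only where it stands as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(old)}\b", re.IGNORECASE)
    return pattern.sub(new, text)

sample = "The class was told to pass the ass to the assistant."
print(naive_swap(sample, "ass", "butt"))
# The clbutt was told to pbutt the butt to the buttistant.
print(whole_word_swap(sample, "ass", "butt"))
# The class was told to pass the butt to the assistant.
```

Whole-word matching fixes the “clbutt” problem, but it would not have saved “Tyson Homosexual”: there, “Gay” really is a whole word, just a surname rather than an adjective.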

The second element enables you to automatically place links behind certain specified words/phrases. This is obviously pretty powerful for building lots of links with the right anchor text, quite quickly.
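Mechanically, autolinking amounts to wrapping each occurrence of a phrase in an anchor tag. A minimal sketch, assuming plain text or simple HTML in which the phrase never appears inside a tag or an existing link; `autolink` and the example.com URL are hypothetical, and a robust version would parse the HTML rather than rely on a regular expression.

```python
import re
from html import escape

def autolink(html_text: str, phrase: str, url: str) -> str:
    """Wrap every whole-word, case-sensitive occurrence of phrase in a link to url.

    Assumes the phrase never occurs inside a tag or an existing <a> element.
    """
    pattern = re.compile(rf"\b{re.escape(phrase)}\b")
    href = escape(url, quote=True)
    return pattern.sub(lambda m: f'<a href="{href}">{m.group(0)}</a>', html_text)

print(autolink("I like digital cameras. Digital cameras are fun.",
               "digital cameras", "http://example.com/cameras"))
```

Because the match is case-sensitive, the capitalised “Digital cameras” at the start of the second sentence is left alone; adding `re.IGNORECASE` would link it too.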

I’m not sure whether the two would work together – I will give it a go – but on the assumption that they do, it would be possible to pick a news feed filtered on, say, Barack Obama, and republish all of those stories with the words “digital cameras” automatically replacing “Barack Obama”, and linking to your digital cameras site. You might even avoid some of the duplicate-spotting in this way…

Warning: doing very much of this kind of thing is pretty likely to get your site banned by Google.

Affiliate experiments

September 22nd, 2008

I’ve signed up with AffiliateWindow (and, yes, that is an affiliate link!) as an experiment, pretty much to see how the process works and to test out their various tools. 

I don’t expect my own sites to generate any income whatsoever. I’ve made a few hundred pounds in the past with my old poker podcast site, but that was the result of doing a regular podcast (now pretty much defunct) with quite a lot of listeners, whom we occasionally encouraged by various active means to sign up via our affiliate link (although it wasn’t a profit-led enterprise by any means…)

The reason for doing this here is that I wanted to check out the interface and the various options. So far, it’s pretty much plain sailing. I’ve gone from application through approval to successfully putting the first ads across the pages of the main site, which I did by placing the ad code in an existing include file. The form took two minutes to fill in, and the approval arrived within an hour. 

(I’m guessing that the application standards aren’t too strict, as I’m sure they have ways of monitoring affiliates using practices of which they disapprove, so they can weed out any abusers later.)

I’ll be posting here about my experiences with the interface and the various services on offer.

Update on experiment 1

September 3rd, 2008

It transpires that the whole experiment was somewhat misconceived.

To recap: we were having trouble getting images indexed on a certain part of another live site, and on examining the cache of the pages in Google we noted that the images were not appearing. We then identified a couple of candidate reasons why this might be, isolated them and set up some pages here to test which of the reasons might be causing it. 

We successfully identified the cause of the phenomenon.

However, the underlying assumption – that the absence of the image from Google’s cached version was somehow an indication that Google had not indexed the image – was incorrect, as I discovered when looking again at an offending cached page using another browser (in this case IE), which rendered the image.

I suppose that the experiment worked, but the hypothesis unfortunately died.

Search experiments at Google

August 27th, 2008

As if owning one zeitgeisty domain wasn’t sufficient, it seems that this one, or the phrase from which it is formed, is now in fashion, following a Google blog post about search experiments there.

As the SEO blogs link to and comment about it, the phrase “search experiments” becomes more popular and more competitive – at time of writing, this site has been pushed down on to page 2 (position #11).

The Google post is both interesting and funny. It kicks off with two versions of part of a results set so similar that it is impossible to tell them apart without placing them side by side, and even then it is a struggle. It reminded me obscurely of the Fast Show’s Animation Now sketch, where he moves things “just a tiny bit”.

The difference between the two is an extra half-millimetre of white space around one of the results. I suppose that they didn’t get where they are today by saying at any point “ah, that’s good enough”, but the degree of attention to detail seems beyond obsessive. The poster, Ben Gomes, even refers to the changes as “barely visible”.

It’s interesting to see however that as well as experimenting with new features and products, they are always tweaking the main model. If only they paid as much attention to their algo! (Joke, but I know a few webmasters who would be laughing bitterly…)

Type-in and optimisation

August 27th, 2008

The other night, I found myself idly pursuing one of my low-energy hobbies, that of checking domain names for availability. Following the release from a Vietnamese prison of disgraced former glam rock star Gary Glitter, who was a big star in the UK in my youth, I wondered who owned the domain. 

Well, the non-hyphenated version has been sat on by Internetters, one of the big domain services, but the hyphenated version – www.gary-glitter.com – was available.

I bought it and initially pointed it at a page on this site, which I didn’t link to from anywhere, with a header and some holding text.

In three days it had 38 page views, 28 uniques and 27 entrances. Three or four of those would have come from my showing people that I owned the domain, but the rest would have been type-ins by third parties. To be honest, I’m not sure whether that is a high number or not. I don’t know anyone who types in domains just to see what is there, and I suspect that most of the money made from type-in traffic comes from misspelled domains.

However, the man was never out of the news during those days, and remains all over the redtops at time of writing, which might lead you to expect more.

The domain is no longer redirected to this site, and has a holding page with ads, and a linked blog.

These terms only appear in links pointing to this page

August 19th, 2008

OK, so we have an initial result for our second search experiment. This was a five-page experiment, with a home page that linked to two further pages, one with meta robots set to index, the other set to noindex. Each of these pages linked to a destination page, the links having the same anchor text (which was a unique, or at least unusual, portmanteau word). The anchor text did not appear anywhere else on any pages.

The expectation was that both destination pages would be indexed. I also expected both pages to appear in a search for the anchor text word, but I wasn’t absolutely sure about this. Then, if both destination pages did indeed appear for that word, I was interested to see which ranked better.
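For clarity, the set-up can be written down as a small link graph. This toy model uses hypothetical page names (the real URLs and the anchor-text word are not given here) and encodes the expectation that both destination pages get indexed: meta robots noindex keeps a page out of the index, but without nofollow a crawler still follows its links.

```python
# Toy model of the five-page experiment; page names are hypothetical.
LINKS = {
    "home": ["linker_index", "linker_noindex"],
    "linker_index": ["dest_1"],    # meta robots: index
    "linker_noindex": ["dest_2"],  # meta robots: noindex (links still followed)
    "dest_1": [],
    "dest_2": [],
}
NOINDEX = {"linker_noindex"}

def crawl(links: dict[str, list[str]], start: str) -> set[str]:
    """Every page reachable from start by following links (depth-first)."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page not in seen:
            seen.add(page)
            stack.extend(links[page])
    return seen

def expected_index(links: dict[str, list[str]], noindex: set[str], start: str = "home") -> set[str]:
    """Pages the crawler reaches, minus those that ask not to be indexed."""
    return crawl(links, start) - noindex

print(sorted(expected_index(LINKS, NOINDEX)))  # ['dest_1', 'dest_2', 'home', 'linker_index']
```

Both destination pages are reachable and only the noindex linking page drops out, which is what eventually happened.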

It took longer than I expected for all the pages to be indexed, but both destination pages made it in there eventually. The linking page that was set to noindex, of course, is not there.

The indexed linking page is the first result from the site for the anchor text term. This page contains the term, and is further up the site hierarchy. The second result from the site is the second destination page (ie the one linked from the noindex linking page). Google’s cache of that page contains the familiar phrase: “these terms only appear in links pointing to this page”, followed by the anchor text. 

The first destination page does not appear in the results. It may do in future, and if it does I will report on its relative performance. But the page linked from the noindex parent was first to show…

This result demonstrates that pages set to noindex are passing link anchor text. This should not be too much of a surprise. From the initial result, it might appear that such a page is doing so more efficiently than an indexed page. I think that conclusion would not be correct. However, it might be reasonable to assume that it is passing anchor text at least as well as an indexed page.

Some further questions arise:

  • Does Google consider the non-linked textual content of a non-indexed page when determining the relevance of the links from that page?
  • Indeed, does Google treat “noindex” pages exactly the same as other pages in its index – assessing the content, placing them in the link graph etc – with the only difference being that they are not returned in SERPs?
  • What difference would there be if the page, rather than being set to noindex locally, had been excluded using robots.txt?
  • Is it a given that the page containing the anchor text link would rank higher for the phrase than the page linked to, if the page linked to did not itself contain the word? Or in other words, does textual content outrank anchor text?

I don’t think that last one can be true, and I feel another experiment coming on…
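On the robots.txt question, the key difference is that robots.txt stops a compliant crawler from fetching a page at all, whereas a noindex tag is only seen once the page has been fetched, so its links can still be read. A minimal sketch using Python’s `urllib.robotparser`, with a hypothetical robots.txt and hypothetical example.com paths:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes the second linking page instead of noindexing it.
ROBOTS_TXT = """User-agent: *
Disallow: /linker-noindex.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def crawlable(path: str, agent: str = "Googlebot") -> bool:
    """Would a robots.txt-compliant crawler fetch this path on the example site?"""
    return parser.can_fetch(agent, "http://www.example.com" + path)

print(crawlable("/linker-index.html"))    # True
print(crawlable("/linker-noindex.html"))  # False
```

Under this set-up the excluded linking page is never fetched, so its anchor text cannot be read from that page; how Google then treats the excluded URL itself is a question for the experiment.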

Ranking prediction: result

August 17th, 2008

As George Costanza used to say, “I was wrong”. This week I was wrong about the effect of the so-called powerful external link that I mentioned before.

It turned up in Webmaster Tools as a credited link to the site. Did it change the order of the blog home and site home in the Google rankings for the “search experiments” query, as I predicted? No, it did not. The blog home still sits there as the first result, with the site home page indented below it.

I guess that with all the cross-linking, those two pages may well have similar rank, and the blog home page is more relevant in terms of its content.

Effects of taggregation, plus status updates

August 15th, 2008

I am a little surprised to find that the blog home page was briefly #2 (now #3) in Google for the phrase “search experiments”, and that the site home page is #2 in Yahoo (in each case, the UK varieties). Despite this apparent “success” (I don’t think that the term has driven any search visitors to the site), there remain pages of the site resolutely unindexed.

Google

The preference that Google is showing for the blog home page is also interesting, and it is worth looking into why this might be, particularly because the links that I have created all point to the website home page. All the pages on the site link to the blog home, and all the pages/posts on the blog link to the site home. 

So what is going on with Google here? A link: operator search returns no results, but Webmaster Tools credits the site overall with 39 external links. Eight of these are to the home page, the rest to blog pages. The eight, which I set up, are from a couple of other blogs, one of which is totally weak and the other fairly weak.

The links to blog pages are mostly from Technorati, and all Technorati links are from pages aggregating all blogs with particular tags. The other links look as if they are doing something similar, probably with material taken or scraped from Technorati.

There’s good cross-linking between the blog and the other site pages: all links from blog pages to the main site home page use the phrase, and conversely, all links from the non-blog pages to the blog include the phrase. 

So, cross-linking should pretty much cancel itself out as far as the relative ranking of the two pages is concerned. Which leads to an interesting tentative hypothesis: that simply blogging and using tags can garner external links – from aggregator pages – that are as powerful as hand-edited links from existing sites.
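The cancelling-out argument can be checked on a toy graph with the classic, published PageRank formula (power iteration with a damping factor of 0.85). This is a sketch under loud assumptions: both graphs are hypothetical, every link counts equally, and Google’s real ranking uses far more than this, including the anchor text and link strength this post is asking about.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Classic PageRank by power iteration.

    links maps each page to the pages it links to; every target must also be
    a key. Pages with no outlinks spread their rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return rank

# Hypothetical graphs: the two home pages cross-link; external pages link in.
SYMMETRIC = {
    "site_home": ["blog_home"], "blog_home": ["site_home"],
    "ext_a": ["site_home"], "ext_b": ["blog_home"],
}
WITH_TAG_PAGES = {
    "site_home": ["blog_home"], "blog_home": ["site_home"],
    "hand_link": ["site_home"],
    "tag_page_1": ["blog_home"], "tag_page_2": ["blog_home"], "tag_page_3": ["blog_home"],
}
sym = pagerank(SYMMETRIC)
tagged = pagerank(WITH_TAG_PAGES)
```

With one external link to each home page the two ranks come out equal, since the cross-links just pass rank back and forth. Give the blog three aggregator links against the site’s one hand-made link and the blog home pulls ahead, which is the shape of the hypothesis above, though it says nothing about whether Google actually weights aggregator links this way.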

I do have one reasonably powerful incoming link set up (from the home page of a five-year-old site with thousands of organic links), but this is not yet showing up as an external link in Webmaster Tools. (This link is to the home page, not the blog.)

OK, it could of course be passing PR without showing up in Webmaster Tools. I shall keep an eye out to see whether the relative ranking changes, and when the link shows up in Webmaster Tools.

Yahoo

In Yahoo, it’s the home page that is showing up in the rankings. The blog home page is nowhere to be seen in the rankings; indeed, Site Explorer doesn’t recognise the page among the six that it currently lists. 

However, Site Explorer is giving credit for the one relatively powerful link to the site.

Observations and predictions

1) The blog home page being “ahead” of the home page in Google rankings seems to suggest that the links garnered by tag aggregation – I am disappointed but not wholly surprised to discover that the word “taggregation” has already been coined – may have a significant role to play in getting content indexed and ranked. I will not put it more strongly than that at present. It may be worth experimenting with a new blog, unlinked elsewhere, to test this hypothesis – by watching how it performs up to the point that someone manually links to it.

2) Having a top 3 result for a plausible if specialised phrase does not necessarily generate traffic.

3) Google is more interested in blog content than Yahoo (?)

Prediction: when Webmaster Tools shows the strong site in the external links, the home page for the site will outperform the blog home page in Google. 

Thinking about it, the other possible reason that the blog home page may be outperforming the home page is content – there’s typically a lot more content on the blog page and (obviously enough) the phrase “search experiments” gets mentioned all the time on it.