Friday, November 03, 2006

Dancing with the Data Centers

Search engines use several different data centers to search the massive amount of information behind each query and return your results. If you regularly check a web site’s position in the search engine rankings, you’ve probably noticed that its ranking for a specific keyword phrase may vary, even when your searches are within seconds of one another.

This inconsistency arises because each data center may yield its own search results. While the information you receive for a given search is ideally the same from one data center to the next, there are often slight, and occasionally drastic, discrepancies.

For someone trying to optimize their site for higher rankings or specific slots in the search engine results, this can be troubling, especially when one runs across those rare flukes when a site ranking number two in the morning is nowhere to be found in the evening.

When a search engine is updating its indices, it may take days for the change to become uniform across all data centers. In the meantime, two searches for the same term can yield two different results: one from a data center that has yet to be updated, and one from a data center that has been. The Google Watch Tool is a convenient way to see how a page ranks for a specific search in the different data centers, viewing up to a hundred results.

While I have yet to see a similar tool for Yahoo!, MSN or other search engines, you can view the results for the different data centers by querying the engine repeatedly at each of its data center IPs. A list of Yahoo!’s IPs is available at http://www.vaughns-1-pagers.com/internet/yahoo-data-centers.htm.
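Once you have collected the result lists from each data center, comparing them by hand gets tedious. Here is a minimal sketch in Python of how the comparison might be automated. It does not fetch anything; the function names, the data center labels and the sample result lists are all made up for illustration, and you would substitute the lists you gathered yourself.

```python
def rank_of(url, results):
    """Return the 1-based position of url in results, or None if absent."""
    for position, result in enumerate(results, start=1):
        if result == url:
            return position
    return None


def compare_data_centers(url, results_by_center):
    """Map each data center label to the rank of url in its result list."""
    return {center: rank_of(url, results)
            for center, results in results_by_center.items()}


# Hypothetical, hand-made result lists standing in for two data centers.
sample = {
    "datacenter-a": ["http://example.com/", "http://example.org/",
                     "http://example.net/"],
    "datacenter-b": ["http://example.org/", "http://example.net/"],
}

ranks = compare_data_centers("http://example.com/", sample)
print(ranks)
```

A rank of None means the page did not appear in that data center's list at all, which is the "nowhere to be found" case described above.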

Yahoo's Slurp Increases Its IQ

Having problems with Yahoo indexing your site? Yahoo just increased its crawler's vocabulary. A common symbol in robots.txt files is "*", and Yahoo's crawler had trouble recognizing this symbol.

According to their recent post on the Yahoo Blog, Yahoo added "*" and "$" to the list of symbols their crawler recognizes. I am very surprised this took so long. According to the most recent robots.txt guidelines from robotstxt.org, these are the proper entries for your robots file:

To exclude all robots from the entire server:
User-agent: *
Disallow: /

To allow all robots complete access:
User-agent: *
Disallow:
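
If you want to double-check how a rule set like the two above is read, Python's standard library ships a robots.txt parser. The sketch below feeds it both rule sets and asks whether a crawler may fetch a page. The helper name `allowed`, the example URL and the use of "Slurp" as the user agent string are my own choices for illustration, and Python's parser is not Yahoo's crawler, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, user_agent, url):
    """Parse the given robots.txt lines and report whether url may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# The two rule sets listed above.
block_all = ["User-agent: *", "Disallow: /"]
allow_all = ["User-agent: *", "Disallow:"]

url = "http://www.example.com/page.html"  # placeholder URL
blocked_result = allowed(block_all, "Slurp", url)
open_result = allowed(allow_all, "Slurp", url)
print(blocked_result, open_result)
```

The first rule set should report the page as off limits and the second as fetchable.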

So if Yahoo's crawler was having trouble with the "*" symbol, and that symbol is a best practice, then now that the crawler accepts it you may soon see more of your pages included in the Yahoo index.
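
To get a feel for what "*" and "$" can do inside a path rule, here is a rough Python sketch of a pattern matcher. It assumes the commonly described meaning of the two symbols, where "*" stands for any run of characters and a trailing "$" pins the pattern to the end of the path. The function names are hypothetical and this is not Yahoo's actual implementation, so check the Yahoo post for the exact rules.

```python
import re


def pattern_to_regex(pattern):
    """Translate a path pattern containing '*' and a trailing '$' to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def path_matches(pattern, path):
    """Return True if path is covered by the pattern (assumed semantics)."""
    return pattern_to_regex(pattern).match(path) is not None


# A pattern ending in '$' covers only paths that end in ".pdf".
print(path_matches("/*.pdf$", "/docs/file.pdf"))
print(path_matches("/*.pdf$", "/docs/file.pdf.html"))
```

Under these assumptions, a rule such as "Disallow: /*.pdf$" would cover every PDF on the site without touching pages that merely contain ".pdf" somewhere in the middle of their address.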