I have been visiting the UK for the last few weeks. It was a bit of a whirlwind tour: five cities in 14 days. There was so much to see that 14 days per city wouldn’t have been enough.
Among the items waiting for me when I got back was a request to do an SEO audit on a website that had recently been updated. One of the problems was so common that I thought I would pass it on.
I first checked the robots.txt file. This is the file that tells the robots (search engine crawlers) which pages you don’t want them to crawl. Google and the other major search engines take the robots.txt as gospel and obey all the entries. Getting the robots.txt wrong is the quickest way to get your site ignored by the search engines.
Unfortunately, it is a sad fact that many, dare I say most, people claiming to be web designers either don’t know about the robots.txt or forget to check it. In this case the robots.txt was blocking the search engines from a directory that no longer existed, which is not a problem. It was also pointing to the sitemap.xml file.
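As a rough illustration of this kind of check, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content, the `example.com` domain and the `/old-directory/` path are all made up for the example; they are not taken from the site I audited.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, similar in shape to the one described above:
# one blocked directory plus a pointer to the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-directory/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() answers: may this user agent crawl this URL?
blocked = parser.can_fetch("Googlebot", "https://www.example.com/old-directory/page.html")
allowed = parser.can_fetch("Googlebot", "https://www.example.com/index.html")

# site_maps() lists the Sitemap entries found in the file (Python 3.8+).
sitemaps = parser.site_maps()

print("old directory crawlable:", blocked)
print("home page crawlable:", allowed)
print("sitemaps listed:", sitemaps)
```

To check a live site instead, `set_url()` followed by `read()` on the same parser should fetch the real file, assuming the site is reachable.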
The sitemap.xml, or sitemap, is a complete listing of all the pages that you want the search engines to index. While the search engines provide a way to submit sitemaps directly, they can also ‘discover’ the sitemap if it is listed in the robots.txt.
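For anyone who has never looked inside one, the sketch below builds a tiny sitemap with Python’s standard `xml.etree.ElementTree`. The `build_sitemap` helper, the URLs and the dates are my own inventions for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, lastmod) pairs.

    lastmod is a date string in YYYY-MM-DD form.
    """
    # Serialise the sitemap namespace as the default xmlns (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages and dates, purely for illustration.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2013-03-10"),
    ("https://www.example.com/about.html", "2013-02-01"),
])
print(sitemap_xml)
```

Most content management systems have a plugin that does this for you; the point here is only to show how little is in the file.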
IMHO, all websites should have a sitemap, with ONE BIG PROVISO: IT MUST BE KEPT UP TO DATE!
One of the key pieces of information provided by the sitemap is the last time a page was modified. The search engines use this as a shortcut. If the page hasn’t been modified since the last time they indexed the site, there is no reason to check that page for new content. It saves the search engines time and reduces your bandwidth usage. But this works only, ONLY, if the sitemap is up to date.
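A stale sitemap is easy to spot with a script. The sketch below is one possible check, not the tool I used: it compares each `lastmod` value against the date the page really changed and also lists pages that are missing from the sitemap altogether. The `stale_entries` helper, the URLs and all the dates are hypothetical; in practice the real modification dates would have to come from your CMS or from file timestamps.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap that has not been touched for years.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/index.html</loc><lastmod>2009-06-01</lastmod></url>
  <url><loc>https://www.example.com/about.html</loc><lastmod>2009-06-01</lastmod></url>
</urlset>
"""

# Hypothetical "real" modification dates for the pages on the site.
ACTUAL_MODIFIED = {
    "https://www.example.com/index.html": date(2013, 3, 10),
    "https://www.example.com/about.html": date(2009, 5, 1),
    "https://www.example.com/new-page.html": date(2013, 3, 10),
}


def stale_entries(sitemap_xml, actual_modified):
    """Return (stale, missing): URLs with an outdated lastmod, and URLs not listed."""
    root = ET.fromstring(sitemap_xml)
    listed = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        # lastmod may be a full date-time; the first 10 characters are the date.
        listed[loc.strip()] = date.fromisoformat(lastmod.strip()[:10]) if lastmod else None
    stale = sorted(
        u for u, changed in actual_modified.items()
        if u in listed and (listed[u] is None or listed[u] < changed)
    )
    missing = sorted(u for u in actual_modified if u not in listed)
    return stale, missing


stale, missing = stale_entries(SITEMAP_XML, ACTUAL_MODIFIED)
print("stale lastmod:", stale)
print("not in sitemap:", missing)
```

Treating a page with no `lastmod` at all as stale is a design choice on my part; you may prefer to report those separately.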
Sadly, it hadn’t been updated since 2009.
Fortunately, the major search engines were ignoring the “last modified” tag and viewing the new pages. The new content appeared to be indexed, as the new description meta tags were appearing in the SERPs. However, this is sub-optimal.
A current sitemap is the best way to ensure all your pages are at least being indexed. Don’t forget to update it when you do your next site update or add new content.