Using Google Cache to Recover Lost Posts
5 Mar
I was checking the Google Webmaster Tools statistics for this blog today and noticed that the Google web crawler had picked up a few broken links. That was odd. I'm using WordPress and haven't tinkered with anything in a way that could cause this kind of behavior, so everything should have been neat and tidy.
Further analysis showed that there were two pages the crawler couldn't reach. Both returned 404 errors instead.
One of them is a category page that's linked from the homepage. The link uses a query string containing the category id, a format that's no longer in use; category links now contain the category name instead. Apparently this error was detected on the 17th of January (about a month and a half ago), and it's strange that the homepage hasn't been crawled since. I'll have to look into that later.
The other problematic page (and the reason for this post) had 6 links pointing to it. When I checked the post containing those links, I realised that the page they pointed to was a post that had, to put it simply, vanished! How? I have no idea. But I do remember writing it, publishing it, and linking to it, so it had to be somewhere. It wasn't in the list of all posts. It wasn't in Windows Live Writer. I was just about to check the database backups when I decided to try my luck with Google. I knew Google keeps a cache of the pages it indexes, so with a bit of luck I might find the post there. All it took was a search for a term I knew was in the post's title.
I instantly recognised what I was looking for from the snippet in the first result. Clicking the "Cached" link took me to the page as it was on the 25th of January, 5 days after the post was written, and there I found the post exactly as I remembered it. A few minutes of copying and pasting into a new post with the same name and URL, and it's live again. Once the crawler comes back, the crawl errors should go away.
I still don’t know how the post vanished, though…
