Site Diagnosis – are all your pages indexed properly?

I have just released a new feature for the Link Diagnosis Firefox extension that makes it easy to check which pages of your website are indexed.

A couple of weeks ago I was facing the tedious task of finding out which of the 100k pages on a site were indexed and which were not. I knew that some of them could have been marked as duplicate content, or that Google simply hadn’t indexed them because of the size of the website.

First I set up Google Webmaster Tools, hoping that Google would tell me. Unfortunately, the Indexed Pages tab just points you to the site: command.

I don’t trust the site: command. The page count in particular is very inaccurate: I know I have 100k pages, yet Google tells me I have 150k pages indexed.

Also, there is no easy way to see more than 1000 pages (you can play with inurl: commands, but it takes ages and you can get banned).

Because of these problems I decided to code a tool which would automate it – Site Diagnosis.

The internal algorithm of the tool works as follows:

1. Go through every URL in the XML Sitemap file and run a simple check: inurl:http://www.samplesite.com/dir/url1
2. A URL that does not show up for the inurl: command may still be indexed; it just does not appear with inurl:, so a second check is needed.
3. For every URL in the XML Sitemap, I get the page title and run this check: site:http://www.samplesite.com sample title
4. If the page does not rank in the top 10 for its own title within the site, then something is probably wrong.
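
The steps above can be sketched roughly in Python. This is only an illustration and not the extension's actual code: the function names, the `titles` mapping and the `search` callable are all assumptions made for the example. In particular, `search` is a hypothetical stand-in for whatever runs a query and returns the result URLs in ranking order; no real search API is called here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List

# Namespace used by standard sitemaps.org XML Sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> value found in an XML Sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def inurl_query(url: str) -> str:
    """Step 1: build the inurl: check for a single URL."""
    return "inurl:" + url


def title_query(site: str, title: str) -> str:
    """Step 3: build the site: check restricted to the page title."""
    return "site:" + site + " " + title


def diagnose(
    urls: Iterable[str],
    titles: Dict[str, str],
    site: str,
    search: Callable[[str], List[str]],  # hypothetical: query -> ranked result URLs
) -> Dict[str, Dict[str, bool]]:
    """Run both checks for every URL and record the outcome of each."""
    report = {}
    for url in urls:
        inurl_hit = url in search(inurl_query(url))
        title = titles.get(url, "").strip()
        # Step 4: the page should be in the top 10 for its own title.
        # A page without a title cannot pass this check.
        title_top10 = bool(title) and url in search(title_query(site, title))[:10]
        report[url] = {"inurl_hit": inurl_hit, "title_top10": title_top10}
    return report
```

A page whose `title_top10` flag is false is one worth looking at by hand. Keeping the two flags separate reflects step 2: a page can miss the inurl: check and still be indexed.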

This check is surprisingly accurate, and most of the pages that fail it have problems such as duplicate content, missing titles, missing content or not enough content. These troubled pages usually don’t appear in the search results when you search for any text on the page, not even when you enclose sentences in quotes.

Obviously, the goal of search engine optimization is to fix these pages so Site Diagnosis will hopefully be essential in identifying them.

18 Responses to “Site Diagnosis – are all your pages indexed properly?”

  1. Billy Says:

    Hi, is there a way to private label your tool?

  2. Janusz Says:

    Billy> I am not planning to do any white-labelling, sorry. You might try the Blogstorm hosted link analysis tool.

  3. Share your link Says:

    Very nice tool. Thanks for sharing.

  4. pare Says:

    Thank you for that awesome tool. I’m a frequent user and I have to say that all the features are very useful and functional.

  5. Alex Says:

    It’s really a nice tool: useful and simple!

  6. Jeroen - J8 SEO Says:

    Great tool! Thanks a lot! Very useful!

  7. Randy Moore Says:

    Great Tool!

  8. Google Says:

    Google…


  9. Typical English Says:

    the best thing since sliced bread.
    - Great stuff

  10. posicionamiento web Says:

    Very nice tool. Thanks for the application.

  11. Igor Says:

    VERY USEFUL INSTRUMENT

  12. Hotel Rimini Says:

    I love your tool. I have read that there is a non-free version; or will there be one?

  13. Hewolka Says:

    Hej – a great tool! Really love it. And very interesting insights you share in your articles – so many thanks to you!

  14. 4everdesign Says:

    Thanks. FF ext is especially cool.

  15. guitar pedal reviews Says:

    Now that’s a useful tool. That’s the problem I always have because of the limitations the search engines give. Nice.

  16. April Says:

    This is actually a feature I had been hoping someone would come up with. As far as I’m concerned, if pages aren’t indexed in Google they are potentially losing me money. So, being able to see which pages aren’t indexed, I can simply get a few links to them. Thanks.
