Site Diagnosis – are all your pages indexed properly?
I have just released a new feature for the Link Diagnosis Firefox extension that makes it easy to diagnose which pages of your website are indexed.
A couple of weeks ago I faced the tedious task of finding out which of the 100k pages on a site were indexed and which were not. I knew that some of them could have been marked as duplicate content, or that Google simply hadn’t indexed them because of the size of the website.
First I installed Google Webmaster Tools, hoping that Google would tell me. Unfortunately, the Indexed Pages tab just points me to the site: command.
I don’t trust the site: command. In particular, its page count is very inaccurate: I know I have 100k pages, yet Google tells me I have 150k pages indexed.
Also, there is no easy way to see more than 1000 pages (you can play with inurl: commands, but it takes ages and you can get banned).
Because of these problems, I decided to code a tool that would automate the job: Site Diagnosis.
The internal algorithm of the tool works as follows:
1. Go through every URL in the XML Sitemap file and run a simple check: inurl:http://www.samplesite.com/dir/url1
2. A URL that does not show up for the inurl: command may still be indexed; it just does not appear with inurl:, so a second check is needed.
3. For every URL in the XML Sitemap, I get the page title and perform this check: site:http://www.samplesite.com sample title
4. If the page does not rank in the top 10 for its own title within the site, then something is probably wrong.
This check is surprisingly accurate, and most of the pages that fail it have problems such as duplicate content, missing titles, missing content or not enough content. These troubled pages usually don’t appear in the search results if you search for any text on the page, not even when you enclose sentences in quotes.
Obviously, the goal of search engine optimization is to fix these pages, so Site Diagnosis will hopefully be essential in identifying them.
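For readers who want to see the four steps spelled out, here is a minimal Python sketch. It is not the extension's actual code. The names sitemap_urls, inurl_query, title_query and diagnose are made up for illustration, and search is a hypothetical function you supply yourself: it takes a query string and returns the result URLs in ranking order. The sketch also assumes you already have each page's title.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List

# Namespace used by standard XML Sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL listed in an XML Sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


def inurl_query(url: str) -> str:
    """Step 1: the simple inurl: check for a single URL."""
    return f"inurl:{url}"


def title_query(site: str, title: str) -> str:
    """Step 3: search for the page title restricted to the site."""
    return f"site:{site} {title}"


@dataclass
class Diagnosis:
    url: str
    found_by_inurl: bool   # result of steps 1 and 2
    ranks_for_title: bool  # result of steps 3 and 4

    @property
    def suspect(self) -> bool:
        # Step 4: not in the top 10 for its own title, so probably a problem.
        return not self.ranks_for_title


def diagnose(url: str, title: str, site: str,
             search: Callable[[str], List[str]]) -> Diagnosis:
    """Run both checks for one page.

    `search` is a hypothetical, caller-supplied function (an assumption of
    this sketch), mapping a query string to result URLs in ranking order.
    """
    found = url in search(inurl_query(url))
    top10 = search(title_query(site, title))[:10]
    return Diagnosis(url, found, url in top10)
```

To use it, call diagnose once for every URL returned by sitemap_urls and collect the results whose suspect flag is set. Keeping the search function as a parameter means the query logic can be tried against canned results before it is pointed at a real search engine.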


April 2nd, 2008 at 9:21 pm
Hi, is there a way to private label your tool?
April 3rd, 2008 at 11:57 am
Billy> I am not planning to do any white-labels, sorry. You might try the Blogstorm hosted link analysis tool.
April 9th, 2008 at 9:48 pm
Very nice tool. Thanks for sharing.
May 6th, 2008 at 8:42 am
Thank you for that awesome tool. I’m a frequent user and I have to say that all the features are very useful and functional.
May 19th, 2008 at 9:37 am
It’s really a nice tool: useful and simple!
May 28th, 2008 at 8:56 am
Great tool! Thanks a lot! Very useful!
August 25th, 2008 at 4:25 pm
Great Tool!
October 5th, 2008 at 5:17 am
Google…
October 23rd, 2008 at 11:08 am
the best thing since sliced bread.
- Great stuff
October 26th, 2008 at 4:52 pm
Very nice tool. Thanks for the application.
November 23rd, 2008 at 3:31 pm
VERY USEFUL INSTRUMENT
March 28th, 2009 at 6:46 am
I love your tool. I have read that there is a non-free version, or will there be one?
June 3rd, 2009 at 3:14 am
Great tool
June 8th, 2009 at 10:38 am
Hej – a great tool! Really love it. And very interesting insights you share in your articles – so many thanks to you!
August 10th, 2009 at 8:36 am
Thanks. FF ext is especially cool.
September 24th, 2009 at 6:07 am
Great Tool. Very helpful!!
October 19th, 2009 at 2:46 pm
Now that’s a useful tool. That’s the problem I always have because of the limitations the search engines give. Nice.
November 18th, 2009 at 12:13 pm
This is actually a feature I had been hoping someone would come up with. As far as I’m concerned, if pages aren’t indexed in Google they are potentially losing me money. Being able to see which pages aren’t indexed means I can simply get a few links to them. Thanks.