One thing that I’ve been asked several times about the SEO Toolkit is if it does a full standards validation on the markup and content that is processed, and if not, to add support for more comprehensive standards validation, in particular XHTML and HTML 4.01. Currently the markup validation performed by the SEO Toolkit is really simple, its main goal is to make sure that the markup is correctly organized, for example that things like <b><i>Test</b></i> are not found in the markup, the primary reason is to make sure that basic blocks of markup are generally "easy" to parse by Search Engines and that the semantics will not be terribly broken if a link, text or style is not correctly closed (since all of them would affect SEO).
So the first thing I would say is that we have heard the feedback and are looking at what we could possibly add in future versions, however why wait, right?
One thing that many people do not realize is that the SEO Toolkit can be extended to add new violations, new metadata and new rules to the analysis process and as such during a demo I gave a few weeks ago I decided to write a sample on how to consume the online W3C Markup Validation Service from the SEO Toolkit.
You can download the SEOW3Validator including the source code at http://www.carlosag.net/downloads/SEOW3Validator.zip.
To run it you just need to:
You should be able to now run the SEO Toolkit just as before but now you will find new violations, for example in my site I get the ones below. Notice that there are a new set of violations like W3 Validator – 68, etc, and all of them belong to the W3C category. (I would have liked to have better names, but the way the W3 API works is not really friendly for making this any better).
And when double clicking any of those results you get the details as reported by the W3 Validation Service:
The code is actually pretty simple, the main class is called SEOW3ValidatorExtension that derives from CrawlerModule and overrides the Process method to call the W3C Validation service sending the actual markup in the request, this means that it does not matter if your site is an Intranet or in the Internet, it will work; and for every warning and error that is returned by the Validator it will add a new violation to the SEO report.
The code looks like this:
I created a helper class W3Validator that basically encapsulates the consumption of the W3C Validation Service, the code is far from what I would like it to be however there are some "interesting" decisions on the way the API is exposed, I would have probably designed the service differently and not return the results formatted in HTML when this is actually an API/WebService that can be presented somewhere else than a browser. So a lot of the code is to just re-format the results to look "decent", but to be honest I did not want to spend too much time on it so everything was put together quite quickly. Also, if you look at the names I used for violations, I did not want to hard-code specific Message IDs and since the Error Message was different for all of them even within the same Message ID, it was not easy to provide better messages. Anyway, overall it is pretty usable and should be a good way to do W3 Validation.
Note that one of the cool things you get for free is that since these are stored as violations, you can then re-run the report and use the Compare Report feature to see the progress while fixing them. Also, since they are stored as part of the report you will not need to keep running the validator over and over again but instead just open it and continue looking at them, as well as analyzing the data in the Reports and Queries, and be able to export them to Excel, etc.
Hopefully this will give you a good example on some of the interesting things you can achieve with the SEO Toolkit and its extensibility.
In my spare time I’ve been thinking about new ideas for the SEO Toolkit, and it occurred to me that rather than continuing trying to figure out more reports and better diagnostics against some random fake sites, that it could be interesting to ask openly for anyone that is wanting a free SEO analysis report of your site and test drive some of it against real sites.
So if you want in, just post me your URL in the comments of this blog (make sure you are reading this blog from a URL inside http://blogs.msdn.com/carlosag/ , otherwise you might be posting comments in some syndicating site.), I will only allow the first few sites (if successful I will start another batch in the future) and I will be doing one by one in the following days. Make sure to include a way to contact you whether using the MSDN user infrastructure or include an email so that I can contact you with the results.
Alternatively I will take also URLs using Twitter at http://twitter.com/CarlosAguilarM so hurry up and let me know if you want me to look at your site.