One thing I’ve been asked several times about the SEO Toolkit is whether it does full standards validation on the markup and content it processes and, if not, whether we could add support for more comprehensive standards validation, in particular XHTML and HTML 4.01. Currently the markup validation performed by the SEO Toolkit is really simple. Its main goal is to make sure that the markup is correctly organized, for example that things like <b><i>Test</b></i> are not found in the markup. The primary reason is to make sure that basic blocks of markup are generally "easy" for search engines to parse and that the semantics will not be terribly broken if a link, text or style is not correctly closed (since all of those would affect SEO).

So the first thing I would say is that we have heard the feedback and are looking at what we could add in future versions. But why wait, right?

One thing that many people do not realize is that the SEO Toolkit can be extended to add new violations, new metadata and new rules to the analysis process. So, during a demo I gave a few weeks ago, I decided to write a sample showing how to consume the online W3C Markup Validation Service from the SEO Toolkit.

Download

You can download the SEOW3Validator including the source code at http://www.carlosag.net/downloads/SEOW3Validator.zip.

How to install it

To run it you just need to:

  1. Unzip the contents in a folder.
  2. Install the SEOW3Validator.dll assembly in the GAC:
    1. Open a Windows Explorer window and navigate to c:\Windows\assembly
    2. Drag and drop SEOW3Validator.dll into the c:\Windows\assembly explorer window.
    3. Alternatively, you can just run gacutil.exe /i SEOW3Validator.dll. gacutil.exe is usually located at C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin (or the v7.0A equivalent).
    4. If you have problems with this, you could try just copying the assembly to the GAC (copy SEOW3Validator.dll c:\Windows\assembly\GAC_MSIL\SEOW3Validator\1.0.0.0__995ee9b8fa017847\SEOW3Validator.dll)
  3. Register the moduleProvider in Administration.config: from an elevated prompt, open C:\Windows\System32\Inetsrv\config\Administration.config and add the following line inside the <moduleProviders> section, right before the closing </moduleProviders> tag:

         <add name="SEOW3Validator" type="SEOW3Validator.SEOW3ValidatorModuleProvider, SEOW3Validator, Version=1.0.0.0, Culture=neutral, PublicKeyToken=995ee9b8fa017847" />

You should now be able to run the SEO Toolkit just as before, but this time you will find new violations. For example, on my site I get the ones below. Notice that there is a new set of violations like W3 Validator – 68, etc., and all of them belong to the W3C category. (I would have liked to have better names, but the way the W3 API works is not really friendly for making this any better.)

SampleValidatorResults

And when double clicking any of those results you get the details as reported by the W3 Validation Service:

SampleValidatorDetails

The Code

The code is actually pretty simple. The main class is called SEOW3ValidatorExtension; it derives from CrawlerModule and overrides the Process method to call the W3C Validation Service, sending the actual markup in the request. This means that it does not matter whether your site is on an intranet or on the Internet, it will work. For every warning and error returned by the validator, it adds a new violation to the SEO report.

The code looks like this:

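    // Send the markup that was already downloaded for this URL to the W3C service.
    W3Validator validator = new W3Validator();
    W3ValidatorResults results = validator.Validate(context.UrlInfo.FileName,
                                                    context.UrlInfo.ContentTypeNormalized,
                                                    context.UrlInfo.Response);

    // Report every warning returned by the validator as a violation.
    foreach (W3ValidatorWarning warning in results.Warnings) {
        context.UrlInfo.AddViolation(CreateWarning(warning));
    }

    // Report every error returned by the validator as a violation.
    foreach (W3ValidatorError error in results.Errors) {
        context.UrlInfo.AddViolation(CreateError(error));
    }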

I created a helper class, W3Validator, that basically encapsulates the consumption of the W3C Validation Service. The code is far from what I would like it to be, mostly because of some "interesting" decisions in the way the API is exposed. I would probably have designed the service differently and not returned the results formatted in HTML, since this is an API/web service whose results can be presented somewhere other than a browser. So a lot of the code just re-formats the results to look "decent", and to be honest I did not want to spend too much time on it, so everything was put together quite quickly. Also, about the names I used for the violations: I did not want to hard-code specific Message IDs, and since the error message was different for each result even within the same Message ID, it was not easy to provide better names. Anyway, overall it is pretty usable and should be a good way to do W3 validation.
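To give an idea of what consuming the service involves, here is a minimal sketch of reading the validator's response. It is not the code shipped in the download. It assumes the service was called with output=soap12, so that the response is a SOAP 1.2 document in the http://www.w3.org/2005/10/markup-validator namespace. ValidatorMessage and W3SoapParser are hypothetical names made up for this illustration, and the sketch only reads the line, column, message and message ID of each entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical type for this sketch; not the sample's W3ValidatorError/W3ValidatorWarning.
public sealed class ValidatorMessage {
    public string MessageId { get; set; }
    public int Line { get; set; }
    public int Column { get; set; }
    public string Text { get; set; }
}

// Hypothetical helper; assumes the validator's SOAP 1.2 output format.
public static class W3SoapParser {
    private static readonly XNamespace M = "http://www.w3.org/2005/10/markup-validator";

    // elementName is "error" or "warning", matching the m:error / m:warning entries.
    public static List<ValidatorMessage> Parse(string soapXml, string elementName) {
        return XDocument.Parse(soapXml)
            .Descendants(M + elementName)
            .Select(e => new ValidatorMessage {
                MessageId = ReadString(e, "messageid"),
                Line = ReadInt(e, "line"),
                Column = ReadInt(e, "col"),
                Text = ReadString(e, "message")
            })
            .ToList();
    }

    // Returns the trimmed text of a child element, or an empty string if it is missing.
    private static string ReadString(XElement parent, string name) {
        XElement child = parent.Element(M + name);
        return child == null ? string.Empty : child.Value.Trim();
    }

    // Returns the child element as an integer, or 0 if it is missing or not numeric.
    private static int ReadInt(XElement parent, string name) {
        int value;
        return int.TryParse(ReadString(parent, name), out value) ? value : 0;
    }
}
```

Calling W3SoapParser.Parse(responseXml, "error") and W3SoapParser.Parse(responseXml, "warning") would then give two lists that can be turned into violations in the same way as the loops shown earlier.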

Note that one of the cool things you get for free is that, since these are stored as violations, you can re-run the report and use the Compare Report feature to see your progress while fixing them. Also, since they are stored as part of the report, you do not need to keep running the validator over and over again; you can just open the report and continue looking at them, analyze the data in the Reports and Queries, export them to Excel, etc.

Hopefully this gives you a good example of some of the interesting things you can achieve with the SEO Toolkit and its extensibility.