In this blog we are going to write an example on how to extend the SEO Toolkit functionality, so for that we are going to pretend our company has a large Web site that includes several images, and now we are interested in making sure all of them comply to a certain standard, lets say all of them should be smaller than 1024x768 pixels and that the quality of the images is no less than 16 bits per pixel. Additionally we would also like to be able to make custom queries that can later allow us to further analyze the contents of the images and filter based on directories and more.
For this we will extend the SEO Toolkit crawling process to perform the additional processing for images, we will be adding the following new capabilities:
A crawler module is a class that extends the crawling process in Site Analysis to provide custom functionality while processing each URL. By deriving from this class you can easily raise your own set of violations or add your own data and links to any URL.
It includes three main methods:
Create a Class Library in Visual Studio and add the code shown below.
As you can see in the BeginAnalysis the module registers three new properties with the Report using the Crawler property. This is only required if you want to provide either a custom text or use it for different type other than a string. Note that current version only allows primitive types like Integer, Float, DateTime, etc.
During the Process method it first makes sure that it only runs for known content types, then it performs any validations raising a set of custom violations that are defined in the Violations static helper class. Note that we load the content from the Response Stream, which is the property that contains the received from the server. Note that if you were analyzing text the property Response would contain the content (this is based on Content Type, so HTML, XML, CSS, etc, will be kept in this String property).
When running inside IIS Manager, crawler modules need to be registered as a standard UI module first and then inside their initialization they need to be registered using the IExtensibilityManager interface. In this case to keep the code as simple as possible everything is added in a single file. So add a new file called "RegistrationCode.cs" and include the contents below:
This code defines a standard UI IIS Manager module and in its client-side initialize method it uses the IExtensibilityManager interface to register the new instance of the Image extension. This will make it visible to the Site Analysis feature.
To test it we need to add the UI module to Administration.config, that also means that the assembly needs to be registered in the GAC.
To Strongly name the assembly
In Visual Studio, you can do this easily by using the menu "Project->Properties", and select the "Signing" tab, check the "Sign the assembly", and choose a file, if you don't have one you can easily just choose New and specify a name.
After this you can compile and now should be able to add it to the GAC.
To GAC it
If you have the SDK's you should be able to call it like in my case:
"\Program Files\Microsoft SDKs\Windows\v6.0A\bin\gacutil.exe" /if SampleCrawlerModule.dll
(Note, you could also just open Windows Explorer, navigate to c:\Windows\assembly and drag & drop your file in there, that will GAC it automatically).
Finally to see the right name that should be use in Administration.config run the following command:
"\Program Files\Microsoft SDKs\Windows\v6.0A\bin\gacutil.exe" /l SampleCrawlerModule
In my case it displays:
SampleCrawlerModule, Version=1.0.0.0, Culture=neutral, PublicKeyToken=6f4d9863e5b22f10, …
Finally register it in Administration.config
Open Administration.config in Notepad using an elevated instance, find the </moduleProviders> and add a string like the one below but replacing the right values for Version and PublicKeyToken:
After registration you now should be able to launch IIS Manager and navigate to Search Engine Optimization. Start a new Analysis to your Web site. Once completed if there are any violations you will see them correctly in the Violations Summary or any other report. For example see below all the violations in the "Images" category.
Since we also extended the metadata by including the new fields (Image Width, Image Height, and Image Pixel Format) now you can use them with the Query infrastructure to easily create a report of all the images:
And since they are standard fields, they can be used in Filters, Groups, and any other functionality, including exporting data. So for example the following query can be opened in the Site Analysis feature and will display an average of the width and height of images summarized by type of image:
And of course violation details are shown as specified, including Recommendation, Description, etc:
As you can see extending the SEO Toolkit using a Crawler Module allows you to provide additional information, whether Metadata, Violations or Links to any document being processed. This can be used to add support for content types not supported out-of-the box such as PDF, Office Documents or anything else that you need. It also can be used to extend the metadata by writing custom code to wire data from other system into the report giving you the ability to exploit this data using the Query capabilities of Site Analysis.