One thing I’ve been asked several times about the SEO Toolkit is whether it performs full standards validation on the markup and content it processes, and if not, whether we could add support for more comprehensive standards validation, in particular XHTML and HTML 4.01. Currently the markup validation performed by the SEO Toolkit is really simple. Its main goal is to make sure that the markup is correctly organized, for example that things like <b><i>Test</b></i> are not found in the markup. The primary reason is to make sure that basic blocks of markup are generally "easy" for search engines to parse, and that the semantics will not be badly broken if a link, text or style is not correctly closed (since all of these would affect SEO).
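The following minimal Python sketch illustrates the kind of structural check described above. It is not the SEO Toolkit's implementation. It uses the standard library's html.parser to flag end tags that close an element while an inner element is still open, as in <b><i>Test</b></i>. NestingChecker and check_nesting are illustrative names.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Records end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.problems = []   # (line, message) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br /> open nothing

    def handle_endtag(self, tag):
        line, _ = self.getpos()
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif tag in self.stack:
            # An outer element is closed while an inner one is still open: mis-nesting.
            inner = self.stack[-1]
            self.problems.append((line, f"</{tag}> closes <{tag}> while <{inner}> is still open"))
            # Recover by popping everything up to and including the matching element.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append((line, f"</{tag}> has no matching start tag"))


def check_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

For <b><i>Test</b></i> the sketch reports two problems: the </b> that closes <b> while <i> is still open, and then the orphaned </i>.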
So the first thing I would say is that we have heard the feedback and are looking at what we could add in future versions. However, why wait, right?
One thing many people do not realize is that the SEO Toolkit can be extended to add new violations, new metadata and new rules to the analysis process. So during a demo I gave a few weeks ago, I decided to write a sample showing how to consume the online W3C Markup Validation Service from the SEO Toolkit.
You can download the SEOW3Validator including the source code at http://www.carlosag.net/downloads/SEOW3Validator.zip.
To run it you just need to:
You should now be able to run the SEO Toolkit just as before, but you will find new violations. For example, on my site I get the ones below. Notice that there is a new set of violations, such as W3 Validator – 68, and all of them belong to the W3C category. (I would have liked to use better names, but the way the W3C API works does not make that easy.)
When you double-click any of those results, you get the details as reported by the W3C Validation Service:
The code is actually pretty simple. The main class, SEOW3ValidatorExtension, derives from CrawlerModule and overrides the Process method to call the W3C Validation Service, sending the actual markup in the request. This means it does not matter whether your site is on an intranet or on the Internet; it will work either way. For every warning and error the validator returns, the class adds a new violation to the SEO report.
The code looks like this:
I created a helper class, W3Validator, that encapsulates the consumption of the W3C Validation Service. The code is far from what I would like it to be, partly because of some "interesting" decisions in the way the API is exposed. I would probably have designed the service differently and not returned the results formatted as HTML, since this is actually an API/Web service whose output can be presented somewhere other than a browser. So a lot of the code just re-formats the results to look "decent". To be honest, I did not want to spend too much time on it, so everything was put together quite quickly. Also, about the names I used for violations: I did not want to hard-code specific message IDs, and since the error message was different for every result even within the same message ID, it was not easy to provide better messages. Anyway, overall it is pretty usable and should be a good way to do W3C validation.
Note that you get one cool thing for free: since these results are stored as violations, you can re-run the report and use the Compare Report feature to track your progress while fixing them. Also, because they are stored as part of the report, you do not need to run the validator over and over again. Instead, just open the report and continue looking at them, analyze the data in Reports and Queries, export them to Excel, and so on.
Hopefully this gives you a good example of some of the interesting things you can achieve with the SEO Toolkit and its extensibility.
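For readers who want to try the same idea outside .NET, here is a rough Python sketch of what a helper like W3Validator does. It assumes the newer Nu HTML Checker endpoint (https://validator.w3.org/nu/?out=json) rather than the API the original sample used. That endpoint accepts the markup as a POST body and returns JSON. The violation record shape is hypothetical.

```python
import json
import urllib.request

# Assumption: the Nu HTML Checker, which validates the POSTed body and replies in JSON.
VALIDATOR_URL = "https://validator.w3.org/nu/?out=json"


def build_validation_request(markup):
    """Builds (but does not send) a POST request carrying the page markup."""
    return urllib.request.Request(
        VALIDATOR_URL,
        data=markup.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8",
                 "User-Agent": "seo-validator-sketch"},
        method="POST",
    )


def to_violations(response_text):
    """Turns validator messages into simple violation records (hypothetical shape)."""
    violations = []
    for msg in json.loads(response_text).get("messages", []):
        # Errors are type "error"; warnings arrive as type "info" with subType "warning".
        if msg.get("type") == "error":
            level = "Error"
        elif msg.get("subType") == "warning":
            level = "Warning"
        else:
            continue  # skip purely informational and non-document messages
        violations.append({
            "level": level,
            "line": msg.get("lastLine"),
            "message": msg.get("message", ""),
        })
    return violations
```

To actually call the service, you would send the request with urllib.request.urlopen and pass the decoded response body to to_violations. Each resulting record then corresponds to one violation added to the report.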
A couple of months ago I blogged about the release of v1.0.1 of the IIS Search Engine Optimization Toolkit. In March we released the localized versions of the SEO Toolkit, so it is now available in 10 languages: English, Japanese, French, Russian, Korean, German, Spanish, Chinese Simplified, Italian and Chinese Traditional.
Here are all the direct links to download it.
Here is a screenshot of the SEO Toolkit running in Spanish:
If you prefer to get the download files from the Microsoft Download Center, you can click the links below:
To learn more about the SEO Toolkit you can visit:
http://blogs.msdn.com/carlosag/archive/tags/SEO/default.aspx
http://www.iis.net/expand/SEOToolkit
For help, or to give us feedback, you can use the IIS.NET SEO Forum.
Today there was a question in the IIS.net Forums asking how to expose two different Internet sites from another site, making them look as if they were subdirectories of the main site.
For example, the goal was to have the site www.site.com expose www.site.com/company1 and www.site.com/company2, serving the content from “www.company1.com” for the first one and from “www.company2.com” for the second one. Furthermore, we would like the responses to be cached on the server for performance reasons. The following image shows a simple diagram of this:
This sounds easy since it’s just about routing or proxying every single request to the correct server, right? Wrong!!! If only it were that easy. It turns out the most challenging part is that we are modifying the structure of the underlying URLs and the original layout of the servers. That breaks relative paths, so images, stylesheets (CSS), JavaScript files and other resources are not shown correctly.
To clarify this, imagine that a user’s browser requests the page at http://www.site.com/company1/default.aspx. Based on the specification above, the request is proxied/routed to http://www.company1.com/default.aspx on the server side. So far so good. However, imagine that the HTML returned by this page has an image tag like “<img src=/some-image.png />”. The browser will resolve that path against the URL of the original request, http://www.site.com/company1/default.aspx. The result is a request for the image at http://www.site.com/some-image.png instead of the right “company1” folder, which would be http://www.site.com/company1/some-image.png.
Do you see it? Basically, any relative path (and, for that matter, any absolute path as well) needs to be translated to the new URL structure imposed by the original goal.
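Python's urllib.parse.urljoin resolves references using the standard URL resolution rules, the same ones browsers follow. That makes it a quick way to reproduce the problem with the URLs from the example above. The extra logo.png link is a hypothetical absolute URL added to show that absolute links bypass the front-end site entirely.

```python
from urllib.parse import urljoin

# The URL the browser actually requested through the front-end site.
page = "http://www.site.com/company1/default.aspx"

# How the browser resolves the backend's "/some-image.png": the company1 folder is lost.
broken = urljoin(page, "/some-image.png")

# What the proxy needs the browser to request instead.
expected = urljoin(page, "/company1/some-image.png")

# An absolute link to the backend host sends the browser straight past the proxy.
bypassed = urljoin(page, "http://www.company1.com/logo.png")

print(broken)    # http://www.site.com/some-image.png
print(expected)  # http://www.site.com/company1/some-image.png
print(bypassed)  # http://www.company1.com/logo.png
```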
So how do we do it then?
URL Rewrite 2.0 can rewrite the content of a response as it is being served back to the client. That lets us rewrite those links without having to touch the actual application.
Software Required:
Steps
The first rule is an inbound rewrite rule that captures all the requests to the root folder /company1/*. So if you are using Default Web Site, anything going to http://localhost/company1/* will be matched by this rule. The rule rewrites the request to www.company1.com, preserving whether the traffic is HTTP or HTTPS.
One thing to highlight, because it took me a bit of time, is the “serverVariables” entry in that rule, which overwrites the Accept-Encoding header. If you do not remove that header, the response will likely be compressed (gzip or deflate). Output rewriting is not supported in that case, and you will end up with an error message like:
HTTP Error 500.52 - URL Rewrite Module Error. Outbound rewrite rules cannot be applied when the content of the HTTP response is encoded ("gzip").
Also note that, for security reasons, you need to explicitly enable this feature by allowing the server variable. See enabling server variables here.
The last two rules rewrite the links, scripts and other resources so that the URLs are translated to the right structure. The first one rewrites absolute paths, and the last one rewrites relative paths. Note that relative paths that use “..” will not work, but you can easily fix the rules. I was too lazy to do that, and since I never use those paths when I create a site, it works for me :)
A huge added value of using ARR is that we can now enable disk caching with a couple of clicks. Responses are then cached locally on www.site.com, so not every single request has to pay the price of going to the backend servers.
As easy as that, you will now see caching working, and your site will act as a container for other servers on the Internet. Pretty cool, huh? :)
So in this post we saw how, with literally a few lines of XML, URL Rewrite and ARR let us enable a proxy/routing scenario with link rewriting and caching support.
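The rule itself is XML in web.config. As a rough model of what it does for each request, here is a Python sketch. BACKENDS, INBOUND and route are hypothetical names, the sketch looks only at the request path (no query strings), and it includes company2 from the original goal alongside company1.

```python
import re

# Hypothetical mapping from front-end folder to backend host.
BACKENDS = {"company1": "www.company1.com", "company2": "www.company2.com"}
INBOUND = re.compile(r"^/(company1|company2)/(.*)$")


def route(path, is_https, headers):
    """Returns (backend_url, forwarded_headers), or None if the path is not proxied."""
    match = INBOUND.match(path)
    if match is None:
        return None
    scheme = "https" if is_https else "http"      # preserve HTTP vs HTTPS
    backend_url = f"{scheme}://{BACKENDS[match.group(1)]}/{match.group(2)}"
    # Copy the headers, then blank out Accept-Encoding (like the serverVariables
    # entry) so the backend replies uncompressed and outbound rewriting can run.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}
    forwarded["Accept-Encoding"] = ""
    return backend_url, forwarded
```

Blanking the header, rather than only dropping it on the proxy side, mirrors what the serverVariables entry does. The backend never sees a request for gzip or deflate, so it never triggers the 500.52 error shown above.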
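To make the two outbound rules concrete, here is a simplified Python sketch of the same translation applied to double-quoted href and src attributes. URL Rewrite applies its outbound rules to tag attributes itself, so this is only an illustration, and rewrite_links is a hypothetical name. Like the rules described above, it does not handle paths that use “..”.

```python
import re

# href/src attributes with double-quoted values (a simplification); the
# lookbehind keeps attributes such as data-src from matching.
LINK_ATTR = re.compile(r'(?<![\w-])(href|src)="([^"]*)"')


def rewrite_links(html, backend_host, prefix):
    """Moves backend absolute URLs and root-relative paths under `prefix`."""
    absolute = re.compile(r"^https?://" + re.escape(backend_host) + r"(/.*)?$", re.IGNORECASE)

    def fix(match):
        attr, url = match.group(1), match.group(2)
        abs_match = absolute.match(url)
        if abs_match:
            # http://www.company1.com/x  ->  /company1/x
            url = prefix + (abs_match.group(1) or "/")
        elif url.startswith("/") and not url.startswith("//"):
            # /x  ->  /company1/x
            url = prefix + url
        # Other hosts, protocol-relative URLs, "x" and "../x" are left untouched.
        return f'{attr}="{url}"'

    return LINK_ATTR.sub(fix, html)
```

Document-relative links such as "x.png" are left alone on purpose. The browser already resolves them under /company1/.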
A few weeks ago my team released version 2.0 of URL Rewrite for IIS. URL Rewrite is probably the most powerful rewrite engine for Web applications. It offers many features, including inbound rewriting (i.e., rewrite the URL, redirect to another URL, abort requests, use maps, and more). Version 2.0 also includes outbound rewriting, so you can rewrite URLs or any markup as the content is being sent back, even if it’s generated using PHP, ASP.NET or any other technology.
It also includes a very powerful User Interface that allows you to test your regular expressions and even better it includes a set of templates for common types of Rules. Some of those rules are incredibly valuable for SEO (Search Engine Optimization) purposes. The SEO rules are:
For more information on the SEO Templates look at: http://learn.iis.net/page.aspx/806/seo-rule-templates/
What is really cool is that you can run the SEO Toolkit against your application, and you will probably get some violations around lower-case URLs, canonical domains, etc. After seeing those, you can use URL Rewrite 2.0 to fix them with one click.
I have personally used it on my Web site. Try the following three URLs: all of them will be redirected to the canonical form (http://www.carlosag.net/Tools/CodeTranslator/), so you will see URL Rewrite in action:
Note that, in the end, those templates just translate to web.config settings that become part of your application and can be deployed with it using XCOPY. This works with ASP.NET, PHP or any other server technology, including static files. Below is the output of the Canonical Host Name rule, which I use in my Web site’s web.config.
There are many more features I could talk about, but for now this was just a quick SEO-related post.
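Under the hood these templates produce XML rules in web.config. The simplified Python sketch below illustrates the decision that a canonical host name rule and a trailing-slash rule make for each request. CANONICAL_HOST and canonical_redirect are hypothetical names. The sketch deliberately leaves the casing of the path alone, matching the mixed-case canonical URL in the example that follows, and it judges "directory-like" paths by their missing file extension rather than by checking the file system.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; replace with your own site's host name.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url):
    """Returns the URL to redirect to, or None when `url` is already canonical.

    Simplified: ports and user info are dropped.
    """
    parts = urlsplit(url)
    host = CANONICAL_HOST                    # canonical host name rule: one host only
    path = parts.path or "/"
    if not path.endswith("/") and not posixpath.splitext(path)[1]:
        path += "/"                          # trailing slash rule for extension-less paths
    canonical = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if canonical == url else canonical
```

A real deployment would issue a permanent (301) redirect to the returned URL, so that search engines consolidate all variants into one.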
The other day I was asked whether I knew of a tool that would let users easily analyze IIS log files and look for specific data in a way that could be automated. My recommendation was that, if they were comfortable using a SQL-like language, they should use Log Parser. Log Parser is a very powerful tool that provides a generic SQL-like language on top of many types of data, like IIS logs, Event Viewer entries, XML files, CSV files, the file system and others. It lets you export query results to many output formats, such as CSV (comma-separated values), XML, SQL Server, charts and others, and it works well with IIS 5, 6, 7 and 7.5.
To use it you just need to install it and use the LogParser.exe that is found in its installation directory (on my x64 machine it is located at: C:\Program Files (x86)\Log Parser 2.2).
I also thought I would share some of my favorite queries. To run them, just execute LogParser.exe and make sure to specify that the input is an IIS log file (-i:W3C). For ease of use, in this case we will export to a CSV file (-o:CSV) that can then be opened in Excel for further analysis:
A final note: any time you deal with dates and times, remember to use the TO_LOCALTIME function to convert the log times (which IIS records in UTC in the W3C format) to your local time. Otherwise you will find it very confusing when your entries seem to be reported at the wrong times.
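The queries themselves are written in Log Parser's SQL dialect. To show what a typical "hits per URL" query computes, or for cases where Log Parser is not available, here is a minimal Python sketch. It reads the #Fields directive of a W3C extended log and counts requests per cs-uri-stem. It is an illustration, not a Log Parser replacement, and the function names are hypothetical.

```python
from collections import Counter


def parse_w3c(lines):
    """Yields one dict per request from W3C extended log lines (IIS format)."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # The directive names the space-separated columns that follow.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split(" ")))


def top_pages(lines, n=10):
    """Roughly what a COUNT(*) ... GROUP BY cs-uri-stem query with a TOP n computes."""
    hits = Counter(entry["cs-uri-stem"] for entry in parse_w3c(lines))
    return hits.most_common(n)
```

Because IIS writes a new #Fields directive whenever the logged columns change, the parser simply re-reads it each time it appears.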
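If you post-process the exported data outside Log Parser, the same conversion applies. Here is a small Python sketch of what TO_LOCALTIME does, assuming the date and time fields are in the yyyy-mm-dd and hh:mm:ss form IIS writes. Passing None as the time zone uses the machine's local time zone, which is closest to TO_LOCALTIME.

```python
from datetime import datetime, timedelta, timezone


def to_local(date_field, time_field, local_tz=None):
    """Converts an IIS W3C log date/time pair (recorded in UTC) to local_tz."""
    utc_time = datetime.strptime(f"{date_field} {time_field}", "%Y-%m-%d %H:%M:%S")
    # Mark the naive value as UTC, then shift it (None means the machine's local zone).
    return utc_time.replace(tzinfo=timezone.utc).astimezone(local_tz)


# Example with a fixed UTC-7 offset, so the output does not depend on the machine.
utc_minus_7 = timezone(timedelta(hours=-7))
print(to_local("2010-05-01", "17:00:00", utc_minus_7))  # 2010-05-01 10:00:00-07:00
```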
If you need any help you can always visit the Log Parser Forums to find more information or ask specific questions.
Any other useful queries I missed?
Are you a developer/owner/publisher/etc. of a site that uses HTTPS (SSL) for secure access? If you are, please continue reading.
Have you ever visited a Web site that is secured using SSL (Secure Sockets Layer) just to get an ugly Security Warning message like:
Do you want to view only the webpage content that was delivered securely?
This webpage contains content that will not be delivered using a secure HTTPS connection, which could compromise the security of the entire webpage.
How frustrating is this for you? Do you think end users know the right answer to the question above? Honestly, the Yes/No buttons and the phrasing of the question make me feel that even I would click the wrong option.
What this warning is basically trying to tell users is that the page is consuming resources from an unsecured location, even though they navigated to a page they thought was secured by SSL. These could be scripts, stylesheets or other types of objects, and they pose a potential security risk because they could be tampered with on the way or come from different locations.
As a site owner/developer/publisher/etc., you should always make sure you do not expose your customers to such a bad experience, leaving them with a question they can’t possibly answer correctly. If they choose ‘Yes’, they will get an incomplete experience, with broken images, broken scripts or something worse. If they choose ‘No’, it is even worse, since you are actually teaching them to ignore these warnings, which in some cases could be real signs of security issues.
Bottom line: any issue like this should be treated as a bug and fixed in the application whenever possible.
But the big question is: how do you find these issues? The answer is very simple yet extremely time-consuming. Just navigate to every single page of your site using SSL, and as you do, examine every single resource in the page (styles, objects, scripts, etc.) to see whether its URL points to a non-HTTPS location.
The good news is that the SEO Toolkit makes it extremely simple to find these issues.
Using the IIS SEO Toolkit and its powerful Query Engine, you can easily detect conditions on your site that would otherwise take an incredible amount of time to find and would be prohibitively expensive to check constantly.
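The per-page part of that manual check is easy to automate. Here is a minimal Python sketch that assumes you already have the HTML of a page served over HTTPS and lists the resources the browser would fetch over plain HTTP. The tag/attribute list is an assumption and not exhaustive, and only <link> elements with rel="stylesheet" are treated as fetched resources. MixedContentFinder and find_mixed_content are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tag/attribute pairs that make the browser fetch a resource as part of the page.
RESOURCE_ATTRS = {("script", "src"), ("img", "src"), ("iframe", "src"),
                  ("embed", "src"), ("object", "data"), ("link", "href")}


class MixedContentFinder(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.insecure = []   # (tag, absolute URL) pairs fetched over plain HTTP

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and "stylesheet" not in (attributes.get("rel") or "").lower():
            return  # canonical/alternate links are not loaded as page resources
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value:
                # Resolve relative and protocol-relative URLs against the HTTPS page.
                absolute = urljoin(self.page_url, value)
                if urlsplit(absolute).scheme == "http":
                    self.insecure.append((tag, absolute))


def find_mixed_content(page_url, html):
    finder = MixedContentFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.insecure
```

Relative and protocol-relative URLs inherit https from the page, so only explicit http:// references are reported.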
Last week we released a refresh for the IIS Search Engine Optimization (SEO) Toolkit v1.0. This version is a minor update that includes fixes for all the important bugs reported in the IIS.NET SEO Forum.
Some of the fixes included in this version are:
This release is compatible with v1.0 RTM and will upgrade it if it is already installed. So go ahead and install the new version using the Web Platform Installer by clicking: http://go.microsoft.com/?linkid=9695987
Learn more about it at: http://www.iis.net/expand/SEOToolkit
In this blog post we are going to write an example of how to extend the SEO Toolkit functionality. For that, we will pretend our company has a large Web site that includes several images, and we want to make sure all of them comply with a certain standard. Let's say all of them should be smaller than 1024x768 pixels, and the image quality should be no less than 16 bits per pixel. Additionally, we would like to be able to make custom queries that let us further analyze the contents of the images and filter based on directories and more.
For this we will extend the SEO Toolkit crawling process to perform the additional processing for images, we will be adding the following new capabilities:
A crawler module is a class that extends the crawling process in Site Analysis to provide custom functionality while processing each URL. By deriving from this class you can easily raise your own set of violations or add your own data and links to any URL.
It includes three main methods:
Create a Class Library in Visual Studio and add the code shown below.
As you can see, in BeginAnalysis the module registers three new properties with the Report using the Crawler property. This is only required if you want to provide custom display text or use a type other than string. Note that the current version only allows primitive types like Integer, Float, DateTime, etc.
In the Process method, the module first makes sure that it only runs for known content types. It then performs the validations, raising a set of custom violations that are defined in the Violations static helper class. Note that we load the content from the Response Stream, which is the property that contains the content received from the server. If you were analyzing text, the Response property would contain the content instead (this is based on the content type, so HTML, XML, CSS, etc. are kept in this string property).
When running inside IIS Manager, crawler modules first need to be registered as a standard UI module, and then, inside their initialization, registered using the IExtensibilityManager interface. In this case, to keep the code as simple as possible, everything is added in a single file. So add a new file called "RegistrationCode.cs" and include the contents below:
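The sample itself is C# against the SEO Toolkit's CrawlerModule API. Purely to illustrate the validation logic from this scenario (images no larger than 1024x768 and at least 16 bits per pixel), here is a Python sketch that reads PNG dimensions and bit depth directly from the file header instead of using .NET imaging classes. Only PNG is handled. The function names, the "Image Bits Per Pixel" field and the (metadata, violations) return shape are hypothetical.

```python
import struct

MAX_WIDTH, MAX_HEIGHT, MIN_BITS_PER_PIXEL = 1024, 768, 16
# Samples per pixel for each PNG color type (gray, RGB, palette, gray+alpha, RGBA).
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(data):
    """Reads width, height and bits per pixel from a PNG's IHDR chunk."""
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width, height (4 bytes each), bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth * CHANNELS[color_type]


def process(content_type, response_bytes):
    """Returns (metadata, violations) for one crawled URL."""
    if content_type != "image/png":      # only handle content types we understand
        return {}, []
    width, height, bpp = png_info(response_bytes)
    metadata = {"Image Width": width, "Image Height": height, "Image Bits Per Pixel": bpp}
    violations = []
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        violations.append(f"Image is {width}x{height}, larger than {MAX_WIDTH}x{MAX_HEIGHT}")
    if bpp < MIN_BITS_PER_PIXEL:
        violations.append(f"Image uses {bpp} bits per pixel, fewer than {MIN_BITS_PER_PIXEL}")
    return metadata, violations
```

Reading only the header keeps the check cheap: the crawler never has to decode the full image just to learn its size and depth.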
This code defines a standard UI IIS Manager module and in its client-side initialize method it uses the IExtensibilityManager interface to register the new instance of the Image extension. This will make it visible to the Site Analysis feature.
To test it, we need to add the UI module to Administration.config, which also means the assembly needs to be strongly named and registered in the GAC.
To strongly name the assembly
In Visual Studio, you can do this easily using the menu "Project->Properties". Select the "Signing" tab, check "Sign the assembly", and choose a key file. If you don't have one, you can just choose New and specify a name.
After this you can compile, and you should now be able to add the assembly to the GAC.
To GAC it
If you have the SDKs installed, you should be able to call gacutil like this (in my case):
"\Program Files\Microsoft SDKs\Windows\v6.0A\bin\gacutil.exe" /if SampleCrawlerModule.dll
(Note, you could also just open Windows Explorer, navigate to c:\Windows\assembly and drag & drop your file in there, that will GAC it automatically).
Finally, to see the right name to use in Administration.config, run the following command:
"\Program Files\Microsoft SDKs\Windows\v6.0A\bin\gacutil.exe" /l SampleCrawlerModule
In my case it displays:
SampleCrawlerModule, Version=1.0.0.0, Culture=neutral, PublicKeyToken=6f4d9863e5b22f10, …
Finally register it in Administration.config
Open Administration.config in an elevated instance of Notepad, find </moduleProviders>, and just before it add an entry like the one below, replacing Version and PublicKeyToken with the right values:
After registration, you should be able to launch IIS Manager and navigate to Search Engine Optimization. Start a new analysis of your Web site. Once it completes, any violations will be shown in the Violations Summary and the other reports. For example, see below all the violations in the "Images" category.
Since we also extended the metadata by including the new fields (Image Width, Image Height, and Image Pixel Format) now you can use them with the Query infrastructure to easily create a report of all the images:
And since they are standard fields, they can be used in Filters, Groups, and any other functionality, including exporting data. So for example the following query can be opened in the Site Analysis feature and will display an average of the width and height of images summarized by type of image:
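The query itself is defined in the Site Analysis UI. To show what that aggregation computes, here is a small Python sketch over rows you might export from the report. It assumes each row carries the new Image Width and Image Height fields plus a content type column, and the "Content Type" column name is an assumption.

```python
from collections import defaultdict


def average_size_by_type(rows):
    """Groups image rows by content type and averages width and height."""
    totals = defaultdict(lambda: [0, 0, 0])   # type -> [sum width, sum height, count]
    for row in rows:
        t = totals[row["Content Type"]]
        t[0] += row["Image Width"]
        t[1] += row["Image Height"]
        t[2] += 1
    return {ctype: (w / n, h / n) for ctype, (w, h, n) in totals.items()}
```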
And of course violation details are shown as specified, including Recommendation, Description, etc:
As you can see, extending the SEO Toolkit with a Crawler Module lets you provide additional information (metadata, violations or links) for any document being processed. This can be used to add support for content types not supported out of the box, such as PDF, Office documents or anything else you need. It can also be used to extend the metadata, writing custom code to wire data from other systems into the report so that you can exploit this data using the Query capabilities of Site Analysis.
The IIS SEO Toolkit includes a lot of built-in functionality, such as violation rules, processing of different content types (like HTML, CSS, RSS, etc.) and more. However, it might not do everything you need. For example, it might not process a set of documents that you use, or it might not gather all the information you are interested in while processing a document. The good news is that it includes enough extensibility to let you build on top of its rich capabilities and easily provide additional ones using .NET.
There are three main extensibility points in this first release, including:
This is the first of a series of extensibility blog entries for the IIS SEO Toolkit where I will cover all of the extensibility points mentioned above.
Two weeks ago I presented at DevConnections the talk "AMS10: Developing and Deploying for the Windows Web App Gallery", here are the slides.
Download the Web Application Gallery Talk slides here.
A few final links:
Microsoft Web Platform: http://www.microsoft.com/web/
Download Web PI: http://www.microsoft.com/web/downloads/platform.aspx
Submit your Applications at: http://www.microsoft.com/web/gallery/developer.aspx