Last week we released a refresh for the IIS Search Engine Optimization (SEO) Toolkit v1.0. This version is a minor update that includes fixes for all the important bugs reported in the IIS.NET SEO Forum.
Some of the fixes included in this version are:
This release is compatible with v1.0 RTM and it will upgrade if already installed. So go ahead and install the new version using Web Platform Installed by clicking: http://go.microsoft.com/?linkid=9695987
Learn more about it at: http://www.iis.net/expand/SEOToolkit
A lot of sites today have the ability for users to sign in to show them some sort of personalized content, whether its a forum, a news reader, or some e-commerce application. To simplify their users life they usually want to give them the ability to log on from any page of the Site they are currently looking at. Similarly, in an effort to keep a simple navigation for users Web Sites usually generate dynamic links to have a way to go back to the page where they were before visiting the login page, something like: <a href="/login?returnUrl=/currentUrl">Sign in</a>.
If your site has a login page you should definitely consider adding it to the Robots Exclusion list since that is a good example of the things you do not want a search engine crawler to spend their time on. Remember you have a limited amount of time and you really want them to focus on what is important in your site.
Out of curiosity I searched for login.php and login.aspx and found over 14 million login pages… that is a lot of useless content in a search engine.
Another big reason is because having this kind of URL's that vary depending on each page means there will be hundreds of variations that crawlers will need to follow, like /login?returnUrl=page1.htm, /login?returnUrl=page2.htm, etc, so it basically means you just increased the work for the crawler by two-fold. And even worst, in some cases if you are not careful you can easily cause an infinite loop for them when you add the same "login-link" in the actual login page since you get /login?returnUrl=login as the link and then when you click that you get /login?returnUrl=login?returnUrl=login... and so on with an ever changing URL for each page on your site. Note that this is not hypothetical this is actually a real example from a few famous Web sites (which I will not disclose). Of course crawlers will not infinitely crawl your Web site and they are not that silly and will stop after looking at the same resource /login for a few hundred times, but this means you are just reducing the time of them looking at what really matters to your users.
If you use the IIS SEO Toolkit it will detect the condition when the same resource (like login.aspx) is being used too many times (and only varying the Query String) and will give you a violation error like: Resource is used too many times.
There are a few fixes, but by far the best thing to do is just add the login page to the Robots Exclusion protocol.
To summarize always add the login page to the robots exclusion protocol file, otherwise you will end up:
One thing that I’ve been asked several times about the SEO Toolkit is if it does a full standards validation on the markup and content that is processed, and if not, to add support for more comprehensive standards validation, in particular XHTML and HTML 4.01. Currently the markup validation performed by the SEO Toolkit is really simple, its main goal is to make sure that the markup is correctly organized, for example that things like <b><i>Test</b></i> are not found in the markup, the primary reason is to make sure that basic blocks of markup are generally "easy" to parse by Search Engines and that the semantics will not be terribly broken if a link, text or style is not correctly closed (since all of them would affect SEO).
So the first thing I would say is that we have heard the feedback and are looking at what we could possibly add in future versions, however why wait, right?
One thing that many people do not realize is that the SEO Toolkit can be extended to add new violations, new metadata and new rules to the analysis process and as such during a demo I gave a few weeks ago I decided to write a sample on how to consume the online W3C Markup Validation Service from the SEO Toolkit.
You can download the SEOW3Validator including the source code at http://www.carlosag.net/downloads/SEOW3Validator.zip.
To run it you just need to:
You should be able to now run the SEO Toolkit just as before but now you will find new violations, for example in my site I get the ones below. Notice that there are a new set of violations like W3 Validator – 68, etc, and all of them belong to the W3C category. (I would have liked to have better names, but the way the W3 API works is not really friendly for making this any better).
And when double clicking any of those results you get the details as reported by the W3 Validation Service:
The code is actually pretty simple, the main class is called SEOW3ValidatorExtension that derives from CrawlerModule and overrides the Process method to call the W3C Validation service sending the actual markup in the request, this means that it does not matter if your site is an Intranet or in the Internet, it will work; and for every warning and error that is returned by the Validator it will add a new violation to the SEO report.
The code looks like this:
I created a helper class W3Validator that basically encapsulates the consumption of the W3C Validation Service, the code is far from what I would like it to be however there are some "interesting" decisions on the way the API is exposed, I would have probably designed the service differently and not return the results formatted in HTML when this is actually an API/WebService that can be presented somewhere else than a browser. So a lot of the code is to just re-format the results to look "decent", but to be honest I did not want to spend too much time on it so everything was put together quite quickly. Also, if you look at the names I used for violations, I did not want to hard-code specific Message IDs and since the Error Message was different for all of them even within the same Message ID, it was not easy to provide better messages. Anyway, overall it is pretty usable and should be a good way to do W3 Validation.
Note that one of the cool things you get for free is that since these are stored as violations, you can then re-run the report and use the Compare Report feature to see the progress while fixing them. Also, since they are stored as part of the report you will not need to keep running the validator over and over again but instead just open it and continue looking at them, as well as analyzing the data in the Reports and Queries, and be able to export them to Excel, etc.
Hopefully this will give you a good example on some of the interesting things you can achieve with the SEO Toolkit and its extensibility.
Today somebody was running the IIS SEO Toolkit and using the Site Analysis feature flagged a lot of violations about "The page contains multiple canonical formats.". The reason apparently is that he uses Query String parameters to pass contextual information or other information between pages. This of course yield the question: Does that mean in general query strings are bad news SEO wise?
Well, the answer is not necessarily.
I will start by clarifying that this violation in Site Analysis means that our algorithm detected that those two URL's look like the same content, note that we make no assumptions based on the URL (including Query String parameters). This kind of situation is bad for a couple of reasons:
Query String by themselves do not pose a terrible threat to SEO, most modern Search Engines deal OK with Query Strings, however its the organic linking and the potential abuse of Query Strings that could give you headaches.
Remember, Search Engines should make no assumptions based on the fact it is a single "page" that serves tons of content through a single Absulte Path and the use of Query Strings. This is typical in many cases such as when using index.php, where pretty much every page on the site is served by the same resource and just using variations of Query Strings or path information.
Well, there are several things you could do, but probably one of the easiest is to just tell Search Engines (more specifically crawlers or bots) to not index pages that have the different Query String variations that really are meant only for the application to pass state and not to specify different content. This can be done using the Robots Exclusion Protocol and use the wildcard matching to specify to not follow any URL's that contain a '?'. Note that you should make sure you are not blocking URL's that actually are supposed to be indexed. For this you can use the Site Analysis feature to run it again and it will flag an informational message for each URL that is not visited due to the robots exclusion file.
In summary, try to keep canonical formats yourself, don't leave any guesses to Search Engines cause some of them might get it wrong. There are new ways of specifying the canonical form in your markup but it is "very recent" (as in 2009) and some Search Engines do not support it (I believe the top three do, though) using the new rel="canonical":
In the Beta 2 version of IIS SEO Toolkit we will support this tag and have better detection of this canonical issues. So stay tuned.
Other ways to solve this is to use URL Rewrite so that you can easily redirect or rewrite your URL's to get rid of the Query Strings and use more SEO friendly URL's.
A couple of years ago a friend of mine introduced me to a game called Sudoku, and immediately I loved it. As any good game its rules are very simple, basically you have to lay out the numbers from 1 to 9 horizontally in a row without repeating them, while at the same time you have to layout the same 1 to 9 numbers vertically in a column, and also within a group (a 3x3 square).
After that, every time I had to take a flight I got addicted to buying a new puzzles magazine that would entertain me for the flight. On December 2006 while flying to Mexico I decided to change the tradition and instead build a simple Sudoku game that I could play any time I felt like doing it without having to find a magazine store and that turned into this simple game. It is not yet a great game since I haven't had time to finalize it, but I figure I would share it anyway in case someone finds it fun.
Click Here to go to the Download Page
More than a year ago I wrote about Microsoft.Web.Administration.dll and how it was a new API we were creating for managed code developers to be able to easily set any configuration settings of IIS, however I purposely ignored the configuration part of the API.
Later I talked about the way configuration was organized in IIS 7.0 and how configuration files inherited and worked.
Recently I was asked about some samples on how to modify IIS configuration and decided it was about time to talk about the configuration part of Microsoft.Web.Administration.
The first thing to really emphasize is that Microsoft.Web.Administration in a way has two different ways of reading configuration:
Whenever you work with the configuration system in IIS you need to:
The entire configuration in IIS is organized in sections inside configuration files. Sections are composed of elements, attributes, collections and potentially even methods. If you want to know what is the section you are looking you can search it in %windir%\System32\Inetsrv\Config\schema which is the folder where we place all the "schema" files that entirely describe the configuration in IIS.
The configuration systemin IIS 7.0 is distributed and as such each child object inherits the configuration of its parent, so for example an application inherits the configuration of the site and the site inherits configuration from the server. So now you need to decide which objects you want to manage, for example, do you want to enable Windows Authentication for the entire server or do you only want to enable it for a particular site or application.
As the previous bullet mentions the configuration system is distributed so now you can actually make the changes in different levels for the same object, for example you can modify applicationHost.config with a locationPath "Default Web Site" or you can obtain the same behavior by modifying a web.config file inside wwwroot directory. The concept that really impacts this decision is configuration locking since based on the settings that the server administrator has configured it might be invalid to set authentication in the web.config and might only be possible to set it in applicationHost.config.
OK, after all that talking lets go to the some actual examples and apply the 3 steps above.
All the code below assumes you have added a reference to Microsoft.Web.Administration.dll (located at %windir%\system32\inetsrv\) and that you are adding a "using Microsoft.Web.Administration;" at the top of your C# file.
The code uses ServerManager to get the web.config of the web site and then queries the directoryBrowse section and sets the attribute 'enabled' to true. If you open IIS_Schema.xml you will see that this section defines the 'enabled' attribute as a Boolean.
As you can see this API offers a loosely typed object model to ready and modify configuration, with the most important objects being Configuration, ConfigurationElement and ConfigurationAttribute.
In this case the handlers Section has a default collection which is where we want to add our handler. For that we use the CreateElement() method to get a new element that we can set the attributes and then add it.
Unfortunately currently there is no way to search collections so your only option is to iterate through elements and find the match you are looking for, in this case I'm matching by the name and then removing it from the collection.
Hopefully that should give a good initial steps on how to start working with configuration using Microsoft.Web.Administration, there are several other options I'll be mentioning in other post on how to lock configuration, how to set metadata, how to enumerate configurations and how to do much more advanced stuff for the few developers that will actually need advanced control of IIS configuration.
When I installed Windows Live Writer for the first time I was skeptical of having a different blog writer, so far I was very happy using Microsoft Word 2007 as my blog editor. However I decided to give it a try and see what I could get from it.
Now, I can only tell you that I love it, the biggest reason is because it comes with a simple API that allows you to extend it and add functionality to it. This approach of exposing the platform really makes me feel I can do everything, and if I can't, then I can just extend it to do what I need.
In my case every time I blog something that uses code, I always try to "colorize" it (personally find it easier to read when code is formatted using colors). That is the reason I wrote my original Code Colorizer application so that I could just paste the code in, and get the HTML that I would then tweak manually directly in the blog engine.
Now that I'm using Windows Live Writer, I decided to test-drive the extensibility model they expose and wrote my Code Colorizer for Windows Live Writer so that I don't need to hand edit anything.
The idea is that you just download the DLL into the Plugins (C:\Program Files\Windows Live\Writer\Plugins) directory, launch Live Writer and now, you will get a task "Insert Colorized Code..." that will show you a dialog where you can either type the code or just paste it and the right HTML will be inserted in your blog.
This for the first time makes it really easy for me to insert formatted code without the need to tweak anything by hand.
You can download it for free at: http://www.carlosag.net/Tools/WindowsLiveWriter/Default.aspx
The following image shows a snapshot of the tool in action:
During my last two business trips (to Barcelona for TechEd and Mexico for ReMix) I was way too bored on the plane and since I recently got my Motorola Q9 (which is a sweet Windows Mobile Phone) decided to write myself a Tetris game and to port my Sudoku game to Windows Mobile as a way to do my "first steps" in the .NET Compact Framework.
To my surprise it was really easy to write them and even more to port the desktop version of Sudoku to run in all the .NET Compact Framework platforms.
Since holidays are coming I thought of share them as a gift for this holiday's season.
Bottom line (with the risk of sounding like a marketing dude, which I'm not) .NET is a cool technology that makes it really easy to code for many devices, from high-end servers to hand held devices to mobile phones. In this case, I have tested these applications with a Pocket PC, Smartphone 2003, Windows Mobile 5 and Windows Mobile 6. And best of all, the code base is pretty much the same as the Desktop version.
You can install both games by browsing from your mobile device to http://www.carlosag.net/mobile/ where you will find instructions on how to install the .cab files, or just click the images below to go to the download page for each game.
Today I will be talking about one of the features included in the new IIS Admin Pack called Configuration Editor.
Configuration Editor is an IIS Manager feature that will let you managed any configuration section available in your configuration system. Configuration Editor exposes several features from configuration that are not exposed anywhere else in IIS Manager, including:
Please give us feedback on things you would like to see or change at the IIS Forums: http://forums.iis.net/1149.aspx
OK, but rather than keep with more and more text, I will just show you a video on how it looks and all its features (for those of you who like text, there is a transcript below).
So I have here Windows Vista SP 1 with the IIS Admin Pack installed, in my machine I have very few applications installed but should be good to show some of the features on config editor. When entering Config Editor, first thing you will notice is that at the top you have a drop-down list that shows all the sections currently schematized and ready to be used in your system.
Since this is sorted alphabetically, the first section that gets selected is AppSettings, for I can very easily switch between ASP.NET configuration sections, such as system.web/authentication, or the IIS configuration sections such as system.webServer/defaultDocument or the system.applicationHost/sites that contains all the sites configuration for IIS.
As you can see the user interface displays the configuration elements and properties of the section that is selected, providing you an easy way to see every single configuration property available in the system.
At the top you'll get a label specifying the deepest path where this section is being used relevant to your scope, so in this case its telling us that its been set in ApplicationHost.config. After that, all the elements and properties are shown in a Property Grid, that displaye elements as a collapsible set of properties. One of the interesting things is that we provide validation for the properties for example, when entering string characters in a numeric property type an error message will be displayed giving you the details of the expected types. Additionaly other benefits such as type editors, so that when editing a property of type boolean, you get the True/False drop-down, or when a property that is of type enumeration such as the LogFormat inside the SiteDefaults, you will get a drop-down list with only the list of options that are allowed for that enumeration. Same way, when editing a property of type flags such as the logExtFileFlags that contains the fields to include in the log file, you will get a multi-select drop-down list where you can select and de-select the different options. Also, you will notice that additional information is displayed as you select the different properties, giving you details of their data type as well as additional validations for those that have some, for example, the truncateSize property has specified that only a certain range is considered valid, if I type a value that is not within that range it will show this message giving me details of the problem.
Now, lets move to a simpler section so that we can show other features of the Configuration Editor. For example here in default documents, if I want to disable it I just change it to False and click Apply. As you would expect all the changes are applied and to see what changes this actually made in my system I'm going to show a Diff of the configuration that I have backed up and indeed the only change that happened in my configuration system is that it changed from true to false.
As you will notice there is a collection in this section, all the collections are shown in an advanced collection editor that will let you see all the information of the items on it, including the ability to add, remove and clear the collection, as well as lock individual items on it. It additionally shows where each of the individual items is coming from making it easier to understand the distributed configuration.
Another thing you will notice is that this collection editor shows some visual cues to help you deal with data, for example this little key here tells you that this property is the unique key of the collection item.
So lets actually add a new one, for that I just need to click Add and fill the values, in this case, lets add Home.aspx as a new default document. After doing that, I can close dialog and click Apply. And lets take a look at what happened to my configuration. As you can see the new item was added. So as you can see its really easy to see and change configuration in collections.
Another interesting feature is locking, for example if I want to make sure that my default documents are always enabled and no one else can override them, I can go here and select the enabled attribute and click lock attribute which will prevent it from being changed in any other web.config file.
Now, another interesting feature which is probably one of the most powerful features is the ability to search configuration so that you can see a high-level overview of the configuration system and all the web.config files on it. Just click Search Configuration. This shows me this dialog that shows me the root web.config that includes all the section that are being set on it, it also shows me applicationHost.config that includes again all the sections being used on it, as well as a location tag for a particular application that includes also some sections for it. you will notice that I also have a couple of applications that include web.config's in their folders, and sub-folders. where we can see how for example in this web.config it includes some
one of the neat features is that you can actually click any of this nodes and it will immediately display the content of the section as well as where its coming from. For example if I click the web.config my entire web.config is displayed, if I click a specific section it only displays the content of the section. I can even click the locationPath that I'm interested and only get that particular one.
Additionally you can easily search who is changing the authorization settings from asp.net and as easy as that you can see all the places in your server where the authorization settings are being set and quickly identify all the settings that are being used in your server. This feature is extremely useful because now, you can easily search for example default Document and make sure nobody is changing it and make sure no one else is violating the locking we just did.
It also allows you to see the files in a flat view where it gives you all the different paths and files where each of them is coming from. You get the exact functionality, its just a different visual representation of the config.
Another interesting thing is that if you want to build your own sections and extend our configuration system, you can go to the schema folder and write your own configuration section, declare it using our schema notation, here I'm just defining a section named mySection, that includes an attribute called enabled of type bool and an attribute called message of type string and an attribute password of type string that should be encrypted.. Now, I just need to edit applicationHost.config to define the section so that config system knows we are going to consume it . Just by doing that, now I can go back to config editor and refresh the window, and my section is now available in the drop down, and as you would expect it displays all of the properties I defined, and I can just go ahead and set them, and I get all the locking functionality, I get all the script generation, I get all the UI validation.
And if I apply, you will see that as expected the changes are done, the password attribute is encrypted, etc.
So as you can see configuration editor is an extremely powerful feature that will be really useful for successfully managing the web.config's in your system.
The other day somebody ask me if there was a way to limit the amount of work that Site Analysis in IIS SEO Toolkit would cause to the server. This is interesting for a couple of reasons,
In Beta 1 we do not support the Crawl-delay directive in the Robots exclusion protocol; in future versions we will look at adding support this setting. The good news is that in Beta 1 we do have a configurable setting that can help you achieve this goals called Maximum Number of Concurrent Requests that you can configure.
To set it: