In the past few days I've been reading a bit about SEO, trying to understand what makes a Web Site search-engine-optimized, what the typical headaches are when trying to achieve that, and how we can address them in IIS.
Today I decided to post how you can make your Web Site running on IIS 7.0 a bit "friendlier" to Search Engines without having to modify any code in your application. Being search-engine-optimized is a big statement since it can include several things, so for now I will scope the discussion to three things that can be easily addressed using the IIS URL Rewrite Module: canonicalization, friendly URLs, and site reorganization.
Basically, the goal of canonicalization is to ensure that the content of a page is exposed under only one URI. This is important because, even though it's easy for humans to tell that http://www.carlosag.net is the same as http://carlosag.net, many search engines will not make that assumption and will keep them as two separate entries, potentially splitting the rankings between them and lowering their relevance. Another example of this is http://www.carlosag.net/default.aspx and http://www.carlosag.net/. You can certainly minimize the impact by always using the canonical form in your application's own links, for example linking to http://www.carlosag.net/tools/webchart/ without the default.aspx. However, that only accounts for part of the equation: you cannot control other people's links, so you cannot assume everyone referencing your Web Site will be as careful.
This is where URL Rewrite comes into play and truly solves this problem.
URL Rewrite can help you redirect users when they type your URL in a way you don't necessarily want them to, for example just carlosag.net. Choosing between using WWW or not is a matter of taste, but once you choose one you should ensure that you guide everyone to the right one. The following rule will automatically redirect everyone using just carlosag.net to www.carlosag.net. This configuration can be saved in the Web.config file in the root of your Web Site. Note that I'm only including the XML in this blog; I used IIS Manager to generate all of these settings, so you don't need to memorize the XML schema since the UI includes several friendly capabilities to generate all of this.
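As a rough sketch of what such a rule can look like (the rule name is only a label I picked, and the fragment assumes the Web.config sits at the root of the site), the condition checks the Host header and the action issues a permanent redirect that keeps the requested path:

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Rule name is illustrative; any unique name works -->
        <rule name="Redirect to WWW" stopProcessing="true">
          <!-- Match every requested path; {R:0} holds the whole match -->
          <match url=".*" />
          <conditions>
            <!-- Only act when the Host header is exactly carlosag.net -->
            <add input="{HTTP_HOST}" pattern="^carlosag\.net$" />
          </conditions>
          <!-- Permanent means an HTTP 301 response -->
          <action type="Redirect" url="http://www.carlosag.net/{R:0}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Patterns are matched case-insensitively by default, so CarlosAG.net is handled as well.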
Note that it is important to use Permanent (301) redirects. That way, if anybody links to your page using a non-WWW link, the search engine bot crawling their Web Site will see that the link has permanently moved, treat the new URL as the correct address, and not index the old URL. With Temporary (302) redirects, the old URL would still be indexed. The server's response then carries a 301 status code and a Location header pointing to the WWW URL.
IIS has a feature called Default Document that allows you to specify the content that should be processed when a user enters a URL that is mapped to a directory and not an actual file. In other words, if the user enters http://www.carlosag.net/tools/ then they will actually get the content as if they had entered http://www.carlosag.net/tools/default.aspx. That is all great, but the problem is that this feature only works one way: it maps a directory to a file, but it does not map the file back to the directory. This means that if some of your links or other users use the full URL, search engines will see two different URLs. To solve that problem we can use a configuration very similar to the rule above. Following is a rule that will redirect the default.aspx to the canonical URL (the folder).
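A minimal sketch of such a rule, assuming the Web.config lives at the site root and that default.aspx is the default document (the rule name is my own label):

```xml
<rewrite>
  <rules>
    <rule name="Default Document" stopProcessing="true">
      <!-- {R:1} captures the folder part, including its trailing slash,
           and is empty when default.aspx is requested at the root -->
      <match url="^(.*/)?default\.aspx$" />
      <!-- Redirect permanently (301) to the folder URL -->
      <action type="Redirect" url="/{R:1}" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>
```

With this pattern, Tools/default.aspx is sent to /Tools/ and a root-level default.aspx is sent to /.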
This again uses a Permanent redirect: it extracts everything before default.aspx and redirects to the "parent" URL path. For example, if the user enters http://www.carlosag.net/Tools/WindowsLiveWriter/default.aspx, they will be redirected to http://www.carlosag.net/Tools/WindowsLiveWriter/, and http://www.carlosag.net/Tools/default.aspx will be redirected to http://www.carlosag.net/Tools/. You can place this rule at the root of your site and it will take care of all the default documents (if you have a default.aspx in every folder).
Asking your users to remember that www.contoso.com/books.aspx?isbn=0735624410 is the URL for the IIS Resource Kit is not the nicest thing to do. First of all, why should they care that this is an ASPX page, that it takes arguments, and whatnot? A URL like www.contoso.com/books/IISResourceKit will resonate with them far more and be easier to remember and pass along. Most importantly, it doesn't tie you to any Web technology.
With URL Rewrite and its Rewrite Maps you can easily build this kind of logic without having to modify your code:
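Here is a hedged sketch of that idea. The map name ISBN matches the table mentioned in the text; the rule name and the exact URL pattern are my own assumptions:

```xml
<rewrite>
  <rewriteMaps>
    <!-- Lookup table: friendly name to ISBN -->
    <rewriteMap name="ISBN">
      <add key="IISResourceKit" value="0735624410" />
    </rewriteMap>
  </rewriteMaps>
  <rules>
    <rule name="Rewrite Books" stopProcessing="true">
      <!-- {R:1} is the friendly name after books/ -->
      <match url="^books/([^/]+)/?$" />
      <conditions>
        <!-- Look the name up in the map; the rule only applies
             when the lookup returns a non-empty value, kept in {C:1} -->
        <add input="{ISBN:{R:1}}" pattern="(.+)" />
      </conditions>
      <!-- Rewrite is server-side only; the browser keeps the friendly URL -->
      <action type="Rewrite" url="books.aspx?isbn={C:1}" />
    </rule>
  </rules>
</rewrite>
```

Names that are not in the map simply fall through to the rest of the pipeline, typically ending in a 404.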
The configuration above includes a rule that uses a Rewrite Map to translate a URL like http://www.contoso.com/books/IISResourceKit into http://www.contoso.com/books.aspx?isbn=0735624410 automatically. Maps are a very convenient way to have a "table" of values that can be transformed into any other value to be used in the resulting URL. Of course there are better ways of doing this for large catalogs or values that change frequently, but it is extremely useful when you have a stable set of values or when you can't make changes to an existing application. Note that since we use Rewrite, end users never see the "ugly" URL unless they already knew it and typed it. Of course, this means you can use the inverse approach to ensure canonicalization is preserved:
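The inverse direction could be sketched like this. The map name ISBNInverse and the rule name are hypothetical, and the query string pattern assumes isbn is the only parameter:

```xml
<rewrite>
  <rewriteMaps>
    <!-- Inverse lookup table: ISBN to friendly name -->
    <rewriteMap name="ISBNInverse">
      <add key="0735624410" value="IISResourceKit" />
    </rewriteMap>
  </rewriteMaps>
  <rules>
    <rule name="Redirect Books" stopProcessing="true">
      <match url="^books\.aspx$" />
      <conditions>
        <!-- First condition extracts the ISBN from the query string -->
        <add input="{QUERY_STRING}" pattern="^isbn=([0-9]+)$" />
        <!-- Second condition looks that ISBN up; {C:1} here refers to
             the capture of the previous condition -->
        <add input="{ISBNInverse:{C:1}}" pattern="(.+)" />
      </conditions>
      <!-- In the action, {C:1} is the capture of the last matched
           condition, that is, the friendly name from the map -->
      <action type="Redirect" url="/books/{C:1}" appendQueryString="false" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>
```

Setting appendQueryString to false keeps the original isbn parameter from being carried over to the friendly URL.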
The rule above does the "inverse": it matches the URL books.aspx, extracts the ISBN query string value, looks it up in the ISBN table, and redirects the client to the canonical URL. So again, if a user enters http://www.contoso.com/books.aspx?isbn=0735624410 they will be redirected to http://www.contoso.com/books/IISResourceKit.
To me, Friendly URLs are more of a user feature than an SEO feature. Every SEO guide I've read recommends reducing the number of parameters in your query string, but I have not yet found any document that clearly states whether search engine bots truly have a limit that would impact search relevance. I guess it makes sense that they wouldn't keep track of thousands of links to a catalog.aspx that has zillions of permutations based on hundreds of values in the query string (category, department, price range, etc.) even if all of them were linked, but again I don't have any proof.
One complex task that Web Developers sometimes face is reorganizing their current Web Site structure. Whether it's moving a section to a different path or something as simple as renaming a single file, you need to take into consideration things like: Is this move temporary? How do I ensure old clients get the new URL? How do I avoid losing search engine relevance? URL Rewrite will help you perform these tasks.
If you rename a file, you can very easily write a Rewrite or Redirect rule that ensures your users continue getting the content. If your intent is to never go back to the old name, you should use a Permanent redirect so everyone starts getting the new content under its new "canonical URL". However, if this could be a temporary thing, you should use a Temporary redirect. Finally, a Rewrite is useful if you want both URLs to continue to be valid (though this breaks canonicalization).
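To make the three options concrete, here is a small sketch using hypothetical file names (OldPage.aspx renamed to NewPage.aspx) and assuming the Web.config is at the site root:

```xml
<rewrite>
  <rules>
    <!-- Hypothetical names: OldPage.aspx was renamed to NewPage.aspx -->
    <rule name="Renamed Page" stopProcessing="true">
      <match url="^OldPage\.aspx$" />
      <!-- redirectType="Permanent" sends a 301.
           If the rename might be reverted, use redirectType="Found" (302)
           or redirectType="Temporary" (307) instead.
           To keep both URLs valid, switch to type="Rewrite" with
           url="NewPage.aspx" and drop redirectType. -->
      <action type="Redirect" url="/NewPage.aspx" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>
```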
Another common scenario is when you need to move an entire directory to another place in the Web Site. It could also be that, based on some criteria (say mobile browsers or some other User Agent), you want clients to get a different set of pages or images. Either way, URL Rewrite helps with this. The following configuration will redirect every call to the /Images directory to the /NewImages directory.
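A sketch of that directory move, again assuming the rule is placed in the Web.config at the site root (the rule name is illustrative):

```xml
<rewrite>
  <rules>
    <rule name="Move Images" stopProcessing="true">
      <!-- {R:1} is everything after Images/ -->
      <match url="^Images/(.*)$" />
      <!-- Send the client to the same file under /NewImages -->
      <action type="Redirect" url="/NewImages/{R:1}" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>
```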
A related scenario: suppose you wanted to show smaller images whenever a Windows CE user accesses your site. You could have an "img" directory where all the small images are stored and use a rule like the following:
Note that in this case using Rewrite makes sense, since we want the small images to look like the original images to the browser, and it saves the browser a round-trip.
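One way this could look, assuming the full-size images live under a directory named images and that the browser identifies itself with "Windows CE" in its User-Agent header (both are assumptions on my part):

```xml
<rewrite>
  <rules>
    <rule name="Small Images for Windows CE" stopProcessing="true">
      <!-- Assumed location of the full-size images -->
      <match url="^images/(.*)$" />
      <conditions>
        <!-- Apply only when the User-Agent mentions Windows CE -->
        <add input="{HTTP_USER_AGENT}" pattern="Windows CE" />
      </conditions>
      <!-- Serve the file with the same name from the img directory -->
      <action type="Rewrite" url="img/{R:1}" />
    </rule>
  </rules>
</rewrite>
```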
Another common operation is relocating individual pages for whatever reason (such as marketing campaigns, branding, etc.). If you have several files that have been moved or renamed, you can have a single rule that catches all of them and redirects them accordingly. Similarly, another example is an incremental migration from one technology to another: say you are moving from Classic ASP to ASP.NET, and as you rewrite some of the old ASP pages as ASPX pages you want to start serving them without breaking any links or losing search engine relevance.
Now you can just keep adding any broken link to this table and specify its new address.
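A single map-driven rule along these lines could handle that. The map name and the two entries are made-up examples:

```xml
<rewrite>
  <rewriteMaps>
    <!-- Hypothetical entries: old address to new address -->
    <rewriteMap name="MovedPages">
      <add key="/products.asp" value="/products.aspx" />
      <add key="/summer-sale.htm" value="/campaigns/summer.aspx" />
    </rewriteMap>
  </rewriteMaps>
  <rules>
    <rule name="Redirect Moved Pages" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <!-- {REQUEST_URI} is the requested path including any query
             string; the rule fires only if the map has an entry for it -->
        <add input="{MovedPages:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <action type="Redirect" url="{C:1}" appendQueryString="false" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>
```

Because the lookup key includes the query string, a request such as /products.asp?id=1 would need its own entry in this sketch.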
Another potential use of URL Rewrite is with rich Internet applications in the browser, built with things like AJAX, Silverlight, or Flash, that are not easy for search engines to parse and index. You could use URL Rewrite to rewrite the URL to static HTML versions of your content. However, you should make sure that the content is consistent so you don't mislead users and search engines. For example, the following rule will rewrite all the files in the RIAFiles table to their static HTML counterparts, but only if the User Agent is the MSNBot or the GoogleBot:
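A sketch of such a rule. The RIAFiles map name comes from the text, while the entry and the exact User-Agent pattern are assumptions:

```xml
<rewrite>
  <rewriteMaps>
    <!-- Hypothetical entry: rich page to its static HTML counterpart -->
    <rewriteMap name="RIAFiles">
      <add key="/catalog/viewer.aspx" value="/catalog/viewer.htm" />
    </rewriteMap>
  </rewriteMaps>
  <rules>
    <rule name="Static Content for Crawlers" stopProcessing="true">
      <match url=".*" />
      <!-- Both conditions must match (the default grouping is MatchAll) -->
      <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="MSNBot|Googlebot" />
        <add input="{RIAFiles:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <!-- {C:1} is the static file returned by the map lookup -->
      <action type="Rewrite" url="{C:1}" />
    </rule>
  </rules>
</rewrite>
```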
Related to this, you might want to prevent search engines from crawling certain files (or your entire site). For that, you can use the Robots.txt semantics and a "disallow" entry. However, you can also use URL Rewrite to do this with more flexibility, such as blocking only a specific user agent:
There are several other things you can do to ensure that your Web Site is friendly to Search Engines. Most of them require changes to your application, but they are certainly worth the effort, for example:
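For instance, a rule like the following sketch returns a 403 to one crawler only. "SomeBot" is a placeholder user agent string, not a real crawler name:

```xml
<rewrite>
  <rules>
    <rule name="Block SomeBot" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <!-- Placeholder user agent; replace with the one to block -->
        <add input="{HTTP_USER_AGENT}" pattern="SomeBot" />
      </conditions>
      <!-- Answer with 403 Forbidden instead of serving the content -->
      <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Crawling is not allowed for this user agent" />
    </rule>
  </rules>
</rewrite>
```

Keep in mind that this relies on the client sending an honest User-Agent header.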
For this entry I read and used some of the resources at several Web Sites, including:
Every now and then after leaving my computer running for several weeks I would get a weird error message when trying to launch Excel saying something like:
C:\PROGRA~1\MICROS~1\Office12\EXCEL.EXE is not a valid Win32 application.
This file does not have a program associated with it for performing this action. Create an association in the Set Associations control panel.
I tried several things to make it run again, but only a restart would solve the problem. Finally, I decided to investigate a bit more, and it turns out there is a fix that you can download from Microsoft support:
This update improves the reliability of Windows Vista SP1-based computers that experience issues in which large applications cannot run after the computer is turned on for extended periods of time. For example, when you try to start Excel 2007 after the computer is turned on for extended periods of time, a user may receive an error message that resembles the following:
EXCEL.EXE is not a valid Win32 application
I just installed it and so far so good, no more weird errors, but I guess I need to wait a few weeks before I can confirm it works. Either way, I thought this could be helpful for others.
Direct links for the fix download are:
Windows Vista, 32-bit versions: Download the Update for Windows Vista (KB952709) package now. (http://www.microsoft.com/downloads/details.aspx?FamilyId=DF72A9B0-564E-4326-894E-05CBA709CB39)
Windows Vista, 64-bit versions: Download the Update for Windows Vista for x64-based Systems (KB952709) package now. (http://www.microsoft.com/downloads/details.aspx?FamilyId=C3536CAA-7B71-4525-9D23-21A5B3D4507F)
Today in the IIS.NET Forums someone asked whether it was possible to use the same IIS Manager Users authentication in the context of a Web Application, so that you could have, say, something like WebDAV using the same credentials you use for IIS Manager Remote Administration.
IIS Manager Remote Administration allows you to connect and manage your Web Site using credentials that are not Windows users, but instead just a combination of user name and password. This is implemented following a Provider model, where the default implementation we ship uses our Administration.config file (%windir%\system32\inetsrv\config\administration.config) as the storage for these users. However, you can easily extend the base class to authenticate against a database or any other user store if needed. This also means you can build your own application and call our APIs (ManagementAuthentication).
Even better, in the context of a Web Site running in IIS 7.0 you can actually implement this without having to write a single line of code.
Disclaimer: out of the box, Administration.config only grants read permissions to administrators. This means that a Web Application will not be able to access the file, so you need to change the ACLs on the file to provide read permissions for your application. You should make sure that you limit the read access to the minimum required, such as below.
Here is how you do it:
What is also nice is that you can use URL Authorization to further restrict permissions in your pages for these users. For example, if I didn't want a particular IIS Manager User (say MyIisManagerUser) to access the Web Site, I can just configure this in the same web.config:
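As a sketch, a URL Authorization deny entry for that user could look like this. It assumes the URL Authorization module is installed and that the inherited allow rules stay in place for everyone else:

```xml
<configuration>
  <system.webServer>
    <security>
      <authorization>
        <!-- Deny this one IIS Manager User; other users keep
             whatever access the inherited rules give them -->
        <add accessType="Deny" users="MyIisManagerUser" />
      </authorization>
    </security>
  </system.webServer>
</configuration>
```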
If you want to learn more about remote administration and how to configure it you can read: http://learn.iis.net/page.aspx/159/configuring-remote-administration-and-feature-delegation-in-iis-7/
AHADMIN is the COM API that IIS 7.0 uses for reading and writing its configuration system. One of its lesser-known features is that you can also use the same API to manage Administration.config, by calling the SetMetadata method and specifying that you will be targeting Administration.config. What this ends up doing is using a built-in IAppHostPathMapper that re-maps the files so that you can manage Administration.config easily.
Here is an example of a common operation that adds a ModuleProvider (UI Extensibility Module) and its corresponding Module.
I'll also take this opportunity to mention that it is very important to add your module to the Modules section if you want it to be used from Site and Application delegated connections; otherwise only Server connections will get it.