In the past few days I've been reading a bit about SEO, trying to understand what makes a Web Site search-engine optimized (SEO), what the typical headaches are when trying to achieve that, and how we can address them in IIS.
Today I decided to post how you can make your Web Site running IIS 7.0 a bit "friendlier" to search engines without having to modify any code in your application. Being search-engine optimized is a big statement since it can include several things, so for now I will scope the discussion to three things that can be easily addressed using the IIS URL Rewrite Module: canonicalization, friendly URLs, and site reorganization.
Basically, the goal of canonicalization is to ensure that the content of a page is exposed under only one URI. This is important because, even though it's easy for humans to tell that http://www.carlosag.net is the same as http://carlosag.net, many search engines will not make that assumption and will keep them as two separate entries, potentially splitting the ranking between them and lowering their relevance. Another example is http://www.carlosag.net/default.aspx and http://www.carlosag.net/. You can certainly minimize the impact by always using the canonical form in your own links, for example linking to http://www.carlosag.net/tools/webchart/ without the default.aspx. However, that only accounts for part of the equation, since you cannot control the links of everyone else who references your Web Site.
This is when URL Rewrite comes into play and truly solves this problem.
URL Rewrite can help you redirect users when they type your URL in a way you don't necessarily want them to, for example just carlosag.net. Choosing between using WWW or not is a matter of taste, but once you choose one you should ensure that you guide everyone to the right one. The following rule will automatically redirect everyone using just carlosag.net to www.carlosag.net. This configuration can be saved in the Web.config file in the root of your Web Site. Note that I'm only including the XML in this blog; however, I used IIS Manager to generate all of these settings, so you don't need to memorize the XML schema, since the UI includes several friendly capabilities to generate all of them.
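A minimal sketch of such a rule, assuming the URL Rewrite module is installed and the file is the Web.config at the site root. The rule name is illustrative, not something the module requires.

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Rule name is illustrative -->
        <rule name="Redirect to WWW" stopProcessing="true">
          <!-- Match every requested path -->
          <match url=".*" />
          <conditions>
            <!-- Only act when the host header is the bare domain -->
            <add input="{HTTP_HOST}" pattern="^carlosag\.net$" />
          </conditions>
          <!-- {R:0} is the whole matched path; Permanent sends a 301 -->
          <action type="Redirect" url="http://www.carlosag.net/{R:0}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

The condition on {HTTP_HOST} keeps requests that already use www.carlosag.net untouched, so the rule cannot redirect in a loop.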
Note that it is important to use Permanent (301) redirects. This ensures that if anybody links to your page using a non-WWW link, the search engine bot that crawls their Web Site will identify the link as permanently moved, treat the new URL as the correct address, and not index the old URL. With Temporary (302) redirects the old URL would continue to be indexed. The server responds with a 301 (Moved Permanently) status and a Location header that points to the www URL.
IIS has a feature called Default Document that allows you to specify the content that should be processed when a user enters a URL that is mapped to a directory and not an actual file. In other words, if the user enters http://www.carlosag.net/tools/ then they will actually get the content as if they had entered http://www.carlosag.net/tools/default.aspx. That is all great, but the problem is that this feature only works one way: it maps a directory to a file, but it does not map the file back to the directory. This means that if some of your links, or other users, use the full URL, search engines will see two different URLs. To solve that problem we can use a configuration very similar to the rule above; following is a rule that will redirect the default.aspx to the canonical URL (the folder).
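One way such a rule could look, as a sketch only. It assumes the Web.config sits at the site root (so the redirect target can start with "/"), and the rule name is illustrative.

```xml
<!-- Goes inside <system.webServer><rewrite><rules> -->
<rule name="Default Document" stopProcessing="true">
  <!-- {R:1} captures the folder part, if any, including its trailing slash -->
  <match url="^(.*/)?default\.aspx$" />
  <!-- Redirect permanently (301) to the folder URL -->
  <action type="Redirect" url="/{R:1}" redirectType="Permanent" />
</rule>
```

Pattern matching is case-insensitive by default, so this covers both default.aspx and Default.aspx. Note that a rule like this also redirects form posts sent to default.aspx, so it is worth testing against pages that post back to themselves.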
Again, this uses a Permanent redirect: it extracts everything before default.aspx and redirects to the "parent" URL path. For example, if the user enters http://www.carlosag.net/Tools/WindowsLiveWriter/default.aspx it will be redirected to http://www.carlosag.net/Tools/WindowsLiveWriter/, and http://www.carlosag.net/Tools/default.aspx will be redirected to http://www.carlosag.net/Tools/. You can place this rule at the root of your site and it will take care of all the default documents (if you have a default.aspx in every folder).
Asking your users to remember that www.contoso.com/books.aspx?isbn=0735624410 is the URL for the IIS Resource Kit is not the nicest thing to do. First of all, why should they care that this is an ASPX page, that it takes arguments, and so on? Providing them with a URL like www.contoso.com/books/IISResourceKit will resonate with them far better and be easier for them to remember and pass along. Most importantly, it doesn't tie you to any Web technology.
With URL Rewrite you can easily build this kind of logic automatically without having to modify your code using Rewrite Maps:
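A sketch of what this configuration could look like. The map name ISBN and the rule name are illustrative; the key and value come from the example URLs used in this post.

```xml
<!-- Goes inside <system.webServer> -->
<rewrite>
  <rules>
    <rule name="Rewrite for Books" stopProcessing="true">
      <!-- {R:1} is the friendly name, e.g. IISResourceKit -->
      <match url="^books/(.+)$" />
      <conditions>
        <!-- Look the name up in the map; only rewrite when an entry exists -->
        <add input="{ISBN:{R:1}}" pattern="(.+)" />
      </conditions>
      <!-- {C:1} is the value returned by the map lookup -->
      <action type="Rewrite" url="books.aspx?isbn={C:1}" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="ISBN">
      <add key="IISResourceKit" value="0735624410" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
```

Guarding the rewrite with a condition means that an unknown book name falls through to the normal request handling instead of calling books.aspx with an empty ISBN.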
The configuration above includes a rule that uses a Rewrite Map to translate a URL like http://www.contoso.com/books/IISResourceKit into http://www.contoso.com/books.aspx?isbn=0735624410 automatically. Maps are a very convenient way to have a "table" of values that can be transformed into any other value to be used in the resulting URL. Of course there are better ways of doing this for large catalogs or values that change frequently, but it is extremely useful when you have a stable set of values or when you can't make changes to an existing application. Note that since we use Rewrite, the end users never see the "ugly" URL unless they already knew it and typed it. Of course, this means you can use the inverse approach to ensure the canonicalization is preserved:
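A sketch of the inverse rule. It assumes a second, hypothetical map named ISBNInverse that is keyed by ISBN; both the map name and the rule name are illustrative.

```xml
<!-- Goes inside <system.webServer> -->
<rewrite>
  <rules>
    <rule name="Redirect Books to canonical URL" stopProcessing="true">
      <match url="^books\.aspx$" />
      <conditions>
        <!-- Extract the ISBN from the query string -->
        <add input="{QUERY_STRING}" pattern="^isbn=([0-9]+)$" />
        <!-- Look it up; {C:1} here refers to the ISBN captured just above -->
        <add input="{ISBNInverse:{C:1}}" pattern="(.+)" />
      </conditions>
      <!-- In the action, {C:1} is the friendly name from the last condition -->
      <action type="Redirect" url="/books/{C:1}" appendQueryString="false" redirectType="Permanent" />
    </rule>
  </rules>
  <rewriteMaps>
    <!-- Hypothetical inverse table: ISBN to friendly name -->
    <rewriteMap name="ISBNInverse">
      <add key="0735624410" value="IISResourceKit" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
```

Setting appendQueryString to false drops the original ?isbn= value so the redirect lands on the clean URL.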
The rule above does the "inverse" by matching the URL books.aspx, extracting the ISBN query string value, doing a lookup in the ISBN table, and redirecting the client to the canonical URL. So again, if a user enters http://www.contoso.com/books.aspx?isbn=0735624410 they will be redirected to http://www.contoso.com/books/IISResourceKit.
To me, friendly URLs are more of a user feature than an SEO feature. However, every SEO guide I've read recommends reducing the number of parameters in your query string, although I have not yet found any document that clearly states whether search engine bots really have a limit that would impact search relevance. I guess it makes sense that they wouldn't keep track of thousands of links to a catalog.aspx that has zillions of permutations based on hundreds of values in the query string (category, department, price range, etc.) even if all of them were linked, but again, I don't have any proof.
One complex task that Web developers sometimes face is reorganizing their current Web Site structure. Whether it's moving a section to a different path or something as simple as renaming a single file, you need to consider questions like: Is this move temporary? How do I ensure old clients get the new URL? How do I avoid losing search engine relevance? URL Rewrite will help you perform these tasks.
If you rename a file, you can very easily write a Rewrite or Redirect rule that ensures that your users continue getting the content. If your intent is to never go back to the old name, you should use a Permanent redirect so everyone starts getting the new content with its new "canonical URL"; however, if this could be a temporary thing, you should use a Temporary redirect. Finally, a Rewrite is useful if you still want both URLs to continue to be valid (though this breaks canonicalization).
Another common scenario is when you need to move an entire directory to another place in the Web Site. It could also be that, based on some criteria (say mobile browsers or another User Agent), you want to serve a different set of pages or images. Either way, URL Rewrite helps with this. The following configuration will redirect every call to the /Images directory to the /NewImages directory.
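As a sketch, assuming the Web.config is at the site root and the move is meant to be permanent (the rule name is illustrative):

```xml
<!-- Goes inside <system.webServer><rewrite><rules> -->
<rule name="Redirect Images to NewImages" stopProcessing="true">
  <!-- {R:1} is everything after Images/ -->
  <match url="^Images/(.*)$" />
  <!-- Use redirectType="Found" (302) instead if the move is only temporary -->
  <action type="Redirect" url="/NewImages/{R:1}" redirectType="Permanent" />
</rule>
```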
A related scenario is showing different, smaller images whenever a Windows CE user accesses your site. You could have an "img" directory where all the small images are stored and use a rule like the following:
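A sketch of such a rule. It assumes the regular images live under /Images and that Windows CE browsers include "Windows CE" in their User-Agent header; the rule name is illustrative.

```xml
<!-- Goes inside <system.webServer><rewrite><rules> -->
<rule name="Small images for Windows CE" stopProcessing="true">
  <match url="^Images/(.*)$" />
  <conditions>
    <!-- Only apply to clients that identify themselves as Windows CE -->
    <add input="{HTTP_USER_AGENT}" pattern="Windows CE" />
  </conditions>
  <!-- Rewrite happens on the server, so the browser keeps the original URL -->
  <action type="Rewrite" url="img/{R:1}" />
</rule>
```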
Note that in this case using a Rewrite makes sense, since we want the small images to look like the original images to the browser, and it saves the browser a round-trip.
Another common operation is when you randomly need to relocate pages for whatever reason (such as Marketing Campaigns, Branding, etc). In this case if you have several files that have been moved or renamed you can have a single rule that catches all of those and redirects them accordingly. Similarly, another sample could include an incremental migration from one technology to another where say you are moving from Classic ASP to ASP.NET and as you rewrite some of the old ASP pages into ASPX pages you want to start serving them without breaking any links or the search engine relevance.
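A sketch of a single rule driven by a table of moved pages. The map name Redirects and the sample entries are hypothetical and only illustrate the shape of the table.

```xml
<!-- Goes inside <system.webServer> -->
<rewrite>
  <rules>
    <rule name="Redirect moved pages" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <!-- Look up the requested URI; the rule only fires when an entry exists -->
        <add input="{Redirects:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <!-- {C:1} is the new address found in the map -->
      <action type="Redirect" url="{C:1}" appendQueryString="false" redirectType="Permanent" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="Redirects">
      <!-- Hypothetical entries: old address as key, new address as value -->
      <add key="/old-campaign.htm" value="/campaigns/current.htm" />
      <add key="/products.asp" value="/products.aspx" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
```

Because {REQUEST_URI} includes the query string, the keys have to match the old address exactly as it was requested.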
Now, you can just keep adding to this table any broken link and specify its new address.
Another potential use of URL Rewrite is with RIA applications in the browser. Content built with things like AJAX, Silverlight or Flash is not easy for search engines to parse and index, so you could use URL Rewrite to rewrite the URL to static HTML versions of your content. However, you should make sure that the content is consistent so you don't misguide users and search engines. For example, the following rule will rewrite all the files in the RIAFiles table to their static HTML counterparts, but only if the User Agent is the MSNBot or the GoogleBot:
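A sketch of such a rule. The RIAFiles map name comes from the text above; the sample entry and the rule name are hypothetical.

```xml
<!-- Goes inside <system.webServer> -->
<rewrite>
  <rules>
    <rule name="Static HTML for crawlers" stopProcessing="true">
      <match url=".*" />
      <!-- Both conditions must match (the default grouping is MatchAll) -->
      <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="MSNBot|Googlebot" />
        <add input="{RIAFiles:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <!-- {C:1} is the static HTML counterpart found in the map -->
      <action type="Rewrite" url="{C:1}" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="RIAFiles">
      <!-- Hypothetical entry: RIA page as key, static version as value -->
      <add key="/samples/viewer.aspx" value="/samples/viewer.htm" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
```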
Related to this, you might want to prevent search engines from crawling certain files (or your entire site). For that you can use the Robots.txt semantics and a "disallow" entry; however, you can also use URL Rewrite to do this with more control, such as blocking only a specific user agent:
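A sketch of a blocking rule. "SomeRobot" is a placeholder for whatever User-Agent string you want to block, and the rule name and status text are illustrative.

```xml
<!-- Goes inside <system.webServer><rewrite><rules> -->
<rule name="Block SomeRobot" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <!-- Placeholder user agent; replace with the crawler to block -->
    <add input="{HTTP_USER_AGENT}" pattern="SomeRobot" />
  </conditions>
  <!-- Answer with 403 instead of serving the content -->
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Crawling is not allowed for this user agent" />
</rule>
```

Narrowing the match pattern (for example to a single folder) limits the block to certain files instead of the entire site.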
There are several other things you can do to ensure that your Web Site is friendly to search engines. Most of them require changes to your application but are certainly worth the effort, for example:
For this entry I read and used some of the resources at several Web Sites, including:
Today there was a question in the IIS.net Forums asking how to expose two different Internet sites from another site, making them look as if they were subdirectories of the main site.
So for example the goal was to have a site: www.site.com expose a www.site.com/company1 and a www.site.com/company2 and have the content from “www.company1.com” served for the first one and “www.company2.com” served in the second one. Furthermore we would like to have the responses cached in the server for performance reasons. The following image shows a simple diagram of this:
This sounds easy since it's just about routing or proxying every single request to the correct servers, right? Wrong! If only it were that easy. It turns out the most challenging part is that in this case we are modifying the structure of the underlying URLs and the original layout of the servers, which makes relative paths break, so images, stylesheets (CSS), JavaScript files and other resources are not shown correctly.
To try to clarify this, imagine that a user requests the page at http://www.site.com/company1/default.aspx in their browser, and based on the specification above the request is proxied/routed to http://www.company1.com/default.aspx on the server side. So far so good. However, imagine that the markup returned by that page has an image tag like “<img src=/some-image.png />”. The problem is that the browser will now resolve that path against the original request it made, http://www.site.com/company1/default.aspx, resulting in a request for the image at http://www.site.com/some-image.png instead of the right URL under the “company1” folder, which would be http://www.site.com/company1/some-image.png.
Do you see it? Basically the problem is that any relative path or for that matter absolute paths as well need to be translated to the new URL structure imposed by the original goal.
So how do we do it then?
URL Rewrite 2.0 includes the ability to rewrite the content of a response as it is getting served back to the client which will allow us to rewrite those links without having to touch the actual application.
Software Required:
Steps
The first rule is an inbound rewrite rule that basically captures all the requests to the root folder /company1/*, so if using Default Web Site, anything going to http://localhost/company1/* will be matched by this rule and it will rewrite it to www.company1.com respecting the HTTP vs HTTPS traffic.
One thing to highlight, which took me a bit of time to figure out, is the “serverVariables” entry in that rule, which overwrites the Accept-Encoding header. I do this because if you do not remove that header the response will likely be compressed (gzip or deflate), and output rewriting is not supported in that case; you will end up with an error message like:
HTTP Error 500.52 - URL Rewrite Module Error. Outbound rewrite rules cannot be applied when the content of the HTTP response is encoded ("gzip").
Also note that, for security reasons, you need to explicitly allow the server variable before a rule can set it; see the URL Rewrite documentation on enabling server variables.
The last two rules just rewrite the links and scripts and other resources so that the URLs are translated to the right structure. The first one rewrites absolute paths, and the last one rewrites the relative paths. Note that if you use relative paths using “..” this will not work, but you can easily fix the rule above, I was too lazy to do that and since I never use those when I create a site it works for me :)
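The three rules described above could be sketched as follows for company1 (company2 would follow the same shape). This assumes that ARR is installed with its proxy enabled and that HTTP_ACCEPT_ENCODING has been added to the allowed server variables; the rule and precondition names are illustrative.

```xml
<!-- Goes inside <system.webServer> of the www.site.com site -->
<rewrite>
  <rules>
    <rule name="Route the requests for Company1" stopProcessing="true">
      <match url="^company1/(.*)" />
      <conditions>
        <!-- Capture http or https so the same scheme is used for the backend -->
        <add input="{CACHE_URL}" pattern="^(https?)://" />
      </conditions>
      <serverVariables>
        <!-- Clear Accept-Encoding so the backend response is not compressed -->
        <set name="HTTP_ACCEPT_ENCODING" value="" />
      </serverVariables>
      <action type="Rewrite" url="{C:1}://www.company1.com/{R:1}" />
    </rule>
  </rules>
  <outboundRules>
    <!-- Absolute links back to the backend host -->
    <rule name="Rewrite absolute URLs" preCondition="ResponseIsHtml" stopProcessing="true">
      <match filterByTags="A, Area, Base, Form, Frame, Head, IFrame, Img, Input, Link, Script" pattern="^http(s)?://www\.company1\.com/(.*)" />
      <action type="Rewrite" value="/company1/{R:2}" />
    </rule>
    <!-- Paths that start at the root of the backend site -->
    <rule name="Rewrite rooted paths" preCondition="ResponseIsHtml">
      <match filterByTags="A, Area, Base, Form, Frame, Head, IFrame, Img, Input, Link, Script" pattern="^/(.*)" />
      <action type="Rewrite" value="/company1/{R:1}" />
    </rule>
    <preConditions>
      <!-- Only touch HTML responses -->
      <preCondition name="ResponseIsHtml">
        <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
      </preCondition>
    </preConditions>
  </outboundRules>
</rewrite>
```

The stopProcessing flag on the first outbound rule is a precaution so that a link it has already rewritten to /company1/... is not prefixed a second time by the rule that follows.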
A huge added value of using ARR is that with a couple of clicks we can now enable disk caching so that responses are cached locally on www.site.com, and not every single request ends up paying the price of going to the backend servers.
It's as easy as that. Now you will see caching working and your site will act as a container of other servers on the Internet. Pretty cool, huh! :)
So in this post we saw how, with literally a few lines of XML, URL Rewrite and ARR, we were able to enable a proxy/routing scenario with the ability to rewrite links and, furthermore, with caching support.
The other day I was asked if I knew about a tool that would allow users to easily analyze the IIS log files, to process them and look for specific data in a way that could easily be automated. My recommendation was that if they were comfortable using a SQL-like language they should use Log Parser. Log Parser is a very powerful tool that provides a generic SQL-like language on top of many types of data like IIS logs, Event Viewer entries, XML files, CSV files, the file system and others. It allows you to export the result of the queries to many output formats such as CSV (Comma-Separated Values), XML, SQL Server, charts and others, and it works well with IIS 5, 6, 7 and 7.5.
To use it you just need to install it and use the LogParser.exe that is found in its installation directory (on my x64 machine it is located at: C:\Program Files (x86)\Log Parser 2.2).
I also thought of sharing some of my favorite queries. To run them, just execute LogParser.exe and make sure to specify that the input is an IIS log file (-i:W3C). For ease of use, in this case we will export to a CSV file (-o:CSV) that can then be opened in Excel for further analysis:
A final note: any time you deal with Date and Time, remember to use the TO_LOCALTIME function to convert the log times to your local time, otherwise you will find it very confusing when your entries seem to be reported incorrectly.
If you need any help you can always visit the Log Parser Forums to find more information or ask specific questions.
Any other useful queries I missed?
I'm really excited to announce that today we released the Technical Preview of the IIS Admin Pack. It includes 7 new features for IIS Manager that will help you in a bunch of different scenarios.
You can download the IIS 7.0 Admin Pack Technical Preview (it takes less than 1 MB) from:
(x86) http://www.iis.net/downloads/default.aspx?tabid=34&g=6&i=1646
(x64) http://www.iis.net/downloads/default.aspx?tabid=34&g=6&i=1647
http://learn.iis.net/page.aspx/401/using-the-administration-pack/
These UI modules include the following features:
Please help us by trying these modules and giving us feedback on all of them: Do they work for you? What would you change? What would you add? What features are we missing?
Some things to think about:
Database Manager, what other database features are critical for you to build applications?
IIS Reports set of reports, what reports would you find useful?, would you want to have Configuration based reports (such as summarizing the Sites and their configuration, which configuration)? More Security Reports (such as)?
Configuration Editor, is it easy to use?, what concepts from configuration would you like to see?, etc
Given that each individual feature above has a lot of interesting capabilities that can easily be missed or might be confusing, I will be blogging in the near future about why we decided to build each feature, what makes it different from anything else you've seen, and how you can make the most of each of them.
Carlos
Last Wednesday we released the IIS Manager 7.0 client for Windows XP SP2, Windows Server 2003 and Windows Vista SP1. This is basically the IIS 7.0 Manager GUI that provides the ability to connect remotely to a Windows Server 2008 running the Web Management Service (WMSVC) to manage IIS 7.0 remotely.
There are several key differences in this version of IIS Manager and its remote infrastructure:
1) It allows for the first time users without administrative privileges to connect and manage their web sites and applications remotely
2) It runs over SSL, no more DCOM, which makes this a firewall-friendly feature that is easy to set up.
3) Runs as a smart client, which means if a new feature is installed on the server it will automatically download the updated versions to the client machines.
You can download it from:
IIS.NET Web Site x86: http://www.iis.net/downloads/default.aspx?tabid=34&i=1626&g=6
x64: http://www.iis.net/downloads/default.aspx?tabid=34&i=1633&g=6
Microsoft.com/Downloads http://www.microsoft.com/downloads/details.aspx?FamilyID=32c54c37-7530-4fc0-bd20-177a3e5330b7&displaylang=en
To learn more about remote management and how to install it: http://learn.iis.net/page.aspx/159/configuring-remote-administration-and-feature-delegation-in-iis-7/
Now, to really show you what this is, I created a very simple demo that briefly shows the remote management capabilities over SSL. (Below there is a transcript in case my accent makes it difficult to understand my English :))
Transcript:
The purpose of this demonstration is to show you how easy it is to manage IIS 7.0 running in Windows Server 2008, from any machine that has Windows XP or Windows 2003 or Windows Vista by downloading the IIS Manager 7.0 that runs on all of those platforms.
Now, today I am not going to focus on the details of how to configure it and how to setup the server to support remote management, but mainly just focus on the client aspect.
One of the most interesting aspects of this remote management infrastructure is that it now uses HTTPS to communicate with the server, making this a nice firewall-friendly remote management feature. Another key feature of this functionality is that it allows users without administrative privileges to connect and manage their Web Sites or their applications in a delegated way, where an administrator can restrict which options they can modify.
OK, so to show you this I have here a Windows Server 2008 installed with IIS 7.0, and as you would expect I can manage it locally quite easily using IIS Manager. Whether it's adding a Web Site or managing the configuration for both IIS and ASP.NET, I can do it here.
This is all good, but it turns out I don’t want to connect locally. Instead, I want to connect to the server remotely from my development machine and have the same experience as if I were logged on locally to the machine.
To show this, I have here a Virtual PC image running a clean install of Windows XP SP2, the only thing it has installed additionally is the .NET Framework 2.0 which is the only requirement for the installation of IIS Manager 7.
I have already downloaded the IIS Manager installer, which takes only about 3 MB of disk space and which you can find at http://www.iis.net or http://Microsoft.com/downloads.
Installing it is really simple and fast, just double click the icon and click next…
Once installed I can now connect to any machine running Windows Server 2008 that has been configured to support remote management. To do that I just need to choose the option “Connect To Server/Site/Application” from the File Menu or the Start Page.
Today, I will not drill down on the multiple differences between these connections, so for now I will just show how you can connect and manage the entire server by using a Windows Administrator account.
Another interesting feature of the remote management platform is that if some new feature built on top of the UI Management extensibility API is installed on the server, when I connect to the server again it will automatically ask me whether I want to get the new functionality, and I can choose which features to install.
To summarize, the IIS Manager 7 for Windows XP SP2, 2003 SP1 and Vista SP1 is available now. It only depends on the .NET FX 2.0, and it will allow you to connect to a remote server to manage it and have the same rich experience as if you were working locally, but using its new SSL remoting architecture.
IIS 7.0 includes a very cool feature that is not so well known called Hostable WebCore (HWC). This feature basically allows you to host the entire IIS functionality within your own process. This gives you the power to implement scenarios where you can customize entirely the functionality that you want "your Web Server" to expose, as well as control the lifetime of it without impacting any other application running on the site. This provides a very nice model for automating tests that need to run inside IIS in a more controlled environment.
This feature is implemented in a DLL called hwebcore.dll, that exports two simple methods:
The real trick for this feature is to know exactly what you want to support and "craft" the IIS Server configuration needed for different workloads and scenarios, for example:
An interesting thing to mention is that the file passed to ApplicationHostConfigPath parameter is live, in the sense that if you change the configuration settings your "in-process-IIS" will pick up the changes and apply them as you would expect to. In fact even web.config's in the site content or folder directories will be live and you'll get the same behavior.
To show how easily this can be done, I wrote a small, simple class to be able to run it from managed code. To consume this, you just have to do something like:
This will start your very own "copy" of IIS running in your own process, this means that you can control which features are available as well as the site and applications inside it without messing with the local state of the machine.
A very interesting thing is that it will even run without administrator privileges, meaning any user in the machine can start this program and have a "web server" of their own, that they can recycle, start and stop at their own will. (Note that this non-administrative feature requires Vista SP1 or Windows Server 2008, and it only works if the binding will be a local binding, meaning no request from outside the machine).
You can download the entire sample which includes two configurations: 1) one that runs only an anonymous static file web server that can only download HTML and other static files, and 2) one that is able to run ASP.NET pages as well.
Download the entire sample source code (9 kb)
You might be asking why would I even care to have my own IIS in my executable and not just use the real one? Well there are several scenarios for this:
In future posts I intend to share more samples that showcase some of this cool stuff.
IIS 7.0 Hostable WebCore feature allows you to host a "copy" of IIS in your own process. This is not your average "HttpListener" kind of solution where you will need to implement all the functionality for File downloads, Basic/Windows/Anonymous Authentication, Caching, Cgi, ASP, ASP.NET, Web Services, or anything else you need; Hostable WebCore will allow you to configure and extend in almost any way the functionality of your own Web Server without having to build any code.
I was running out of disk space on C: and was unable to install a small piece of software that I needed, so I decided to clean up a bit. For that I like using WinDirStat (http://windirstat.info/), which very quickly lets you find where the big files and folders are. In this case I found that my c:\Windows\winsxs folder was over 12 GB in size. One way to reclaim some of that disk space is to clean up all the files that were backed up when a Service Pack was installed. To do that in Windows 7 you can run the following DISM command:
dism /online /cleanup-image /spsuperseded /hidesp
That freed up 4 GB on my machine and now I can move on.
Disclaimer: I only ran this on my Windows 7 machine and it worked great. I have not tried it on Server SKUs, so run it at your own risk.