In the URL Rewrite forum somebody posted the question "are redirects bad for search engine optimization?". The answer is: not necessarily. Redirects are an important tool for Web sites, and used in the right context they are actually a required one. But first, a bit of background.
A redirect, in simple terms, is a way for the server to indicate to a client (typically a browser) that a resource has moved, and it does this by using an HTTP status code and an HTTP Location header. There are different types of redirects, but the most common ones are:
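- 301 Moved Permanently: the resource has moved for good, and clients should use the new URL from now on.
- 302 Found: the resource temporarily lives at a different URL, so clients should keep using the original one.
- 307 Temporary Redirect: like 302, but the client must not change the HTTP method when following it.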
Below is an example of the response sent from the server when requesting http://www.microsoft.com/SQL/:
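HTTP/1.1 301 Moved Permanently
Location: http://www.microsoft.com/sqlserver/
Content-Type: text/html; charset=UTF-8
Content-Length: 0

(The exact Location value and headers will vary; the important parts are the 301 status code and the Location header pointing to the new URL.)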
One of the most important factors in SEO is a concept called organic linking: in simple words, your page gets extra points for every external Web site that links to it. Now imagine the Search Engine Bot is crawling an external Web site and finds a link pointing to your page (example.com/some-page), and when it tries to visit your page it runs into a redirect to another location (say example.com/somepage). Now the Search Engine has to decide whether it should add the original "some-page" to its index, whether it should "add the extra points" to the new location or to the original one, or whether it should just ignore it entirely. The answer is not that simple, but a simplification of it could be:
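If the redirect is permanent (301), the search engine is being told that the new location is the real one, so it will typically index the new URL and credit the links to it. If the redirect is temporary (302 or 307), the original URL is still considered the real one, so it typically stays in the index and the credit may not transfer to the new location. (This is the commonly accepted guidance; each search engine implements its own heuristics.)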
The IIS Search Engine Optimization Toolkit has a couple of rules that look for different patterns related to redirects. The Beta version includes the following:
So what does this look like? In the image below I ran Site Analysis against a Web site and it found a few of these violations (2 and 3).
Notice that when you double-click a violation, it shows you the details and gives you direct access to the related URLs, so that you can look at the content and all the relevant information about them to make a decision. From that menu you can also see which other pages link to the pages involved, as well as launch them in the browser if needed.
As with all the other violations, it explains why each one is being flagged and recommends actions to follow for each of them.
The IIS Search Engine Optimization Toolkit can also help you find all the different types of redirects and the locations where they are being used. Just select Content->Status Code Summary in the Dashboard view and you will see all the different HTTP status codes received from your Web site. Notice in the image below how you can see the number of redirects (in this case 18 temporary redirects and 2 permanent redirects). You can also see how much content they accounted for, in this case about 2.5 KB. (I have seen Web sites generate a large amount of useless content in redirect traffic, which is a real waste of bandwidth.) You can double-click any of those rows to see the details of the URLs that returned that status code, and from there you can see who links to them, and so on.
So going back to the original question: "are redirects bad for Search Engine Optimization?". Not necessarily; they are an important tool used by Web applications for many reasons, such as:
Just make sure you don't abuse them with redirects to redirects, unnecessary redirects, or infinite loops, and make sure you use the right semantics.
During this PDC I attended Ian's presentation about WPF and Silverlight, where he demonstrated the high degree of compatibility that can be achieved between a WPF desktop application and a Silverlight application. One of the differences he demonstrated involves consuming Web Services: since Silverlight applications execute in a sandboxed environment, they are not allowed to call arbitrary Web Services or issue HTTP requests to servers other than the originating server, or a server that exposes a cross-domain manifest stating that it is allowed to be called by clients from that domain.
He then moved on to show how you can work around this architectural difference by writing your own Web Service or HTTP endpoint that receives the request from the client and, using code on the server, calls the real Web Service. This way the client sees only the originating server, which allows the call to succeed, while the server can freely call the real Web Service. Funnily enough, while searching for a quote service I ran into an article by Dino Esposito in MSDN Magazine where he explains the same issue and also presents a "compatibility layer", which again is just code (more than 40 lines of it) acting as a proxy to call a Web Service (except he uses the JSON serializer to return the values).
The obvious disadvantage is that you have to write code that only forwards the request and returns the response, acting essentially as a proxy. Of course this can be very simple, but if the Web Service you are trying to call has any degree of complexity, where custom types are being sent around, or if you actually need to consume several of the methods it exposes, then it quickly becomes a big maintenance nightmare: keeping them in sync when they change, handling errors properly, and dealing with the differences in reporting network issues, SOAP exceptions, HTTP exceptions, and so on.
So after looking at this, I immediately thought about ARR (Application Request Routing), a new extension for IIS 7.0 (see http://www.iis.net/extensions) that you can download for free from IIS.NET for Windows Server 2008 and that, among many other things, is capable of doing this kind of routing without writing a single line of code.
This blog post shows how easy it is to implement this using ARR. Here are the steps to try it (below you can find the software required). Note that if you are only interested in what is really new, just go to the 'Enter ARR' section below to see the configuration that fixes the Web Service call.
Message: Unhandled Error in Silverlight 2 Application An exception occurred during the operation, making the result invalid.
One of the features offered by ARR is proxy functionality to forward requests to another server. This is useful for clients that cannot make calls directly to the real data source, including Silverlight, Flash, and AJAX applications. As shown in this blog post, with just a few lines of XML configuration you can enable clients to call services in other domains without having to write hundreds of lines of code for each method. It also means that I get the original data, and that if the WSDL were to change I do not need to update any wrappers. Additionally, if you are using REST-based services, you can use local caching on your server through Output Cache and increase the performance of your applications significantly (again with no code changes).
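To give a concrete idea of what those few lines of XML look like, here is a minimal sketch of a URL Rewrite rule that, with ARR's proxy functionality enabled at the server level, forwards any request under a local /quoteservice/ path to the remote service (the path and host name here are hypothetical):

<system.webServer>
  <rewrite>
    <rules>
      <rule name="ForwardToQuoteService" stopProcessing="true">
        <!-- Match local requests under /quoteservice/ -->
        <match url="^quoteservice/(.*)" />
        <!-- Rewrite them to the real Web Service; ARR proxies the request -->
        <action type="Rewrite" url="http://real-service.example.com/{R:1}" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>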
Here is the software I installed to build this sample (amazingly, all of it is completely free):
With the upcoming release of .NET 3.5 and LINQ, I thought it would be interesting to show some of the cool things you can do with IIS 7 and LINQ. Everything I will do can be done with C# 2.0 code, but it would take several lines of code to write; thanks to LINQ you can do it in about a line or two.
Let's start with a very basic example that does not use LINQ, just M.W.A. (Microsoft.Web.Administration), and then start adding interesting things to it.
The following code just iterates the sites in IIS and displays their names.
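A minimal version of that code, assuming a reference to Microsoft.Web.Administration.dll, looks like this:

using System;
using Microsoft.Web.Administration;

class ListSites {
    static void Main() {
        // ServerManager is the entry point to the IIS 7 configuration system
        using (ServerManager serverManager = new ServerManager()) {
            foreach (Site site in serverManager.Sites) {
                Console.WriteLine(site.Name);
            }
        }
    }
}

From here, replacing the foreach with a LINQ query over serverManager.Sites is where things get interesting.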
The other day I was asked if I knew of a tool that would let users easily analyze IIS log files, processing them and looking for specific data in a way that could easily be automated. My recommendation was that if they were comfortable with a SQL-like language, they should use Log Parser. Log Parser is a very powerful tool that provides a generic SQL-like language on top of many types of data, such as IIS logs, Event Viewer entries, XML files, CSV (comma-separated values) files, the file system, and others; it lets you export the results of your queries to many output formats, such as CSV, XML, SQL Server, and charts; and it works well with IIS 5, 6, 7, and 7.5.
To use it you just need to install it and use the LogParser.exe that is found in its installation directory (on my x64 machine it is located at: C:\Program Files (x86)\Log Parser 2.2).
I also thought I would share some of my favorite queries. To run them, just execute LogParser.exe, making sure to specify that the input is an IIS log file (-i:W3C); for ease of use, in this case we will export to a CSV file that can then be opened in Excel (-o:CSV) for further analysis:
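For example, this query returns the 25 most requested URLs (it assumes your log files are named ex*.log in the current directory; point it at your site's log folder as needed):

LogParser.exe -i:W3C -o:CSV "SELECT TOP 25 cs-uri-stem, COUNT(*) AS Hits INTO TopUrls.csv FROM ex*.log GROUP BY cs-uri-stem ORDER BY Hits DESC"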
A final note: any time you deal with dates and times, remember to use the TO_LOCALTIME function to convert the log times (which are recorded in UTC) to your local time; otherwise you will find it very confusing when your entries seem to be reported incorrectly.
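For example, to include a readable local timestamp for each request you could write (again assuming a hypothetical ex*.log input):

LogParser.exe -i:W3C -o:CSV "SELECT TO_LOCALTIME(TO_TIMESTAMP(date, time)) AS LocalTime, cs-uri-stem INTO Requests.csv FROM ex*.log"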
If you need any help you can always visit the Log Parser Forums to find more information or ask specific questions.
Any other useful queries I missed?
In my spare time I've been thinking about new ideas for the SEO Toolkit, and it occurred to me that rather than continuing to figure out more reports and better diagnostics against random fake sites, it could be interesting to openly offer a free SEO analysis report to anyone who wants one, and test-drive some of these ideas against real sites.
So if you want in, just post your URL in the comments of this blog (make sure you are reading this from a URL inside http://blogs.msdn.com/carlosag/, otherwise you might be posting comments on some syndicating site). I will only take the first few sites (if this is successful I will start another batch in the future) and I will work through them one by one over the following days. Make sure to include a way to contact you, either through the MSDN user infrastructure or by including an email address, so that I can send you the results.
Alternatively, I will also take URLs via Twitter at http://twitter.com/CarlosAguilarM, so hurry up and let me know if you want me to look at your site.
I was running out of disk space on C: and was unable to install a small piece of software that I needed, so I decided to clean up a bit. For that I like using WinDirStat (http://windirstat.info/), which very quickly shows you where the big files and folders are. In this case I found that my C:\Windows\winsxs folder was over 12 GB in size. One way to reclaim some of that disk space is to clean up the files that were backed up when a Service Pack was installed. To do that in Windows 7 you can run the following DISM command:
dism /online /cleanup-image /spsuperseded /hidesp
That freed up 4 GB on my machine and now I can move on.
Disclaimer: I only ran this on my Windows 7 machine and it worked great; I have not tried it on server SKUs, so run it at your own risk.
IIS 7.0 includes a very cool but not so well known feature called Hostable WebCore (HWC). It basically allows you to host the entire IIS functionality within your own process. This gives you the power to implement scenarios where you can entirely customize the functionality that you want "your Web Server" to expose, as well as control its lifetime without impacting any other application running on the machine. This provides a very nice model for automating tests that need to run inside IIS in a more controlled environment.
This feature is implemented in a DLL called hwebcore.dll, which exports two simple methods: WebCoreActivate, which starts the server with the configuration you specify, and WebCoreShutdown, which stops it.
The real trick to this feature is knowing exactly what you want to support and "crafting" the IIS server configuration needed for your different workloads and scenarios, for example:
An interesting thing to mention is that the file passed in the ApplicationHostConfigPath parameter is live, in the sense that if you change the configuration settings, your "in-process IIS" will pick up the changes and apply them as you would expect. In fact, even the web.config files in the site content or folder directories are live, and you get the same behavior.
To show how easily this can be done, I wrote a small, simple class to run it from managed code. To consume it, you just have to do something like:
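Here is a minimal sketch of what that wrapper and its usage can look like; the WebServer class, the instance name, and the file path are hypothetical, and passing null for the root web.config assumes the default one is acceptable:

using System;
using System.Runtime.InteropServices;

// Minimal managed wrapper over the Hostable WebCore exports
public static class WebServer {
    [DllImport("hwebcore.dll")]
    private static extern int WebCoreActivate(
        [MarshalAs(UnmanagedType.LPWStr)] string appHostConfigPath,
        [MarshalAs(UnmanagedType.LPWStr)] string rootWebConfigPath,
        [MarshalAs(UnmanagedType.LPWStr)] string instanceName);

    [DllImport("hwebcore.dll")]
    private static extern int WebCoreShutdown(int immediate);

    public static void Start(string appHostConfigPath) {
        // Starts IIS inside this process using the given configuration file
        int hr = WebCoreActivate(appHostConfigPath, null, "MyWebServer");
        if (hr != 0) {
            Marshal.ThrowExceptionForHR(hr);
        }
    }

    public static void Stop() {
        WebCoreShutdown(1);
    }
}

class Program {
    static void Main() {
        WebServer.Start(@"C:\MySample\applicationHost.config");
        Console.WriteLine("Server started; press ENTER to stop...");
        Console.ReadLine();
        WebServer.Stop();
    }
}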
This will start your very own "copy" of IIS running in your own process, which means you can control which features are available, as well as the sites and applications inside it, without messing with the local state of the machine.
A very interesting thing is that it will even run without administrator privileges, meaning any user on the machine can start this program and have a "web server" of their own that they can recycle, start, and stop at will. (Note that this non-administrative use requires Vista SP1 or Windows Server 2008, and it only works if the binding is local, meaning no requests from outside the machine.)
You can download the entire sample, which includes two configurations: 1) one that runs only an anonymous static-file web server that can serve HTML and other static files, and 2) one that is able to run ASP.NET pages as well.
Download the entire sample source code (9 kb)
You might be asking why I would even care to have my own IIS in my executable instead of just using the real one. Well, there are several scenarios for this:
In future posts I intend to share more samples that showcase some of this cool stuff.
The IIS 7.0 Hostable WebCore feature allows you to host a "copy" of IIS in your own process. This is not your average "HttpListener" kind of solution, where you would need to implement all the functionality yourself: file downloads, Basic/Windows/Anonymous authentication, caching, CGI, ASP, ASP.NET, Web Services, and anything else you need. Hostable WebCore lets you configure and extend the functionality of your own Web server in almost any way without having to write any code.
I have just uploaded a new application that extends IIS Manager 7 for Windows Vista and Windows Longhorn Server, adding a new Reports option that gives you a few reports of server and site activity. Its features include:
Click Here to go to the Download Page
I'm working on a second version that will allow you to create your own queries and configure more options, like chart settings and more.
If you have any suggestions for reports that would be useful, feel free to add them as a comment on this post.
InetMgr exposes several extensibility points that developers can use to plug in their own features and make them look and feel just like the built-in functionality. One of those extensibility features is the hierarchy tree view, which is exposed mainly through three classes: HierarchyService, HierarchyProvider, and HierarchyInfo.
To extend the Tree view to add your own set of nodes or context menu tasks, developers need to perform the following actions:
Tasks illustrated in this walkthrough include:
HierarchyProvider is the base class that developers need to inherit from in order to get calls from the UI whenever a node needs to be loaded. This way they can choose to add nodes or tasks to the HierarchyInfo node that is passed as an argument.
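A sketch of such a provider, using the class names from this walkthrough (the exact base-class signatures may differ slightly from what is shown here), looks like this:

using System;
using Microsoft.Web.Management.Client;

internal class DemoHierarchyProvider : HierarchyProvider {
    public DemoHierarchyProvider(IServiceProvider serviceProvider)
        : base(serviceProvider) {
    }

    public override HierarchyInfo[] GetChildren(HierarchyInfo item) {
        // Only add our node under server connections
        if (item.NodeType == HierarchyInfo.ServerConnection) {
            return new HierarchyInfo[] { new DemoHierarchyInfo(this) };
        }
        return null;
    }
}

internal class DemoHierarchyInfo : HierarchyInfo {
    public DemoHierarchyInfo(IServiceProvider serviceProvider)
        : base(serviceProvider) {
    }

    // Non-localized string identifying the type of this node
    public override string NodeType {
        get { return "DemoHierarchyInfo"; }
    }

    // false so that the + sign is not offered in the tree view
    public override bool SupportsChildren {
        get { return false; }
    }

    // Localized text displayed in the tree view
    public override string Text {
        get { return "Demo Node"; }
    }

    protected override bool OnSelected() {
        // Navigate to the demo page when the node is selected
        return Navigate(typeof(DemoPage));
    }
}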
The code above creates a class derived from HierarchyProvider that implements the base GetChildren method, verifying that the node being expanded is a ServerConnection; if that is the case, it returns an instance of a DemoHierarchyInfo node that will be added under that connection. The DemoHierarchyInfo class simply specifies its NodeType (a non-localized string that identifies the type of this node), SupportsChildren (false, so that the + sign is not offered in the tree view), and Text (the localized text that will be displayed in the tree view). Finally, it overrides the OnSelected method and performs navigation to the DemoPage as needed.
In this task we will register the hierarchy provider created in the previous task so that the HierarchyService starts calling it to extend the tree view.
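A sketch of that registration, assuming it is done from the module's Initialize method using the IExtensibilityManager service (the DemoModule class name is hypothetical, and the exact service signatures may differ), looks like this:

using System;
using Microsoft.Web.Management.Client;

internal class DemoModule : Module {
    protected override void Initialize(IServiceProvider serviceProvider,
                                       ModuleInfo moduleInfo) {
        base.Initialize(serviceProvider, moduleInfo);

        // Register our provider so the tree view starts calling it
        IExtensibilityManager extensibilityManager =
            (IExtensibilityManager)GetService(typeof(IExtensibilityManager));
        extensibilityManager.RegisterExtension(
            typeof(HierarchyProvider),
            new DemoHierarchyProvider(serviceProvider));
    }
}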
To test the feature
In this lab, you learned how to extend the tree view, customize any node on it, and add your own nodes. You can also override the GetTasks method to provide context menu tasks for existing nodes, and override the SyncSelection method to customize the way navigation synchronization works.
One question that I've been asked several times is: "Is it possible to schedule the IIS SEO Toolkit to run automatically every night?". Other related questions are: "Can I automate the SEO Toolkit so that as part of my build process I'm able to catch regressions on my application?", or "Can I run it automatically after every check-in to my source control system to ensure no links are broken?", etc.
The good news is that the answer is yes! The bad news is that you have to write a bit of code to make it work. Basically, the SEO Toolkit includes a managed-code API to start an analysis just like the user interface does, and you can call it from any application you want using managed code.
In this blog post I will show you how to write a simple console application that starts a new analysis against the site provided as a command-line argument and processes a few queries after finishing.
The most important type included is a class called WebCrawler. This class takes care of all the process of driving the analysis. The following image shows this class and some of the related classes that you will need to use for this.
The WebCrawler class is initialized through the configuration specified in the CrawlerSettings class. It also exposes two methods, Start() and Stop(), which start and stop the crawling process in a set of background threads. Through its Report property you gain access to the CrawlerReport, which represents the results (whether completed or in progress) of the crawling process. CrawlerReport has a method called GetUrls() that returns all the UrlInfo items. UrlInfo is the most important class: it represents a URL that has been downloaded and processed, with all its metadata such as Title, Description, ContentLength, ContentType, and the set of Violations and Links it includes.
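Here is a sketch of what that console application can look like. It follows the API just described, but the ExternalLinkCriteria setting, the DirectoryCache path, and the StatusCode property on UrlInfo are assumptions based on the behavior described below:

using System;
using System.IO;
using System.Linq;
using System.Threading;
using Microsoft.Web.Management.SEO.Crawler;

class SEORunner {
    static void Main(string[] args) {
        if (args.Length != 1) {
            Console.WriteLine("Usage: SEORunner <url>");
            return;
        }
        CrawlerReport report = RunAnalysis(new Uri(args[0]));
        PrintStatusCodeSummary(report);
    }

    static CrawlerReport RunAnalysis(Uri startUrl) {
        CrawlerSettings settings = new CrawlerSettings(startUrl);
        // Consider pages in the same folder and below as internal (assumed enum name)
        settings.ExternalLinkCriteria = ExternalLinkCriteria.SameFolderAndDeeper;
        // Unique report name, stored where the IIS SEO UI keeps its reports
        settings.Name = startUrl.Host + " " + DateTime.Now.ToString("yy-MM-dd HH-mm-ss");
        string reportsDirectory = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
            "IIS SEO Reports");
        settings.DirectoryCache = Path.Combine(reportsDirectory, settings.Name);

        // Start the crawl on background threads and wait until it finishes
        // (progress reporting like the output shown below omitted for brevity)
        WebCrawler crawler = new WebCrawler(settings);
        crawler.Start();
        while (crawler.IsRunning) {
            Thread.Sleep(1000);
        }
        return crawler.Report;
    }

    static void PrintStatusCodeSummary(CrawlerReport report) {
        // Aggregate all processed URLs by HTTP status code using LINQ
        Console.WriteLine("Status Code summary");
        var summary = from url in report.GetUrls()
                      group url by url.StatusCode into g
                      orderby g.Count() descending
                      select g;
        foreach (var entry in summary) {
            Console.WriteLine("{0} - {1}", entry.Key, entry.Count());
        }
    }
}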
If you are not using Visual Studio, you can just save the contents above in a file, name it SEORunner.cs, and compile it from the command line:
C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe /r:"c:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll" /optimize+ SEORunner.cs
After that you should be able to run SEORunner.exe and pass the URL of your site as an argument; you will see output like the following:
Processed - Remaining - Download Size
56 - 149 - 0.93 MB
127 - 160 - 2.26 MB
185 - 108 - 3.24 MB
228 - 72 - 4.16 MB
254 - 48 - 4.98 MB
277 - 36 - 5.36 MB
295 - 52 - 6.57 MB
323 - 25 - 7.53 MB
340 - 9 - 8.05 MB
358 - 1 - 8.62 MB
362 - 0 - 8.81 MB
Start URL: http://www.carlosag.net/
Start Time: 11/16/2009 12:16:04 AM
End Time: 11/16/2009 12:16:15 AM
Status Code summary
OK - 319
MovedPermanently - 17
Found - 23
NotFound - 2
InternalServerError - 1
The most interesting method above is RunAnalysis. It creates a new instance of CrawlerSettings and specifies the start URL. Note that it also specifies that we should consider internal all the pages hosted in the same directory or subdirectories. We also set a unique name for the report and use the same directory as the IIS SEO UI, so that opening IIS Manager will show the reports just as if they were generated by it. Then we finally call Start(), which starts the number of worker threads specified in the WebCrawler.WorkerCount property. We then just wait for the WebCrawler to be done by querying the IsRunning property.
The remaining methods just leverage LINQ to perform a few queries to output things like a report aggregating all the URLs processed by Status code and more.
As you can see, the IIS SEO Toolkit crawling APIs let you easily write your own application to start an analysis against your Web site, which can then be integrated with the Windows Task Scheduler or with your own scripts or build system to enable continuous integration.
Once the report is saved locally, it can be opened using IIS Manager for further analysis, just like any other report. This sample console application can be scheduled using the Windows Task Scheduler so that it runs every night or at any time you want. Note that you could also write a few lines of PowerShell to automate it entirely from the command line without writing any C# code, but that is left for another post.