One question that I've been asked several times is: "Is it possible to schedule the IIS SEO Toolkit to run automatically every night?". Other related questions are: "Can I automate the SEO Toolkit so that as part of my build process I'm able to catch regressions on my application?", or "Can I run it automatically after every check-in to my source control system to ensure no links are broken?", etc.
The good news is that the answer is YES!. The bad news is that you have to write a bit of code to be able to make it work. Basically the SEO Toolkit includes a Managed code API to be able to start the analysis just like the User Interface does, and you can call it from any application you want using Managed Code.
In this blog I will show you how to write a simple command application that will start a new analysis against the site provided in the command line argument and process a few queries after finishing.
The most important type included is a class called WebCrawler. This class takes care of all the process of driving the analysis. The following image shows this class and some of the related classes that you will need to use for this.
The WebCrawler class is initialized through the configuration specified in the CrawlerSettings. The WebCrawler class also contains two methods Start() and Stop() which starts the crawling process in a set of background threads. With the WebCrawler class you can also gain access to the CrawlerReport through the Report property. The CrawlerReport class represents the results (whether completed or in progress) of the crawling process. It has a method called GetUrls() that returns an instance to all the UrlInfo items. A UrlInfo is the most important class that represents a URL that has been downloaded and processed, it has all the metadata such as Title, Description, ContentLength, ContentType, and the set of Violations and Links that it includes.
If you are not using Visual Studio, you can just save the contents above in a file, call it SEORunner.cs and compile it using the command line:
C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe /r:"c:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll" /optimize+ SEORunner.cs
After that you should be able to run SEORunner.exe and pass the URL of your site as a argument, you will see an output like:
Processed - Remaining - Download Size
56 - 149 - 0.93 MB
127 - 160 - 2.26 MB
185 - 108 - 3.24 MB
228 - 72 - 4.16 MB
254 - 48 - 4.98 MB
277 - 36 - 5.36 MB
295 - 52 - 6.57 MB
323 - 25 - 7.53 MB
340 - 9 - 8.05 MB
358 - 1 - 8.62 MB
362 - 0 - 8.81 MB
Start URL: http://www.carlosag.net/
Start Time: 11/16/2009 12:16:04 AM
End Time: 11/16/2009 12:16:15 AM
Status Code summary
OK - 319
MovedPermanently - 17
Found - 23
NotFound - 2
InternalServerError - 1
The most interesting method above is RunAnalysis, it creates a new instance of the CrawlerSettings and specifies the start URL. Note that it also specifies that we should consider internal all the pages that are hosted in the same directory or subdirectories. We also set the a unique name for the report and use the same directory as the IIS SEO UI uses so that opening IIS Manager will show the reports just as if they were generated by it. Then we finally call Start() which will start the number of worker threads specified in the WebCrawler::WorkerCount property. We finally just wait for the WebCrawler to be done by querying the IsRunning property.
The remaining methods just leverage LINQ to perform a few queries to output things like a report aggregating all the URLs processed by Status code and more.
As you can see the IIS SEO Toolkit crawling APIs allow you to easily write your own application to start the analysis against your Web site which can be easily integrated with the Windows Task Scheduler or your own scripts or build system to easily allow for continuous integration.
Once the report is saved locally it can then be opened using IIS Manager and continue further analysis as with any other report. This sample console application can be scheduled using the Windows Task Scheduler so that it can run every night or at any time. Note that you could also write a few lines of PowerShell to automate it without the need of writing C# code and do that by only command line, but that is left for another post.
In the URL Rewrite forum somebody posted the question "are redirects bad for search engine optimization?". The answer is: not necessarily, Redirects are an important tool for Web sites and if used in the right context they actually are a required tool. But first a bit of background.
A redirect in simple terms is a way for the server to indicate to a client (typically a browser) that a resource has moved and they do this by the use of an HTTP status code and a HTTP location header. There are different types of redirects but the most common ones used are:
Below is an example on the response sent from the server when requesting http://www.microsoft.com/SQL/
One of the most important factors in SEO is the concept called organic linking, in simple words it means that your page gets extra points for every link that external Web sites have linking to your page. So now imagine the Search Engine Bot is crawling an external Web site and finds a link pointing to your page (example.com/some-page) and when it tries to visit your page it runs into a redirect to another location (say example.com/somepage). Now the Search Engine has to decide if it should add the original "some-page" into its index as well as if it should "add the extra points" to the new location or to the original location, or if it should just ignore it entirely. Well the answer is not that simple, but a simplification of it could be:
IIS Search Optimization Toolkit has a couple of rules that look for different patterns related to Redirects. The Beta version includes the following:
So how does it look like? In the image below I ran Site Analysis against a Web site and it found a few of these violations (2 and 3).
Notice that when you double click the violations it will tell you the details as well as give you direct access to the related URL's so that you can look at the content and all the relevant information about them to make the decision. From that menu you can also look at which other pages are linking to the different pages involved as well as launch it in the browser if needed.
Similarly with all the other violations it tries to explain the reason it is being flagged as well as recommended actions to follow for each of them.
IIS Search Engine Optimization Toolkit can also help you find all the different types of redirects and the locations where they are being used in a very easy way, just select Content->Status Code Summary in the Dashboard view and you will see all the different HTTP Status codes received from your Web site. Notice in the image below how you can see the number of redirects (in this case 18 temporary redirects and 2 permanent redirects). You can also see how much content they accounted for, in this case about 2.5 kb (Note that I've seen Web sites generate a large amount of useless content in redirect traffic, speaking of spending in bandwidth). You can double click any of those rows and it will show you the details of the URL's that returned that and from there you can see who links to them, etc.
So going back to the original question: "are redirects bad for Search Engine Optimization?". Not necessarily, they are an important tool used by Web application for many reasons such as:
Just make sure you don't abuse them by having redirects to redirects, unnecessary redirects, infinite loops, and use the right semantics.
IIS 7.0 Failed Request Tracing (for historical reasons internally we refer to it as FREB, since it used to be called Failed Request Event Buffering, and there are no "good-sounding-decent" acronyms for the new name) is probably the best diagnosing tool that IIS has ever had (that doesn't require Debugging skills), in a simplistic way it exposes all the interesting events that happen during the request processing in a way that allows you to really understand what went wrong with any request. To learn more you can go to http://learn.iis.net/page.aspx/266/troubleshooting-failed-requests-using-tracing-in-iis7/.
What is not immediately obvious is that you can use this tracing capabilities from your ASP.NET applications to output the tracing information in our infrastructure so that your users get a holistic view of the request.
When you are developing in ASP.NET there are typically two Tracing infrastructures you are likely to use, the ASP.NET Page Tracing and the System.Diagnostics Tracing. In recent versions they have been better integrated (attribute writeToDiagnosticsTrace) but still you want to know about both of them.
Today I'll just focus on logging ASP.NET Tracing to FREB, and in a future post I will show how to do it for System.Diagnostics Tracing.
To send the ASP.NET Tracing to FREB you just need to enable ASP.NET tracing, use the ASPNET trace provider and you will get those entries in the FREB log. The following web.config will enable FREB and ASP.NET Tracing. (Note that you need to go to the Default Web Site and Enable Failed Request Filtering so that this rules get executed)
Now if you have a sample page like the following:
The result is that in \inetpub\logs\FailedReqLogsFiles\ you will get an XML file that includes all the details of the request including the Page Traces from ASP.NET. Note that we provide an XSLT transformation that parses the Xml file and provides a friendly view of it where it shows different views of the trace file. For example below only the warning is shown in the Request Summary view:
There is also a Request Details view where you can filter by all the ASP.NET Page Traces that includes both of the traces we added in the Page code.
A lot of sites today have the ability for users to sign in to show them some sort of personalized content, whether its a forum, a news reader, or some e-commerce application. To simplify their users life they usually want to give them the ability to log on from any page of the Site they are currently looking at. Similarly, in an effort to keep a simple navigation for users Web Sites usually generate dynamic links to have a way to go back to the page where they were before visiting the login page, something like: <a href="/login?returnUrl=/currentUrl">Sign in</a>.
If your site has a login page you should definitely consider adding it to the Robots Exclusion list since that is a good example of the things you do not want a search engine crawler to spend their time on. Remember you have a limited amount of time and you really want them to focus on what is important in your site.
Out of curiosity I searched for login.php and login.aspx and found over 14 million login pages… that is a lot of useless content in a search engine.
Another big reason is because having this kind of URL's that vary depending on each page means there will be hundreds of variations that crawlers will need to follow, like /login?returnUrl=page1.htm, /login?returnUrl=page2.htm, etc, so it basically means you just increased the work for the crawler by two-fold. And even worst, in some cases if you are not careful you can easily cause an infinite loop for them when you add the same "login-link" in the actual login page since you get /login?returnUrl=login as the link and then when you click that you get /login?returnUrl=login?returnUrl=login... and so on with an ever changing URL for each page on your site. Note that this is not hypothetical this is actually a real example from a few famous Web sites (which I will not disclose). Of course crawlers will not infinitely crawl your Web site and they are not that silly and will stop after looking at the same resource /login for a few hundred times, but this means you are just reducing the time of them looking at what really matters to your users.
If you use the IIS SEO Toolkit it will detect the condition when the same resource (like login.aspx) is being used too many times (and only varying the Query String) and will give you a violation error like: Resource is used too many times.
There are a few fixes, but by far the best thing to do is just add the login page to the Robots Exclusion protocol.
To summarize always add the login page to the robots exclusion protocol file, otherwise you will end up:
This is the second note of the series:
1: Moving a SitemapPath Control to ASP.NET Web Pages
My current Web Site was built using ASP.NET 2.0 and WebForms, that means that all of my pages have the extension .aspx. While moving each page to use ASP.NET Web Pages their extension is being changed to .cshtml, and while I’m sure I could configure it in a way to get them to keep their aspx extensions it is a good opportunity to “start clean”. Furthermore, in ASP.NET WebPages you can also access them without the extension at all, so if you have /my-page.cshtml, you can also get to it using just /my-page. Given I will go through this migration I decided to use the clean URL format (no extension) and in the process get better URLs for SEO purposes, for example, today one of the URLs look like http://www.carlosag.net/Articles/configureComPlus.aspx but this would be a good time to enforce lower-case semantics and also get rid of those ugly camel casing and get a much more standard a friendly format for Search Engines using “-“, like: http://www.carlosag.net/articles/configure-com-plus.aspx.
The risk of course is that if you just change the URLs of your site you will end up not only with lots of 404’s (Not Found), but your page ranking will be reset and you will loose all the “juice” that external links and history have provided to it. The right way to do this is to make sure that you perform a permanent redirect (301) from the old URL to the new URL, this way Search Engines (and browsers) will know that the content has permanently moved to a new location so they should “pass all the page ranking” to the new page.
There are many ways to achieve this, but I happen to like URL Rewrite a lot, so I decided to use it. To do that I basically created one rule that uses a Rewrite Map (think of it as a Dictionary) to match the URL and if it matches it will perform a permanent redirect to the new one. So for example, if /aboutme.aspx is requested, then it will 301 to /about-me:
Note that I could have also created a simple rule that would change the extension to cshtml, however I decided that I also wanted to change the page names. The best thing is that you can do it incrementally and only rewrite them once your new page is ready or even switch back to the old one later if any problems occur.
Using URL Rewrite you can easily keep your SEO and pages without broken links. You can also achieve lots more, check out: SEO made easy with IIS URL Rewrite 2.0 SEO templates – CarlosAg
During the holidays my wife and I went back to visit our families in Mexico City where we are originally from. Again, during the flights I had enough spare time to build a couple of my favorite games, Backgammon and Connect4.
I've already built both games for Windows using Visual Basic 5 almost 11 years ago but as you would imagine I was far from feeling proud of the implementation. So this time I started from scratch and ended up with what I think are better versions of them (still not the best code, but pretty decent for just a few hours of coding). In fact the AI for the Backgammon version is a bit better and the Connect4 is faster and more suited for a Mobile device.
You can go with your PDA/Smartphone to http://www.carlosag.net/mobile/ to install both games or just click the images below to take you to the install page of each of them. Enjoy and feel free to add any feedback/features as comments to this blog post.
The one thing I learned during the development of these versions is that you do want to download the Windows Mobile 6 SDK if you are going to target that version (which is what my cell phone has), since it will add new Visual Studio 2005 Project Templates and new Emulator images which will help you a lot. For example I was trying to use buttons in my forms, and testing it in Pocket PC worked, but as soon as I tried them in my cell phone it crashed with a NotSupportedException. When I installed the SDK and switched to target that platform, Visual Studio immediately warned me that my platform didn't supported buttons which was great.
Bottom line I'm more and more amazed of how easy it is to build games in Windows Mobile and the things you can achieve with both Windows Mobile and the .NET Compact Framework.
A couple of years ago a friend of mine introduced me to a game called Sudoku, and immediately I loved it. As any good game its rules are very simple, basically you have to lay out the numbers from 1 to 9 horizontally in a row without repeating them, while at the same time you have to layout the same 1 to 9 numbers vertically in a column, and also within a group (a 3x3 square).
After that, every time I had to take a flight I got addicted to buying a new puzzles magazine that would entertain me for the flight. On December 2006 while flying to Mexico I decided to change the tradition and instead build a simple Sudoku game that I could play any time I felt like doing it without having to find a magazine store and that turned into this simple game. It is not yet a great game since I haven't had time to finalize it, but I figure I would share it anyway in case someone finds it fun.
Click Here to go to the Download Page
By default in Windows Server 2008 when you are using the Web Management Service (WMSVC) and Web Deploy (also known as MSDeploy) it will use Basic authentication to perform your deployments. If you want to enable Windows Authentication you will need to set a registry key so that the Web Management Service also supports using NTLM. To do this, update the registry on the server by adding a DWORD key named "WindowsAuthenticationEnabled" under HKEY_LOCAL_MACHINE\Software\Microsoft\WebManagement\Server, and set it to 1. If the Web Management Service is already started, the setting will take effect after the service is restarted.
For more details on other configuration options see:
This is the third post on the series:
2: Use URL Rewrite to maintain your Page rankings (SEO)
ASP.NET has a nice feature to help for deployment processes where you can drop an HTML file named app_offline.htm and it will unload all assemblies and code that it has loaded letting you easily delete binaries and deploy the new version while still serving back to customers the friendly message that you provide telling them that your site is under maintenance.
One caveat though, is that Internet Explorer users might still see the “friendly” error that they display and not your nice message. This happens because of a page size validation that IE performs. See Scott’s blog on how to workaround that problem: App_Offline.htm and working around the IE Friendly Errors
Note: The live site is now running in .NET 4.0 and all using Razor.
One thing that I’ve been asked several times about the SEO Toolkit is if it does a full standards validation on the markup and content that is processed, and if not, to add support for more comprehensive standards validation, in particular XHTML and HTML 4.01. Currently the markup validation performed by the SEO Toolkit is really simple, its main goal is to make sure that the markup is correctly organized, for example that things like <b><i>Test</b></i> are not found in the markup, the primary reason is to make sure that basic blocks of markup are generally "easy" to parse by Search Engines and that the semantics will not be terribly broken if a link, text or style is not correctly closed (since all of them would affect SEO).
So the first thing I would say is that we have heard the feedback and are looking at what we could possibly add in future versions, however why wait, right?
One thing that many people do not realize is that the SEO Toolkit can be extended to add new violations, new metadata and new rules to the analysis process and as such during a demo I gave a few weeks ago I decided to write a sample on how to consume the online W3C Markup Validation Service from the SEO Toolkit.
You can download the SEOW3Validator including the source code at http://www.carlosag.net/downloads/SEOW3Validator.zip.
To run it you just need to:
You should be able to now run the SEO Toolkit just as before but now you will find new violations, for example in my site I get the ones below. Notice that there are a new set of violations like W3 Validator – 68, etc, and all of them belong to the W3C category. (I would have liked to have better names, but the way the W3 API works is not really friendly for making this any better).
And when double clicking any of those results you get the details as reported by the W3 Validation Service:
The code is actually pretty simple, the main class is called SEOW3ValidatorExtension that derives from CrawlerModule and overrides the Process method to call the W3C Validation service sending the actual markup in the request, this means that it does not matter if your site is an Intranet or in the Internet, it will work; and for every warning and error that is returned by the Validator it will add a new violation to the SEO report.
The code looks like this:
I created a helper class W3Validator that basically encapsulates the consumption of the W3C Validation Service, the code is far from what I would like it to be however there are some "interesting" decisions on the way the API is exposed, I would have probably designed the service differently and not return the results formatted in HTML when this is actually an API/WebService that can be presented somewhere else than a browser. So a lot of the code is to just re-format the results to look "decent", but to be honest I did not want to spend too much time on it so everything was put together quite quickly. Also, if you look at the names I used for violations, I did not want to hard-code specific Message IDs and since the Error Message was different for all of them even within the same Message ID, it was not easy to provide better messages. Anyway, overall it is pretty usable and should be a good way to do W3 Validation.
Note that one of the cool things you get for free is that since these are stored as violations, you can then re-run the report and use the Compare Report feature to see the progress while fixing them. Also, since they are stored as part of the report you will not need to keep running the validator over and over again but instead just open it and continue looking at them, as well as analyzing the data in the Reports and Queries, and be able to export them to Excel, etc.
Hopefully this will give you a good example on some of the interesting things you can achieve with the SEO Toolkit and its extensibility.