One question that I've been asked several times is: "Is it possible to schedule the IIS SEO Toolkit to run automatically every night?". Other related questions are: "Can I automate the SEO Toolkit so that as part of my build process I'm able to catch regressions on my application?", or "Can I run it automatically after every check-in to my source control system to ensure no links are broken?", etc.
The good news is that the answer is YES! The bad news is that you have to write a bit of code to make it work. The SEO Toolkit includes a managed-code API that can start an analysis just like the user interface does, and you can call it from any managed application you want.
In this blog post I will show you how to write a simple command-line application that starts a new analysis against the site provided as a command-line argument and runs a few queries once it finishes.
The most important type included is a class called WebCrawler, which drives the entire analysis process. The following image shows this class and some of the related classes you will need to use.
The WebCrawler class is initialized through the configuration specified in CrawlerSettings. It also exposes two methods, Start() and Stop(), which start and stop the crawling process on a set of background threads. Through the Report property you gain access to the CrawlerReport, which represents the results (whether completed or in progress) of the crawling process. CrawlerReport has a method called GetUrls() that returns all the UrlInfo items. UrlInfo is the most important class: it represents a URL that has been downloaded and processed, and it carries all the metadata such as Title, Description, ContentLength, ContentType, and the set of Violations and Links it includes.
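The full program from the original post is not reproduced here, but a minimal sketch along the following lines should get you going. Treat it as a starting point: members such as ExternalLinkCriteria, Name, DirectoryCache, GetUrlCount, RemainingUrls, BytesDownloaded and Report.Save are written from memory and may be named slightly differently in your version of the assembly, so verify them against the actual Microsoft.Web.Management.SEO.Crawler namespace.

    using System;
    using System.IO;
    using System.Linq;
    using System.Threading;
    using Microsoft.Web.Management.SEO.Crawler;

    namespace SEORunner {
        class Program {

            static void Main(string[] args) {
                if (args.Length != 1) {
                    Console.WriteLine("Please specify the URL to analyze.");
                    return;
                }

                Uri startUrl = new Uri(args[0]);

                // Run the analysis and then run a few queries against the results.
                CrawlerReport report = RunAnalysis(startUrl);
                LogStatusCodeSummary(report);
            }

            private static CrawlerReport RunAnalysis(Uri startUrl) {
                // Consider internal only the pages hosted in the same directory or below.
                CrawlerSettings settings = new CrawlerSettings(startUrl);
                settings.ExternalLinkCriteria = ExternalLinkCriteria.SameFolderAndDeeper;

                // Give the report a unique name and store it in the same folder the
                // IIS SEO UI uses, so it shows up later inside IIS Manager.
                settings.Name = startUrl.Host + " " + DateTime.Now.ToString("yy-MM-dd HH-mm-ss");
                string reportsPath = Path.Combine(
                    Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
                    "IIS SEO Reports");
                settings.DirectoryCache = Path.Combine(reportsPath, settings.Name);

                // Start crawling on the background worker threads.
                WebCrawler crawler = new WebCrawler(settings);
                crawler.Start();

                Console.WriteLine("Processed - Remaining - Download Size");
                while (crawler.IsRunning) {
                    Thread.Sleep(1000);
                    Console.WriteLine("{0,9} - {1,9} - {2,6:F2} MB",
                        crawler.Report.GetUrlCount(),
                        crawler.RemainingUrls,
                        crawler.BytesDownloaded / 1048576.0);
                }

                // Persist the report so it can be opened later from IIS Manager.
                crawler.Report.Save(reportsPath);
                return crawler.Report;
            }

            private static void LogStatusCodeSummary(CrawlerReport report) {
                // Aggregate all the processed URLs by HTTP status code.
                Console.WriteLine("Status Code summary");
                foreach (var group in report.GetUrls()
                                            .GroupBy(url => url.StatusCode)
                                            .OrderByDescending(g => g.Count())) {
                    Console.WriteLine("{0} - {1}", group.Key, group.Count());
                }
            }
        }
    }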
If you are not using Visual Studio, you can just save the contents above in a file, call it SEORunner.cs and compile it using the command line:
C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe /r:"c:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll" /optimize+ SEORunner.cs
After that you should be able to run SEORunner.exe and pass the URL of your site as an argument; you will see output like:
Processed - Remaining - Download Size
56 - 149 - 0.93 MB
127 - 160 - 2.26 MB
185 - 108 - 3.24 MB
228 - 72 - 4.16 MB
254 - 48 - 4.98 MB
277 - 36 - 5.36 MB
295 - 52 - 6.57 MB
323 - 25 - 7.53 MB
340 - 9 - 8.05 MB
358 - 1 - 8.62 MB
362 - 0 - 8.81 MB
Start URL: http://www.carlosag.net/
Start Time: 11/16/2009 12:16:04 AM
End Time: 11/16/2009 12:16:15 AM
Status Code summary
OK - 319
MovedPermanently - 17
Found - 23
NotFound - 2
InternalServerError - 1
The most interesting method above is RunAnalysis. It creates a new instance of CrawlerSettings and specifies the start URL. Note that it also specifies that we should consider internal all the pages hosted in the same directory or its subdirectories. We also set a unique name for the report and use the same directory as the IIS SEO UI, so that opening IIS Manager will show the report just as if the UI had generated it. Then we call Start(), which spins up the number of worker threads specified in the WebCrawler.WorkerCount property, and we simply wait for the WebCrawler to finish by polling the IsRunning property.
The remaining methods just leverage LINQ to run a few queries over the results, such as a report aggregating all the processed URLs by status code.
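For instance, another query you could add to the SEORunner class might list the broken links. This is only a sketch: it assumes UrlInfo exposes StatusCode and Url properties, which is my assumption based on the report output rather than something confirmed by the API documentation.

    private static void LogNotFoundUrls(CrawlerReport report) {
        // List every URL that came back as 404 (Not Found).
        var notFound = report.GetUrls()
                             .Where(url => url.StatusCode == System.Net.HttpStatusCode.NotFound);
        foreach (var url in notFound) {
            Console.WriteLine("Not Found: {0}", url.Url);
        }
    }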
As you can see, the IIS SEO Toolkit crawling APIs let you write your own application to start an analysis against your Web site, which can then be integrated with the Windows Task Scheduler, your own scripts, or your build system to enable continuous integration.
Once the report is saved locally it can be opened in IIS Manager and analyzed further, just like any other report. This sample console application can be scheduled using the Windows Task Scheduler so that it runs every night or at any other time. Note that you could also automate it with a few lines of PowerShell, entirely from the command line and without writing any C# code, but that is left for another post.
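If you go the Task Scheduler route, a one-line command along these lines registers the nightly run (the executable path, task name and time below are just placeholders):

    schtasks /create /tn "Nightly SEO Analysis" /sc daily /st 02:00 /tr "C:\Tools\SEORunner.exe http://www.carlosag.net/"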
In the URL Rewrite forum somebody posted the question "are redirects bad for search engine optimization?". The answer is: not necessarily. Redirects are an important tool for Web sites, and used in the right context they are actually a required one. But first, a bit of background.
A redirect, in simple terms, is a way for the server to indicate to a client (typically a browser) that a resource has moved, which it does through an HTTP status code and an HTTP Location header. There are different types of redirects, but the most common ones are the permanent redirect (status code 301, Moved Permanently) and the temporary redirect (status code 302, Found).
Below is an example of the response sent from the server when requesting http://www.microsoft.com/SQL/
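The original response headers are not reproduced here, but a redirect response looks roughly like this (the status code and target URL below are illustrative, not the actual values returned by that site):

    HTTP/1.1 301 Moved Permanently
    Location: http://www.microsoft.com/sqlserver/
    Content-Length: 0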
One of the most important factors in SEO is organic linking: in simple words, your page gets extra points for every external Web site that links to it. So now imagine the search engine bot is crawling an external Web site and finds a link pointing to your page (example.com/some-page), and when it tries to visit that page it runs into a redirect to another location (say example.com/somepage). Now the search engine has to decide whether it should add the original "some-page" to its index, whether it should "add the extra points" to the new location or to the original one, or whether it should just ignore the link entirely. The answer is not that simple, but a simplification of it could be: for a permanent redirect (301) the engine will typically index the new location and transfer the ranking credit to it, while for a temporary redirect (302) it will typically keep the original URL in its index since the content is expected to come back.
The IIS Search Engine Optimization Toolkit has a couple of rules that look for different patterns related to redirects, and the Beta version includes several of them.
So what does this look like? In the image below I ran Site Analysis against a Web site and it found a few of these violations (2 and 3).
Notice that when you double-click a violation it shows you the details and gives you direct access to the related URLs, so you can look at the content and all the relevant information needed to make a decision. From that menu you can also see which other pages link to the pages involved, or launch the URL in the browser if needed.
As with all the other violations, it explains the reason each one is flagged as well as the recommended actions to follow.
The IIS Search Engine Optimization Toolkit can also help you find all the different types of redirects and the locations where they are used in a very easy way: just select Content -> Status Code Summary in the Dashboard view and you will see all the different HTTP status codes received from your Web site. Notice in the image below how you can see the number of redirects (in this case 18 temporary redirects and 2 permanent redirects). You can also see how much content they accounted for, in this case about 2.5 KB. (I have seen Web sites generate a large amount of useless content in redirect traffic, which is bandwidth you end up paying for.) You can double-click any of those rows to see the details of the URLs that returned that status code, and from there see who links to them, and so on.
So going back to the original question: "are redirects bad for Search Engine Optimization?". Not necessarily; they are an important tool that Web applications use for many legitimate reasons.
Just make sure you don't abuse them with redirects to redirects, unnecessary redirects, or infinite loops, and that you use the right semantics (permanent versus temporary).
IIS 7.0 Failed Request Tracing (for historical reasons we internally refer to it as FREB, since it used to be called Failed Request Event Buffering and there is no good-sounding acronym for the new name) is probably the best diagnostic tool IIS has ever had that doesn't require debugging skills. Put simply, it exposes all the interesting events that happen during request processing in a way that lets you really understand what went wrong with any request. To learn more you can go to http://learn.iis.net/page.aspx/266/troubleshooting-failed-requests-using-tracing-in-iis7/.
What is not immediately obvious is that you can use these tracing capabilities from your ASP.NET applications to output your own tracing information into the FREB infrastructure, so that you get a holistic view of the request.
When you are developing in ASP.NET there are typically two tracing infrastructures you are likely to use: ASP.NET page tracing and System.Diagnostics tracing. In recent versions they have been better integrated (via the writeToDiagnosticsTrace attribute), but you still want to know about both of them.
Today I'll just focus on logging ASP.NET Tracing to FREB, and in a future post I will show how to do it for System.Diagnostics Tracing.
To send the ASP.NET tracing output to FREB you just need to enable ASP.NET tracing and use the ASPNET trace provider, and you will get those entries in the FREB log. The following web.config enables both FREB and ASP.NET tracing. (Note that you also need to enable Failed Request Tracing on the Default Web Site in IIS Manager so that these rules get executed.)
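The original web.config is not reproduced here, but a configuration along these lines should work; treat the trace areas, the verbosity, and the catch-all 200-999 failure definition as reasonable starting values rather than the author's exact settings:

    <configuration>
      <system.web>
        <!-- Enable ASP.NET page tracing -->
        <trace enabled="true" pageOutput="false" />
      </system.web>
      <system.webServer>
        <tracing>
          <traceFailedRequests>
            <add path="*">
              <traceAreas>
                <!-- The ASPNET provider is the one that picks up the page traces -->
                <add provider="ASPNET" areas="Infrastructure,Module,Page,AppServices" verbosity="Verbose" />
              </traceAreas>
              <!-- Capture every request regardless of status code -->
              <failureDefinitions statusCodes="200-999" />
            </add>
          </traceFailedRequests>
        </tracing>
      </system.webServer>
    </configuration>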
Now if you have a sample page like the following:
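The original sample page is not included here either, but any page with one regular trace and one warning will do; for example a hypothetical default.aspx like this:

    <%@ Page Language="C#" %>
    <script runat="server">
        protected void Page_Load(object sender, EventArgs e) {
            // One regular trace message and one warning; both flow to FREB via the ASPNET provider.
            Trace.Write("SamplePage", "Hello from Page_Load");
            Trace.Warn("SamplePage", "This is a warning from Page_Load");
        }
    </script>
    <html>
    <body>
        <h1>FREB tracing sample page</h1>
    </body>
    </html>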
The result is that in \inetpub\logs\FailedReqLogFiles\ you will get an XML file that includes all the details of the request, including the page traces from ASP.NET. Note that we provide an XSLT transformation that parses the XML file and presents a friendly rendering with several views of the trace. For example, below only the warning is shown in the Request Summary view:
There is also a Request Details view where you can filter down to the ASP.NET page traces, which includes both of the traces we added in the page code.
A lot of sites today let users sign in to see some sort of personalized content, whether it's a forum, a news reader, or an e-commerce application. To simplify their users' lives they usually offer the ability to log on from any page of the site. Similarly, to keep navigation simple, Web sites usually generate dynamic links that send users back to the page they were on before visiting the login page, something like: <a href="/login?returnUrl=/currentUrl">Sign in</a>.
If your site has a login page you should definitely consider adding it to the robots exclusion list, since it is a good example of something you do not want a search engine crawler to spend its time on. Remember, the crawler spends a limited amount of time on your site, and you really want it to focus on what is important.
Out of curiosity I searched for login.php and login.aspx and found over 14 million login pages… that is a lot of useless content in a search engine.
Another big reason is that this kind of URL, which varies depending on each page, means there will be hundreds of variations that crawlers need to follow, like /login?returnUrl=page1.htm, /login?returnUrl=page2.htm, and so on; you have basically doubled the work for the crawler. Even worse, if you are not careful you can easily cause an infinite loop by including the same "login" link on the login page itself, since you get /login?returnUrl=login as the link, and clicking that gives you /login?returnUrl=login?returnUrl=login... and so on, with an ever-changing URL for each page on your site. Note that this is not hypothetical; it is a real example from a few famous Web sites (which I will not disclose). Of course crawlers are not that silly and will not crawl your Web site infinitely; they will stop after looking at the same /login resource a few hundred times, but that just reduces the time they spend looking at what really matters to your users.
If you use the IIS SEO Toolkit, it will detect when the same resource (like login.aspx) is used too many times with only the query string varying, and will give you a violation like: Resource is used too many times.
There are a few fixes, but by far the best thing to do is just add the login page to the Robots Exclusion protocol.
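For example, assuming the login page lives at /login (or /login.aspx), a robots.txt at the root of the site along these lines keeps crawlers away from it and from every /login?returnUrl=... variation:

    User-agent: *
    Disallow: /login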
To summarize: always add the login page to the robots exclusion file; otherwise you will end up wasting the crawler's limited time on useless login pages, doubling the number of URLs it has to process, and potentially sending it into near-infinite /login?returnUrl=... loops.
During the holidays my wife and I went back to visit our families in Mexico City where we are originally from. Again, during the flights I had enough spare time to build a couple of my favorite games, Backgammon and Connect4.
I had already built both games for Windows using Visual Basic 5 almost 11 years ago, but as you would imagine I was far from proud of that implementation. So this time I started from scratch and ended up with what I think are better versions (still not the best code, but pretty decent for just a few hours of coding). In fact the AI in the Backgammon version is a bit better, and Connect4 is faster and better suited to a mobile device.
You can go with your PDA/Smartphone to http://www.carlosag.net/mobile/ to install both games or just click the images below to take you to the install page of each of them. Enjoy and feel free to add any feedback/features as comments to this blog post.
The one thing I learned during the development of these versions is that you do want to download the Windows Mobile 6 SDK if you are going to target that version (which is what my cell phone runs), since it adds new Visual Studio 2005 project templates and new emulator images that help a lot. For example, I was trying to use buttons in my forms; testing on the Pocket PC emulator worked, but as soon as I tried them on my cell phone it crashed with a NotSupportedException. Once I installed the SDK and switched to target that platform, Visual Studio immediately warned me that my platform didn't support buttons, which was great.
Bottom line, I'm more and more amazed at how easy it is to build games for Windows Mobile and at the things you can achieve with Windows Mobile and the .NET Compact Framework.
A couple of years ago a friend of mine introduced me to a game called Sudoku, and I immediately loved it. Like any good game its rules are very simple: you have to lay out the numbers 1 to 9 in each row without repeating them, while at the same time laying out the same 1 to 9 numbers in each column and within each group (a 3x3 square).
After that, every time I had to take a flight I bought a new puzzle magazine to keep me entertained during the trip. In December 2006, while flying to Mexico, I decided to change the tradition and instead build a simple Sudoku game that I could play any time I felt like it, without having to find a magazine store; that turned into this simple game. It is not yet a great game since I haven't had time to finish it, but I figured I would share it anyway in case someone finds it fun.
Click Here to go to the Download Page
This is the third post in the series:
1: Moving a SitemapPath Control to ASP.NET Web Pages
2: Use URL Rewrite to maintain your Page rankings (SEO)
ASP.NET has a nice feature to help with deployments: you can drop an HTML file named app_offline.htm into the application root and it will unload all the assemblies and code it has loaded, letting you easily delete binaries and deploy the new version while still serving customers the friendly message you provide telling them the site is under maintenance.
One caveat, though, is that Internet Explorer users might still see the "friendly" error page that IE displays rather than your nice message. This happens because of a page-size check that IE performs (small responses get replaced with its own error page), so make sure app_offline.htm is large enough. See Scott's blog on how to work around that problem: App_Offline.htm and working around the IE Friendly Errors
Note: The live site is now running in .NET 4.0 and all using Razor.
This is the second post in the series:
My current Web site was built using ASP.NET 2.0 and WebForms, which means all of my pages have the .aspx extension. As I move each page to ASP.NET Web Pages its extension changes to .cshtml, and while I'm sure I could configure things to keep the .aspx extensions, this is a good opportunity to "start clean". Furthermore, in ASP.NET Web Pages you can also access pages without any extension at all: if you have /my-page.cshtml, you can also get to it using just /my-page. Since I will go through this migration anyway, I decided to use the clean, extension-less URL format and in the process get better URLs for SEO purposes. For example, today one of the URLs looks like http://www.carlosag.net/Articles/configureComPlus.aspx; this is a good time to enforce lower-case semantics, get rid of the ugly camel casing, and use a much more standard and search-engine-friendly format with "-", like: http://www.carlosag.net/articles/configure-com-plus.aspx.
The risk, of course, is that if you just change the URLs of your site you will end up not only with lots of 404s (Not Found), but your page ranking will be reset and you will lose all the "juice" that external links and history have given it. The right way to do this is to perform a permanent redirect (301) from the old URL to the new URL; this way search engines (and browsers) know the content has permanently moved to a new location and will "pass all the page ranking" to the new page.
There are many ways to achieve this, but I happen to like URL Rewrite a lot, so I decided to use it. I basically created one rule that uses a Rewrite Map (think of it as a dictionary) to match the URL; if it matches, the rule performs a permanent redirect to the new URL. So, for example, if /aboutme.aspx is requested, it will 301 to /about-me:
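The resulting web.config looks roughly like this (only the /aboutme.aspx entry comes from the example above; the map name and rule name are just illustrative):

    <system.webServer>
      <rewrite>
        <rewriteMaps>
          <rewriteMap name="Redirects">
            <!-- one entry per page as it gets migrated -->
            <add key="/aboutme.aspx" value="/about-me" />
          </rewriteMap>
        </rewriteMaps>
        <rules>
          <rule name="Redirect rule for Redirects" stopProcessing="true">
            <match url=".*" />
            <conditions>
              <!-- Look up the requested URL in the map; only redirect when there is a match -->
              <add input="{Redirects:{REQUEST_URI}}" pattern="(.+)" />
            </conditions>
            <action type="Redirect" url="{C:1}" redirectType="Permanent" />
          </rule>
        </rules>
      </rewrite>
    </system.webServer>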
Note that I could have also created a simple rule that would change the extension to cshtml, however I decided that I also wanted to change the page names. The best thing is that you can do it incrementally and only rewrite them once your new page is ready or even switch back to the old one later if any problems occur.
Using URL Rewrite you can easily preserve your SEO value and keep your pages free of broken links. You can also achieve a lot more; check out: SEO made easy with IIS URL Rewrite 2.0 SEO templates – CarlosAg
Today I was playing a bit with Visual Studio 2008 and was surprised to see that I was not getting IntelliSense in my web.config. As you might already know, XML IntelliSense in Visual Studio is implemented through a set of schemas stored in a folder inside the VS directory, something like \Program Files\Microsoft Visual Studio 9.0\Xml\Schemas. After looking at the files it was easy to understand what was going on: I was developing with .NET 2.0 settings, and Visual Studio now ships different schemas for web.config depending on the settings you are using: DotNetConfig.xsd, DotNetConfig20.xsd and DotNetConfig30.xsd.
As I imagined, when I looked into DotNetConfig.xsd it indeed has all the definitions for the system.webServer sections, and so does DotNetConfig30.xsd. However, DotNetConfig20.xsd does not include the section details, only its definition. So to fix your IntelliSense you can just open DotNetConfig.xsd and select the entire section from:
<xs:element name="system.webServer" vs:help="configuration/system.webServer">...</xs:element>
and replace the corresponding entry in DotNetConfig20.xsd. You might also want to copy the system.applicationHost section and add it to DotNetConfig20.xsd, since that one is not included either.
Today somebody running the Site Analysis feature of the IIS SEO Toolkit found it flagging a lot of violations saying "The page contains multiple canonical formats.". The reason, apparently, is that he uses query string parameters to pass contextual or other information between pages. This of course raises the question: does that mean query strings are, in general, bad news SEO-wise?
Well, the answer is not necessarily.
I will start by clarifying that this violation in Site Analysis means that our algorithm detected that two URLs appear to serve the same content; note that we make no assumptions based on the URL itself (including query string parameters). This kind of situation is bad for a couple of reasons: search engines may split the ranking "juice" from external links across the duplicate URLs, and crawlers waste time downloading the same content several times instead of the content that really matters.
Query strings by themselves do not pose a terrible threat to SEO; most modern search engines deal with them just fine. It's the organic linking and the potential abuse of query strings that can give you headaches.
Remember, search engines should make no assumptions based on the fact that a single "page" serves tons of content through one absolute path and a set of query strings. This is typical in many cases, such as sites built around index.php, where pretty much every page on the site is served by the same resource using variations of query strings or path information.
Well, there are several things you could do, but probably one of the easiest is to tell search engines (more specifically, crawlers or bots) not to index the pages with the query string variations that are only meant to pass application state rather than specify different content. This can be done with the Robots Exclusion Protocol, using wildcard matching to exclude any URL that contains a '?' (see the example below). Just make sure you are not blocking URLs that actually should be indexed; to verify this you can run Site Analysis again and it will flag an informational message for each URL that is not visited because of the robots exclusion file.
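For example, a robots.txt along these lines blocks every URL containing a query string (note that wildcard support is an extension honored by the major search engines rather than part of the original robots.txt specification):

    User-agent: *
    Disallow: /*?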
In summary, try to keep canonical formats yourself; don't leave the guessing to search engines, because some of them might get it wrong. There is a new way of specifying the canonical form in your markup, but it is very recent (as in 2009) and some search engines do not support it yet (I believe the top three do, though), using the new rel="canonical":
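(The href below is just an illustrative URL.)

    <link rel="canonical" href="http://example.com/products" />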
In the Beta 2 version of the IIS SEO Toolkit we will support this tag and have better detection of these canonical issues, so stay tuned.
Another way to solve this is to use URL Rewrite, so you can easily redirect or rewrite your URLs to get rid of the query strings and use more SEO-friendly URLs.