A couple of months ago I wrote about using LINQ with Microsoft.Web.Administration to manage and query IIS 7.0 configuration. Somebody came back to me and said that LINQ was very cool, but that it was very developer oriented and that on a production server without Visual Studio or .NET 3.5 it wouldn't be an option. That is a very valid comment, so I decided to show similar stuff with a tool that is available in Windows and is more IT oriented: Windows PowerShell.
So in this post I will quickly mention some of the things you can easily do with Microsoft.Web.Administration inside Windows PowerShell.
To start working with Microsoft.Web.Administration the first thing you need to do is load the assembly so that you can start using it. It is quite easy using the methods from the Assembly type.
Once the assembly is available, you need to create an instance of the ServerManager class, which gives you access to the entire configuration system.
That instance is kept in a variable called $iis, which we will use for all of our configuration tasks.
Now to more interesting stuff.
Getting the list of Sites
Getting the list of sites is as easy as accessing the Sites collection, which outputs all the information about the sites.
However, we can also specify the information we care about and the format we want to use, for example:
You can also use the Where-Object command to filter the objects, for example to get only the sites that are Stopped and then start them.
OK, now let's imagine I want to find all the applications that are configured to run in the DefaultAppPool application pool and move them to run in my NewAppPool. This is easier to do in three lines:
Now let's say I want to find the top 20 distinct URLs of all the requests running in all my worker processes that have taken more than 1 second.
OK, finally let's say I want to display a table of all the applications running under DefaultAppPool and show whether Anonymous authentication is enabled or not. (Now this one is almost on the edge of "you should do it differently", but it is OK if you are only reading a single value from the section):
Again, the interesting thing is that you can now access all the functionality of Microsoft.Web.Administration from Windows PowerShell very easily, without needing to compile code or anything else. It does take some time to get used to the syntax, but once you do, you can do very fancy stuff.
Last week we released a refresh for the IIS Search Engine Optimization (SEO) Toolkit v1.0. This version is a minor update that includes fixes for all the important bugs reported in the IIS.NET SEO Forum.
Some of the fixes included in this version are:
This release is compatible with v1.0 RTM and it will upgrade the existing version if it is already installed. So go ahead and install the new version using Web Platform Installer by clicking: http://go.microsoft.com/?linkid=9695987
Learn more about it at: http://www.iis.net/expand/SEOToolkit
One thing that I’ve been asked several times about the SEO Toolkit is whether it does full standards validation on the markup and content that it processes and, if not, whether we could add support for more comprehensive standards validation, in particular XHTML and HTML 4.01. Currently the markup validation performed by the SEO Toolkit is really simple. Its main goal is to make sure that the markup is correctly organized, for example that things like <b><i>Test</b></i> are not found in the markup. The primary reason is to make sure that basic blocks of markup are generally "easy" to parse by Search Engines and that the semantics will not be terribly broken if a link, text or style is not correctly closed (since all of them would affect SEO).
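To make that kind of check concrete, here is a minimal Python sketch that flags tags closed in the wrong order. It is only an illustration of the idea and not the SEO Toolkit's actual validation code. The class and function names are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records tags that are closed out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tags
        self.problems = []    # human-readable descriptions of bad nesting

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()  # correctly nested
        elif tag in self.open_tags:
            inner = self.open_tags[-1]
            self.problems.append(f"</{tag}> closed while <{inner}> was still open")
            # Drop the nearest matching open tag so checking can continue.
            index = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[index]
        else:
            self.problems.append(f"</{tag}> has no matching open tag")


def find_nesting_problems(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


bad = find_nesting_problems("<b><i>Test</b></i>")
good = find_nesting_problems("<b><i>Test</i></b>")
```

For the sample above, `bad` holds a single problem for the early `</b>`, and `good` is empty.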
So the first thing I would say is that we have heard the feedback and are looking at what we could possibly add in future versions. However, why wait, right?
One thing that many people do not realize is that the SEO Toolkit can be extended to add new violations, new metadata and new rules to the analysis process. So, during a demo I gave a few weeks ago, I decided to write a sample showing how to consume the online W3C Markup Validation Service from the SEO Toolkit.
You can download the SEOW3Validator including the source code at http://www.carlosag.net/downloads/SEOW3Validator.zip.
To run it you just need to:
You should now be able to run the SEO Toolkit just as before, but you will find new violations; for example, on my site I get the ones below. Notice that there is a new set of violations like W3 Validator – 68, etc., and all of them belong to the W3C category. (I would have liked to have better names, but the way the W3 API works is not really friendly for making this any better).
And when double clicking any of those results you get the details as reported by the W3 Validation Service:
The code is actually pretty simple. The main class, SEOW3ValidatorExtension, derives from CrawlerModule and overrides the Process method to call the W3C Validation Service, sending the actual markup in the request. This means that it does not matter whether your site is on an intranet or on the Internet; it will work. For every warning and error returned by the validator, it adds a new violation to the SEO report.
The code looks like this:
I created a helper class, W3Validator, that basically encapsulates the consumption of the W3C Validation Service. The code is far from what I would like it to be, but there are some "interesting" decisions in the way the API is exposed. I would probably have designed the service differently and not returned the results formatted as HTML, given that this is an API/Web service whose results can be presented somewhere other than a browser. So a lot of the code just re-formats the results to look "decent", and to be honest I did not want to spend too much time on it, so everything was put together quite quickly. Also, if you look at the names I used for violations, I did not want to hard-code specific Message IDs, and since the error message was different for each of them even within the same Message ID, it was not easy to provide better names. Anyway, overall it is pretty usable and should be a good way to do W3C validation.
Note that one of the cool things you get for free is that since these are stored as violations, you can then re-run the report and use the Compare Report feature to see the progress while fixing them. Also, since they are stored as part of the report you will not need to keep running the validator over and over again but instead just open it and continue looking at them, as well as analyzing the data in the Reports and Queries, and be able to export them to Excel, etc.
Hopefully this will give you a good example of some of the interesting things you can achieve with the SEO Toolkit and its extensibility.
A couple of years ago a friend of mine introduced me to a game called Sudoku, and I immediately loved it. As with any good game, its rules are very simple: basically you have to lay out the numbers from 1 to 9 horizontally in a row without repeating them, while at the same time you have to lay out the same 1 to 9 numbers vertically in a column, and also within a group (a 3x3 square).
After that, every time I had to take a flight I got addicted to buying a new puzzle magazine that would entertain me for the flight. In December 2006, while flying to Mexico, I decided to change the tradition and instead build a simple Sudoku game that I could play any time I felt like it without having to find a magazine store, and that turned into this simple game. It is not yet a great game since I haven't had time to finalize it, but I figured I would share it anyway in case someone finds it fun.
Click Here to go to the Download Page
Today somebody was running the IIS SEO Toolkit, and the Site Analysis feature flagged a lot of violations saying "The page contains multiple canonical formats.". The reason apparently is that he uses Query String parameters to pass contextual or other information between pages. This of course raises the question: does that mean that query strings in general are bad news SEO-wise?
Well, the answer is not necessarily.
I will start by clarifying that this violation in Site Analysis means that our algorithm detected that those two URLs look like the same content. Note that we make no assumptions based on the URL (including Query String parameters). This kind of situation is bad for a couple of reasons:
Query Strings by themselves do not pose a terrible threat to SEO, and most modern Search Engines deal OK with them. However, it is the organic linking and the potential abuse of Query Strings that could give you headaches.
Remember, Search Engines should make no assumptions based on the fact that it is a single "page" that serves tons of content through a single Absolute Path and the use of Query Strings. This is typical in many cases, such as when using index.php, where pretty much every page on the site is served by the same resource using just variations of Query Strings or path information.
Well, there are several things you could do, but probably one of the easiest is to just tell Search Engines (more specifically crawlers or bots) not to index pages that have the different Query String variations that really are meant only for the application to pass state and not to specify different content. This can be done using the Robots Exclusion Protocol and its wildcard matching to specify that crawlers should not follow any URLs that contain a '?'. Note that you should make sure you are not blocking URLs that actually are supposed to be indexed. For this you can run the Site Analysis feature again, and it will flag an informational message for each URL that is not visited due to the robots exclusion file.
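As a rough illustration of how that wildcard matching behaves, here is a minimal Python sketch. It assumes the commonly supported extensions where `*` matches any run of characters and a trailing `$` anchors the end of the URL, and it assumes a rule such as `Disallow: /*?`. The function names are made up for this example, and a real crawler handles more cases (Allow rules, user agents, precedence).

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url_path, disallow_rules):
    """True if any Disallow pattern matches the path (query string included)."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


# Assumed rule for this example: block every URL that contains a '?'.
rules = ["/*?"]

blocked = is_blocked("/products.aspx?sessionid=123", rules)
allowed = not is_blocked("/products.aspx", rules)
```

Running a handful of your real URLs through a check like this is a cheap way to confirm that a rule does not block pages you actually want indexed.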
In summary, try to keep canonical formats yourself and don't leave any guesses to Search Engines, because some of them might get it wrong. There is a new way of specifying the canonical form in your markup, the new rel="canonical", but it is "very recent" (as in 2009) and some Search Engines do not support it (I believe the top three do, though).
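To show what the tag looks like and how a crawler might read it, here is a small Python sketch that pulls the canonical URL out of a page. The sample page, the www.example.com address and the helper names are all made up for illustration; this is not how any particular Search Engine implements it.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Hypothetical helper: remembers the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(markup):
    finder = CanonicalFinder()
    finder.feed(markup)
    finder.close()
    return finder.canonical


# A page reached through many query string variations can point at one URL.
page = """<html><head>
<title>Product</title>
<link rel="canonical" href="http://www.example.com/product.aspx?id=42" />
</head><body>...</body></html>"""

canonical_url = find_canonical(page)
```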
In the Beta 2 version of IIS SEO Toolkit we will support this tag and have better detection of these canonical issues. So stay tuned.
Another way to solve this is to use URL Rewrite so that you can easily redirect or rewrite your URLs to get rid of the Query Strings and use more SEO-friendly URLs.
The other day somebody asked me if there was a way to limit the amount of work that Site Analysis in IIS SEO Toolkit would cause to the server. This is interesting for a couple of reasons:
In Beta 1 we do not support the Crawl-delay directive in the Robots Exclusion Protocol; in future versions we will look at adding support for this setting. The good news is that in Beta 1 we do have a configurable setting, called Maximum Number of Concurrent Requests, that can help you achieve this goal.
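To illustrate what a cap on concurrent requests does in general, here is a minimal Python sketch. It is not the Toolkit's implementation. The `crawl` function, the fake fetch routine and the www.example.com URLs are assumptions made up for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, fetch, max_concurrent_requests):
    """Fetch every URL, never running more than the given number at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent_requests) as pool:
        return list(pool.map(fetch, urls))


# Stand-in for a real HTTP request; it records how many calls overlap.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_fetch(url):
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend the server takes a moment to respond
    with stats_lock:
        stats["active"] -= 1
    return (url, 200)


urls = [f"http://www.example.com/page{i}.htm" for i in range(20)]
results = crawl(urls, fake_fetch, max_concurrent_requests=2)
```

With the limit set to 2, the server never sees more than two requests in flight, at the cost of a slower crawl.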
To set it:
The other day a friend of mine who owns a Web site asked me to look at his Web site to see if I could spot anything weird since according to his Web Hosting provider it was being flagged as malware infected by Google.
My friend (who is not technical at all) talked to his Web site designer and mentioned the problem. The designer downloaded the HTML pages and looked for anything suspicious in them, but he was not able to find anything. My friend then went back to his Hosting provider, told them that they had not been able to find anything problematic and asked whether it could be something in the server configuration, to which they replied sarcastically that it was probably ignorance on the part of his Web site designer.
So of course I decided the first thing I would do was to crawl the Web site using Site Analysis in IIS SEO Toolkit. This gave me a list of the pages and resources that his Web site has. I knew that malware usually hides either in executables or in scripts on the server, so I started by looking at the different content types shown in the "Content Types Summary" inside the Content reports in the dashboard page.
After running the query as shown above, I got a set of HTML files which all gave a status code 404 – NOT FOUND. Double clicking any of them and looking at the HTML markup made it immediately obvious they were malware infected. Look at the following markup:
Notice those two ugly scripts that seem to be just a random set of numbers, quotes and letters? I do not believe I've ever met a developer that writes code like that in real web applications.
Notice how both of them end up writing the actual malware script living in martuz.cn and gumblar.cn.
Now, this clearly means they are infected with malware, and it seems clear that the problem is not in the Web application; the infection is in the error pages that the server serves when an error happens. As a next step, to be able to guide them with more specifics, I needed to determine which Web server they were using. That is as easy as inspecting the headers in the IIS SEO Toolkit, which displayed something like the ones shown below:
With a big disclaimer that I know nothing about Apache, I then pointed them to their .htaccess and httpd.conf files to look for ErrorDocument entries, which would show them which files were infected and whether it was a problem in their application or in the server.
It turns out that after they went back to their Hoster with all this evidence, the Hoster finally realized that the server was infected and was able to clean up the malware. IIS SEO Toolkit helped me quickly identify this because it is able to see the Web site with the same eyes as a Search Engine would, following every link and letting me perform easy queries to find information about it. In future versions of IIS SEO Toolkit you can expect to be able to find this kind of thing in a lot simpler ways, but for Beta 1, for those who care, here is the query that you can save in an XML file and use "Open Query" to see if you are infected with this malware.
In the new version of the IIS SEO Toolkit we added two new reports that are very interesting, both from an SEO perspective and from a user experience and site organization perspective. These reports are located in the Links category of the reports.
This report shows a summary of all the redirects that were found while crawling the Web site. The first column (Linking-URL) is the URL that was visited that resulted in redirection to the Linked-URL (second column). The third column (Linking-Status code) specifies what type of redirection happened based on the HTTP status code enumeration. The most common values will be MovedPermanently/Moved which is a 301, or Found/Redirect which is a 302. The last column shows the status code for the final URL so you can easily identify redirects that failed or that redirected to another redirect.
This report is interesting because Redirects might affect your Search Engine rankings and make your users have the perception that your site is slower. For more information on Redirects see: Redirects, 301, 302 and IIS SEO Toolkit
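To make the four columns concrete, here is a minimal Python sketch that builds the same kind of rows from crawl data and picks out the redirects that failed or that point at another redirect. The sample URLs and function names are made up for this example, and this is not the Toolkit's own code.

```python
# Names follow the HTTP status code enumeration mentioned for the report.
REDIRECT_NAMES = {301: "MovedPermanently", 302: "Found", 307: "TemporaryRedirect"}


def redirect_report(responses):
    """Build (linking URL, linked URL, linking status, linked status) rows.

    `responses` maps each crawled URL to a (status code, Location header) pair.
    """
    rows = []
    for url, (status, location) in responses.items():
        if status in REDIRECT_NAMES:
            linked_status = responses.get(location, (None, None))[0]
            rows.append((url, location, REDIRECT_NAMES[status], linked_status))
    return rows


def problem_redirects(rows):
    """Keep the rows whose target failed or is itself another redirect."""
    return [row for row in rows if row[3] != 200]


# Hypothetical crawl results for illustration.
responses = {
    "/old.htm": (301, "/new.htm"),
    "/new.htm": (200, None),
    "/promo": (302, "/old.htm"),
    "/gone": (301, "/missing.htm"),
    "/missing.htm": (404, None),
}

rows = redirect_report(responses)
problems = problem_redirects(rows)
```

Here `/promo` shows up as a redirect to another redirect and `/gone` as a redirect to a missing page, which are exactly the cases the last column helps you spot.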
This is probably one of my favorite reports since it is almost impossible to find this type of information in any other 'easy' way.
The report basically tells you how hard it is for users that land on your home page to get to any of the pages in your site. For example, the image below shows that it takes 5 clicks for a user to get from the home page of my site to the XGrid.htc component.
This is very valuable information because it lets you understand how deep your Web site is. In my case, if you were to walk the entire site and lay out its structure in a hierarchical diagram, it would basically be 5 levels deep. Remember, you want your site to be shallow so that it is easily discovered and crawled by Search Engines.
Even more interesting, you can double click any of the results and see the list of clicks that the user has to make to get to the page.
Note that it shows the URL, the Title of the page as well as the Text of the Link you need to click to get to the Next URL (the one with a smaller index). So as you can see, in my case the user needs to go to the home page and click the link with text "XGrid", which takes them to the /XGrid/ URL (index 3), where they then need to click the link with text "This is a new...", etc.
Note that as you select the URLs in the list it will highlight in the markup the link that takes you to the next URL.
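The idea behind this report can be sketched as a breadth-first search over the links found during the crawl. The Python below is only an illustration under that assumption, with a made-up link graph and function name; it is not the Toolkit's implementation.

```python
from collections import deque


def shortest_route(links, start_url, target_url):
    """Return the clicks from start_url to target_url, or None if unreachable.

    `links` maps a URL to a list of (link text, linked URL) pairs.
    Each click is returned as (page URL, link text to click, next URL).
    """
    previous = {start_url: None}  # how each page was first reached
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url == target_url:
            break
        for link_text, linked_url in links.get(url, []):
            if linked_url not in previous:
                previous[linked_url] = (url, link_text)
                queue.append(linked_url)
    if target_url not in previous:
        return None
    # Walk backwards from the target to rebuild the list of clicks.
    route = []
    url = target_url
    while previous[url] is not None:
        parent, link_text = previous[url]
        route.append((parent, link_text, url))
        url = parent
    route.reverse()
    return route


# Hypothetical link graph for illustration.
links = {
    "/": [("XGrid", "/XGrid/"), ("Tools", "/Tools/")],
    "/Tools/": [("XGrid", "/XGrid/")],
    "/XGrid/": [("Samples", "/XGrid/samples.htm")],
    "/XGrid/samples.htm": [("Download", "/XGrid/XGrid.htc")],
}

route = shortest_route(links, "/", "/XGrid/XGrid.htc")
clicks = len(route)
```

Because the search visits pages level by level, the first route found to a page is also the one with the fewest clicks.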
The data of this report is powered by a new type of query we called Route Query. The reason this is interesting is because you can customize the report to add different filters, or change the start URL, or more.
For example, let's say I want to figure out all the pages that users can get to when they land on a specific page of my site, say http://www.carlosag.net/Tools/XGrid/editsample.htm:
In the Dashboard view of a Report, select the option 'Query->New Routes Query'. This will open a new Query tab where you can specify the Start URL that you are interested in.
As you can see, this report clearly shows that if a user visits my site and lands on this page they will basically be blocked and only be able to see 8 pages of the entire site. This is a clear example of where a link to the Home page would be beneficial.
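The same question, which pages can a visitor reach from a given landing page, can be sketched in a few lines of Python. The link graph below is invented for illustration and is much smaller than a real site; the function name is also made up.

```python
def reachable_pages(links, start_url):
    """Return every URL that can be reached by following links from start_url."""
    seen = {start_url}
    pending = [start_url]
    while pending:
        url = pending.pop()
        for linked_url in links.get(url, []):
            if linked_url not in seen:
                seen.add(linked_url)
                pending.append(linked_url)
    return seen


# Hypothetical site: the sample page links forward but never back to the home page.
links = {
    "/": ["/Tools/", "/Articles/", "/About/"],
    "/Tools/": ["/Tools/XGrid/", "/"],
    "/Tools/XGrid/": ["/Tools/XGrid/editsample.htm", "/Tools/"],
    "/Tools/XGrid/editsample.htm": ["/Tools/XGrid/sample1.htm",
                                    "/Tools/XGrid/sample2.htm"],
    "/Tools/XGrid/sample1.htm": ["/Tools/XGrid/sample2.htm"],
}

from_sample = reachable_pages(links, "/Tools/XGrid/editsample.htm")
from_home = reachable_pages(links, "/")
```

Comparing the size of `from_sample` with `from_home` shows how much of the site is cut off for someone who lands on the sample page.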
Another common scenario for this query infrastructure is finding ways to direct traffic from your most common pages to your conversion pages. This report will let you figure out how difficult or easy it is to get from any page to your conversion pages.
Today I read a question in one of the IIS.NET forums and, although I'm not sure if this is what they really wanted to know, I figured it might be useful to understand how to do this anyway. Quite often users do not like exposing their ASP.NET pages using the default .aspx file extension (sometimes for legacy reasons, where they try to minimize the risk of generating broken links when moving from a different technology or want to preserve the validity of existing search engine indexes, and sometimes for a false sense of security or whatever).
Regardless of why, the bottom line is that mapping a different file extension so that it behaves just like any other ASP.NET page requires you to add a couple of entries in configuration, especially if you want it to work in both pipeline modes, Classic and Integrated.
For this exercise let's assume you want to assign the file extension .IIS so that those files get processed as ASPX pages, and that you only want this to apply to Default Web Site and its applications.
Let's actually describe the AppCmd.exe lines, since they nicely break down the different operations.
Hopefully this helps you understand a bit better how to re-map extensions to ASP.NET, and in doing so learn a bit more about preConditions, Handlers and AppCmd.
Today I was going to post about extending the IIS Configuration, in particular about a feature that not everybody knows about, which allows you to extend the IIS Configuration System using dynamic code. What this means is that instead of hard-coding the configuration using XML in a .config file, your configuration can be provided by a COM object that implements IAppHostPropertyExtension, IAppHostElementExtension and IAppHostMethodExtension.
Then, just to make sure I was not repeating what somebody else had already said, I searched for this in live.com (worth saying: excellent results, the first hit is the documentation of the interface and the second hit is an excellent article in iis.net).
So instead of repeating what you can already find in those places in IIS.NET, I decided not to blog about it in detail, but instead to mention some of the things that are not covered there.
This dynamic configuration is great and offers lots of interesting features, since it allows you to expose any random code that can immediately be accessed through all of our configuration APIs, including Microsoft.Web.Administration, AHADMIN, etc., giving your end users a common programming paradigm. In fact, this also means that it is immediately accessible to the UI APIs and even to the new Configuration Editor in the Admin Pack.
Another interesting benefit is that through these API's your code can be called remotely so that it can be scripted to manage the machines remotely without the need to write any serialization or complex remote infrastructure (restrictions might apply).
However, one thing that is also important to mention is that these dynamic configuration extensions are only available to administration tools, meaning you cannot access these extensions from the worker process by default. To clarify, you cannot use the worker process configuration instance to invoke these extensions, since the worker process specifically disables the ability to call them in its configuration instance. However, if you create your own instance of Microsoft.Web.Administration.ServerManager (which requires you to be running in Full Trust), you will be able to access them. You can also create your own instance of Microsoft.ApplicationHost.AdminManager and you will be able to access them as well. However, in both cases this will only work if you are an Administrator on the machine or have read ACLs for the ApplicationHost.config file (which by default is only readable by Administrators). This is why methods like Microsoft.Web.Administration.WebConfigurationManager::GetSection (and CoGetObject for AHADMIN) are provided, so that you don't run into these issues when developing Web applications and are still able to read configuration sections for your worker process without requiring administrative privileges (in MWA, provided you are either in Full Trust or the section definition marks it as requirePermission=false).
To better understand some scenarios, it is worth mentioning that in IIS 7.0 we actually use these APIs to provide easy access to runtime information and to perform other tasks. For example, querying the state of a Site, recycling an Application Pool, assigning an SSL certificate to a binding and stopping a Site are all provided through this mechanism. If you want to see all the things we do this way, just open %windir%\System32\Inetsrv\config\schema\rscaext.xml, where all of our Web Server extensions are declared. Our own FTP Server for IIS 7.0 uses the same mechanism for things like querying Sessions and other cool stuff.
Anyway, feel free to give the IIS.NET article a good read; it's quite good.