November, 2009

Posts
  • CarlosAg Blog

    IIS SEO Toolkit – Crawler Module Extensibility


    Sample SEO Toolkit CrawlerModule Extensibility

    In this post we will walk through an example of how to extend the SEO Toolkit functionality. Suppose our company has a large Web site that includes many images, and we want to make sure all of them comply with a certain standard: let's say every image should be no larger than 1024x768 pixels and its quality should be no less than 16 bits per pixel. We would also like to run custom queries that let us further analyze the images and filter them by directory and more.

    To do this we will extend the SEO Toolkit crawling process with additional processing for images. We will add the following new capabilities:

    1. Capture additional information from the content. In this case we will capture information about the image; in particular, we will extend the report with an "Image Width", an "Image Height", and an "Image Pixel Format" field.
    2. Flag additional violations. In this example we will flag three new violations:
      1. Image is too large. This violation is flagged whenever the content length of the image is larger than the "Maximum Download Size per URL" configured at the start of the analysis. It is also flagged if the resolution is larger than 1024x768.
      2. Image pixel format is too small. This violation is flagged if the image uses an indexed format of 8, 4, or 1 bits per pixel.
      3. Image has a small resolution. This violation is flagged if the horizontal or vertical resolution of the image is 72 dpi or lower.
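
    The three rules above can also be written as a small pure function, which makes the thresholds easy to test outside the crawler. This is only a sketch: ImageRules, its constants, and the string identifiers it returns are hypothetical names and not part of the SEO Toolkit. It assumes an image counts as too large when either dimension exceeds the limit, and it compares bit depth against the 16 bits per pixel target rather than against specific pixel formats.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper (not part of the SEO Toolkit): the image rules as a pure function.
    public static class ImageRules
    {
        public const int MaxWidth = 1024;
        public const int MaxHeight = 768;
        public const int MinBitsPerPixel = 16;
        public const float MinDpi = 72f;

        // Returns the identifiers of every rule the image breaks (empty list when it passes).
        public static List<string> Check(long contentLength, long maxDownloadSize,
                                         int width, int height, int bitsPerPixel,
                                         float horizontalDpi, float verticalDpi)
        {
            var violations = new List<string>();

            // Rule 1: too large, either in bytes or in pixel dimensions.
            // Assumption: exceeding the limit in either dimension is enough.
            if (contentLength > maxDownloadSize || width > MaxWidth || height > MaxHeight)
                violations.Add("ImageTooLarge");

            // Rule 2: fewer bits per pixel than the quality target.
            if (bitsPerPixel < MinBitsPerPixel)
                violations.Add("ImagePixelFormatSmall");

            // Rule 3: low resolution; mirrors the <= 72 comparison used in the sample module.
            if (horizontalDpi <= MinDpi || verticalDpi <= MinDpi)
                violations.Add("ImageResolutionSmall");

            return violations;
        }
    }
    ```

    For instance, ImageRules.Check(1000, 5000, 2000, 500, 24, 96f, 96f) reports only "ImageTooLarge", because the width alone exceeds the limit.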

    Enter CrawlerModule

    A crawler module is a class that extends the crawling process in Site Analysis to provide custom functionality while processing each URL. By deriving from this class you can easily raise your own set of violations or add your own data and links to any URL.

    public abstract class CrawlerModule : IDisposable
    {
        // Methods
        public virtual void BeginAnalysis();
        public virtual void EndAnalysis(bool cancelled);
        public abstract void Process(CrawlerProcessContext context);

        // Properties
        protected WebCrawler Crawler { get; }
        protected CrawlerSettings Settings { get; }
    }

    It includes three main methods:

    1. BeginAnalysis. This method is invoked once at the beginning of the crawling process and allows you to perform any initialization needed. Common tasks include registering custom properties in the Report that can be accessed through the Crawler property.
    2. Process. This method is invoked for each URL once its contents have been downloaded. The context argument includes a UrlInfo property that provides all the metadata extracted for the URL. It also includes the lists of Violations and Links for the URL. Common tasks include augmenting the metadata of the URL, whether using its contents or external systems, flagging new custom Violations, or discovering new links in the contents.
    3. EndAnalysis. This method is invoked once at the end of the crawling process and allows you to do any final calculations on the report once all the URLs have been processed. Common tasks in this method include performing aggregations of data across all the URLs, or identifying violations that depend on all the data being available (such as finding duplicates).

    Coding the Image Crawler Module

    Create a Class Library in Visual Studio and add the code shown below.

    1. Open Visual Studio and select the option File->New Project
    2. In the New Project dialog select the Class Library project template and specify a name and a location such as "SampleCrawlerModule"
    3. Using the Menu "Project->Add Reference", add a reference to the IIS SEO Toolkit client library (C:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll).
    4. Since we are going to be registering this through the IIS Manager extensibility, add a reference to the IIS Manager extensibility DLL (c:\windows\system32\inetsrv\Microsoft.Web.Management.dll) using the "Project->Add Reference" menu.
    5. Also, since we will be using the .NET Bitmap class you need to add a reference to "System.Drawing" using the "Project->Add Reference" menu.
    6. Delete the auto-generated Class1.cs since we will not be using it.
    7. Using the Menu "Project->Add New Item" Add a new class named "ImageExtension".

    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using Microsoft.Web.Management.SEO.Crawler;

    namespace SampleCrawlerModule {

        /// <summary>
        /// Extension to add validation and metadata to images while crawling
        /// </summary>
        internal class ImageExtension : CrawlerModule {
            private const string ImageWidthField = "iWidth";
            private const string ImageHeightField = "iHeight";
            private const string ImagePixelFormatField = "iPixFmt";

            public override void BeginAnalysis() {
                // Register the properties we want to augment at the beginning of the analysis
                Crawler.Report.RegisterProperty(ImageWidthField, "Image Width", typeof(int));
                Crawler.Report.RegisterProperty(ImageHeightField, "Image Height", typeof(int));
                Crawler.Report.RegisterProperty(ImagePixelFormatField, "Image Pixel Format", typeof(string));
            }

            public override void Process(CrawlerProcessContext context) {
                // Make sure we only process the content types we need to
                switch (context.UrlInfo.ContentTypeNormalized) {
                    case "image/jpeg":
                    case "image/png":
                    case "image/gif":
                    case "image/bmp":
                        // Process only known content types
                        break;
                    default:
                        // Ignore any other
                        return;
                }

                //--------------------------------------------
                // If the content length of the image was larger than the max
                //   allowed to download, then flag a violation, and stop
                if (context.UrlInfo.ContentLength >
                    Crawler.Settings.MaxContentLength) {
                    Violations.AddImageTooLargeViolation(context,
                        "It is larger than the allowed download size");
                    // Stop processing since we do not have all the content
                    return;
                }

                // Load the image from the response into a bitmap
                using (Bitmap bitmap = new Bitmap(context.UrlInfo.ResponseStream)) {
                    Size size = bitmap.Size;

                    //--------------------------------------------
                    // Augment the metadata by adding our fields
                    context.UrlInfo.SetPropertyValue(ImageWidthField, size.Width);
                    context.UrlInfo.SetPropertyValue(ImageHeightField, size.Height);
                    context.UrlInfo.SetPropertyValue(ImagePixelFormatField, bitmap.PixelFormat.ToString());

                    //--------------------------------------------
                    // Additional Violations:
                    //
                    // If the size is outside our standards, then flag violation
                    // (either dimension over the 1024x768 limit is enough)
                    if (size.Width > 1024 ||
                        size.Height > 768) {
                        Violations.AddImageTooLargeViolation(context,
                            String.Format("The image size is: {0}x{1}",
                                          size.Width, size.Height));
                    }

                    // If the format is outside our standards, then flag violation
                    switch (bitmap.PixelFormat) {
                        case PixelFormat.Format1bppIndexed:
                        case PixelFormat.Format4bppIndexed:
                        case PixelFormat.Format8bppIndexed:
                            Violations.AddImagePixelFormatSmall(context);
                            break;
                    }

                    if (bitmap.VerticalResolution <= 72 ||
                        bitmap.HorizontalResolution <= 72) {
                        Violations.AddImageResolutionSmall(context,
                            bitmap.HorizontalResolution + "x" + bitmap.VerticalResolution);
                    }
                }
            }

            /// <summary>
            /// Helper class to hold the violations
            /// </summary>
            private static class Violations {

                private static readonly ViolationInfo ImageTooLarge =
                    new ViolationInfo("ImageTooLarge",
                                      ViolationLevel.Warning,
                                      "Image is too large.",
                                      "The Image is too large: {details}.",
                                      "Make sure that the image content is required.",
                                      "Images");

                private static readonly ViolationInfo ImagePixelFormatSmall =
                    new ViolationInfo("ImagePixelFormatSmall",
                                      ViolationLevel.Warning,
                                      "Image pixel format is too small.",
                                      "The Image pixel format is too small",
                                      "Make sure that the quality of the image is good.",
                                      "Images");

                private static readonly ViolationInfo ImageResolutionSmall =
                    new ViolationInfo("ImageResolutionSmall",
                                      ViolationLevel.Warning,
                                      "Image resolution is small.",
                                      "The Image resolution is too small: ({res})",
                                      "Make sure that the image quality is good.",
                                      "Images");

                internal static void AddImageTooLargeViolation(CrawlerProcessContext context, string details) {
                    context.Violations.Add(new Violation(ImageTooLarge,
                        0, "details", details));
                }

                internal static void AddImagePixelFormatSmall(CrawlerProcessContext context) {
                    context.Violations.Add(new Violation(ImagePixelFormatSmall, 0));
                }

                internal static void AddImageResolutionSmall(CrawlerProcessContext context, string resolution) {
                    context.Violations.Add(new Violation(ImageResolutionSmall,
                        0, "res", resolution));
                }
            }
        }
    }

    As you can see, in BeginAnalysis the module registers three new properties with the Report, which it reaches through the Crawler property. Registration is only required if you want to provide custom display text or use a type other than string. Note that the current version only allows primitive types such as Integer, Float, DateTime, etc.

    In the Process method the module first makes sure that it only runs for known content types. It then performs its validations, raising custom violations that are defined in the Violations static helper class. Note that we load the content from the ResponseStream property, which contains the content received from the server. If you were analyzing text, the Response property would contain the content instead (this is based on the content type, so HTML, XML, CSS, etc., are kept in this String property).

    Registering it

    When running inside IIS Manager, crawler modules need to be registered as a standard UI module first and then inside their initialization they need to be registered using the IExtensibilityManager interface. In this case to keep the code as simple as possible everything is added in a single file. So add a new file called "RegistrationCode.cs" and include the contents below:

    using System;
    using Microsoft.Web.Management.Client;
    using Microsoft.Web.Management.SEO.Crawler;
    using Microsoft.Web.Management.Server;

    namespace SampleCrawlerModule {
        internal class SampleCrawlerModuleProvider : ModuleProvider {
            public override ModuleDefinition GetModuleDefinition(IManagementContext context) {
                return new ModuleDefinition(Name, typeof(SampleCrawlerModule).AssemblyQualifiedName);
            }

            public override Type ServiceType {
                get { return null; }
            }

            public override bool SupportsScope(ManagementScope scope) {
                return true;
            }
        }

        internal class SampleCrawlerModule : Module {
            protected override void Initialize(IServiceProvider serviceProvider, ModuleInfo moduleInfo) {
                base.Initialize(serviceProvider, moduleInfo);

                IExtensibilityManager em = (IExtensibilityManager)GetService(typeof(IExtensibilityManager));
                em.RegisterExtension(typeof(CrawlerModule), new ImageExtension());
            }
        }
    }

    This code defines a standard UI IIS Manager module and in its client-side initialize method it uses the IExtensibilityManager interface to register the new instance of the Image extension. This will make it visible to the Site Analysis feature.

    Testing it

    To test it we need to add the UI module to Administration.config, which also means that the assembly needs to be strongly named and registered in the GAC.

    To strongly name the assembly

    In Visual Studio you can do this easily: open "Project->Properties", select the "Signing" tab, check "Sign the assembly", and choose a key file. If you don't have one, just choose New and specify a name.

    After this you can compile, and you should now be able to add the assembly to the GAC.

    To GAC it

    If you have the Windows SDK installed you can use gacutil.exe. In my case the command is:

    "\Program Files\Microsoft SDKs\Windows\v6.0A\bin\gacutil.exe" /if SampleCrawlerModule.dll

    (Note: you could also just open Windows Explorer, navigate to c:\Windows\assembly, and drag and drop your file there; that will GAC it automatically.)

    Finally, to see the exact name that should be used in Administration.config, run the following command:

    "\Program Files\Microsoft SDKs\Windows\v6.0A\bin\gacutil.exe" /l SampleCrawlerModule

    In my case it displays:

    SampleCrawlerModule, Version=1.0.0.0, Culture=neutral, PublicKeyToken=6f4d9863e5b22f10, …
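
    If gacutil.exe is not at hand, the same identity string can also be read from the compiled DLL with reflection. The sketch below relies on the standard AssemblyName.GetAssemblyName call; the AssemblyIdentity class and its method are hypothetical names chosen for illustration.

    ```csharp
    using System;
    using System.Reflection;

    // Hypothetical helper: reads an assembly's identity without loading it into the GAC.
    public static class AssemblyIdentity
    {
        // Returns the display name (simple name, Version, Culture, PublicKeyToken)
        // of the assembly stored at the given path.
        public static string GetFullName(string assemblyPath)
        {
            return AssemblyName.GetAssemblyName(assemblyPath).FullName;
        }
    }
    ```

    Calling AssemblyIdentity.GetFullName("SampleCrawlerModule.dll") from a small console program should print the same name, version, culture, and public key token that gacutil lists, provided the assembly has been signed.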

    Finally register it in Administration.config

    Open Administration.config in Notepad using an elevated instance, find the closing </moduleProviders> tag, and add a line like the one below just before it, replacing Version and PublicKeyToken with the right values:

          <add name="SEOSample" type="SampleCrawlerModule.SampleCrawlerModuleProvider, SampleCrawlerModule, Version=1.0.0.0, Culture=neutral, PublicKeyToken=6f4d9863e5b22f10" />

    Use it

    After registration you should be able to launch IIS Manager and navigate to Search Engine Optimization. Start a new analysis of your Web site. Once it completes, any violations will show up in the Violations Summary or any other report. For example, see below all the violations in the "Images" category.

    image

    Since we also extended the metadata by including the new fields (Image Width, Image Height, and Image Pixel Format) now you can use them with the Query infrastructure to easily create a report of all the images:

    image

    And since they are standard fields, they can be used in Filters, Groups, and any other functionality, including exporting data. So for example the following query can be opened in the Site Analysis feature and will display an average of the width and height of images summarized by type of image:

    <?xml version="1.0" encoding="utf-8"?>
    <query dataSource="urls">
      <filter>
        <expression field="ContentTypeNormalized" operator="Begins" value="image/" />
      </filter>
      <group>
        <field name="ContentTypeNormalized" />
      </group>
      <displayFields>
        <field name="ContentTypeNormalized" />
        <field name="(Count)" />
        <field name="Average(iWidth)" />
        <field name="Average(iHeight)" />
      </displayFields>
    </query>

    image
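
    To make the semantics of that query concrete, here is roughly the same filter, group, and average expressed in LINQ over plain in-memory rows. This is an illustration only: ImageRow, ImageSummary, and ImageQuery are hypothetical types, and the rows are assumed to have been exported from a report by some other means rather than read through the toolkit API.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical row type standing in for one URL of the report.
    public sealed class ImageRow
    {
        public string ContentType { get; set; }
        public int Width { get; set; }
        public int Height { get; set; }
    }

    // Hypothetical result type: one line per content type.
    public sealed class ImageSummary
    {
        public string ContentType { get; set; }
        public int Count { get; set; }
        public double AverageWidth { get; set; }
        public double AverageHeight { get; set; }
    }

    public static class ImageQuery
    {
        public static List<ImageSummary> Summarize(IEnumerable<ImageRow> rows)
        {
            return rows
                // <filter>: content type begins with "image/"
                .Where(r => r.ContentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
                // <group>: one group per content type
                .GroupBy(r => r.ContentType)
                .OrderBy(g => g.Key, StringComparer.Ordinal)
                // <displayFields>: count and averages per group
                .Select(g => new ImageSummary
                {
                    ContentType = g.Key,
                    Count = g.Count(),
                    AverageWidth = g.Average(r => r.Width),
                    AverageHeight = g.Average(r => r.Height)
                })
                .ToList();
        }
    }
    ```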

    And of course violation details are shown as specified, including Recommendation, Description, etc:

    image

    Summary

    As you can see, extending the SEO Toolkit with a crawler module lets you provide additional information, whether metadata, violations, or links, for any document being processed. This can be used to add support for content types not supported out of the box, such as PDF, Office documents, or anything else that you need. It can also be used to extend the metadata by writing custom code that wires data from other systems into the report, giving you the ability to exploit this data using the Query capabilities of Site Analysis.

  • CarlosAg Blog

    IIS SEO Toolkit - Start new analysis automatically through code

    One question that I've been asked several times is: "Is it possible to schedule the IIS SEO Toolkit to run automatically every night?". Other related questions are: "Can I automate the SEO Toolkit so that as part of my build process I'm able to catch regressions on my application?", or "Can I run it automatically after every check-in to my source control system to ensure no links are broken?", etc.

    The good news is that the answer is YES! The bad news is that you have to write a bit of code to make it work. The SEO Toolkit includes a managed code API that can start the analysis just like the user interface does, and you can call it from any application written in managed code.

    In this post I will show you how to write a simple command-line application that starts a new analysis against the site provided as a command-line argument and runs a few queries after it finishes.

    IIS SEO Crawling APIs

    The most important type included is a class called WebCrawler. This class takes care of all the process of driving the analysis. The following image shows this class and some of the related classes that you will need to use for this.

    image

    The WebCrawler class is initialized with the configuration specified in a CrawlerSettings instance. It contains two methods, Start() and Stop(), which start and stop the crawling process that runs on a set of background threads. Through its Report property you can also get to the CrawlerReport, which represents the results (whether completed or in progress) of the crawling process. CrawlerReport has a method called GetUrls() that returns all the UrlInfo items. UrlInfo is the most important class: it represents a URL that has been downloaded and processed, and it holds all the metadata such as Title, Description, ContentLength, and ContentType, plus the set of Violations and Links that the URL includes.

    Developing the Sample

    1. Start Visual Studio.
    2. Select the option "File->New Project"
    3. In the "New Project" dialog select the template "Console Application", enter the name "SEORunner" and press OK.
    4. Using the menu "Project->Add Reference" add a reference to the IIS SEO Toolkit Client assembly "c:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll".
    5. Replace the code in the file Program.cs with the code shown below.
    6. Build the Solution

    using System;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Threading;
    using Microsoft.Web.Management.SEO.Crawler;

    namespace SEORunner {
        class Program {

            static void Main(string[] args) {

                if (args.Length != 1) {
                    Console.WriteLine("Please specify the URL.");
                    return;
                }

                // Create a URI class
                Uri startUrl = new Uri(args[0]);

                // Run the analysis
                CrawlerReport report = RunAnalysis(startUrl);

                // Run a few queries...
                LogSummary(report);

                LogStatusCodeSummary(report);

                LogBrokenLinks(report);
            }

            private static CrawlerReport RunAnalysis(Uri startUrl) {
                CrawlerSettings settings = new CrawlerSettings(startUrl);
                settings.ExternalLinkCriteria = ExternalLinkCriteria.SameFolderAndDeeper;
                // Generate a unique name
                // (HH is the 24-hour clock, so AM and PM runs cannot produce the same name)
                settings.Name = startUrl.Host + " " + DateTime.Now.ToString("yy-MM-dd HH-mm-ss");

                // Use the same directory as the default used by the UI
                string path = Path.Combine(
                    Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
                    "IIS SEO Reports");

                settings.DirectoryCache = Path.Combine(path, settings.Name);

                // Create a new crawler and start running
                WebCrawler crawler = new WebCrawler(settings);
                crawler.Start();

                Console.WriteLine("Processed - Remaining - Download Size");
                while (crawler.IsRunning) {
                    Thread.Sleep(1000);
                    Console.WriteLine("{0,9:N0} - {1,9:N0} - {2,9:N2} MB",
                        crawler.Report.GetUrlCount(),
                        crawler.RemainingUrls,
                        crawler.BytesDownloaded / 1048576.0f);
                }

                // Save the report
                crawler.Report.Save(path);

                Console.WriteLine("Crawling complete!!!");

                return crawler.Report;
            }

            private static void LogSummary(CrawlerReport report) {
                Console.WriteLine();
                Console.WriteLine("----------------------------");
                Console.WriteLine(" Overview");
                Console.WriteLine("----------------------------");
                Console.WriteLine("Start URL:  {0}", report.Settings.StartUrl);
                Console.WriteLine("Start Time: {0}", report.Settings.StartTime);
                Console.WriteLine("End Time:   {0}", report.Settings.EndTime);
                Console.WriteLine("URLs:       {0}", report.GetUrlCount());
                Console.WriteLine("Links:      {0}", report.Settings.LinkCount);
                Console.WriteLine("Violations: {0}", report.Settings.ViolationCount);
            }

            private static void LogBrokenLinks(CrawlerReport report) {
                Console.WriteLine();
                Console.WriteLine("----------------------------");
                Console.WriteLine(" Broken links");
                Console.WriteLine("----------------------------");
                foreach (var item in from url in report.GetUrls()
                                     where url.StatusCode == HttpStatusCode.NotFound &&
                                           !url.IsExternal
                                     orderby url.Url.AbsoluteUri ascending
                                     select url) {
                    Console.WriteLine(item.Url.AbsoluteUri);
                }
            }

            private static void LogStatusCodeSummary(CrawlerReport report) {
                Console.WriteLine();
                Console.WriteLine("----------------------------");
                Console.WriteLine(" Status Code summary");
                Console.WriteLine("----------------------------");
                foreach (var item in from url in report.GetUrls()
                                     group url by url.StatusCode into g
                                     orderby g.Key
                                     select g) {
                    Console.WriteLine("{0,20} - {1,5:N0}", item.Key, item.Count());
                }
            }
        }
    }

    If you are not using Visual Studio, you can just save the contents above in a file called SEORunner.cs and compile it from the command line:

    C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe /r:"c:\Program Files\Reference Assemblies\Microsoft\IIS\Microsoft.Web.Management.SEO.Client.dll" /optimize+ SEORunner.cs

    After that you should be able to run SEORunner.exe, passing the URL of your site as an argument. You will see output like:

    Processed - Remaining - Download Size
           56 -       149 -      0.93 MB
          127 -       160 -      2.26 MB
          185 -       108 -      3.24 MB
          228 -        72 -      4.16 MB
          254 -        48 -      4.98 MB
          277 -        36 -      5.36 MB
          295 -        52 -      6.57 MB
          323 -        25 -      7.53 MB
          340 -         9 -      8.05 MB
          358 -         1 -      8.62 MB
          362 -         0 -      8.81 MB
    Crawling complete!!!
    
    ----------------------------
     Overview
    ----------------------------
    Start URL:  http://www.carlosag.net/
    Start Time: 11/16/2009 12:16:04 AM
    End Time:   11/16/2009 12:16:15 AM
    URLs:       362
    Links:      3463
    Violations: 838
    
    ----------------------------
     Status Code summary
    ----------------------------
                      OK -   319
        MovedPermanently -    17
                   Found -    23
                NotFound -     2
     InternalServerError -     1
    
    ----------------------------
     Broken links
    ----------------------------
    http://www.carlosag.net/downloads/ExcelSamples.zip

    The most interesting method above is RunAnalysis. It creates a new instance of CrawlerSettings and specifies the start URL. Note that it also specifies that all pages hosted in the same directory or its subdirectories should be considered internal. We also set a unique name for the report and use the same directory as the IIS SEO UI, so that opening IIS Manager will show the reports just as if they had been generated by it. Then we call Start(), which starts the number of worker threads specified in the WebCrawler::WorkerCount property. Finally we wait for the WebCrawler to finish by polling the IsRunning property.

    The remaining methods just leverage LINQ to perform a few queries to output things like a report aggregating all the URLs processed by Status code and more.

    Summary

    As you can see, the IIS SEO Toolkit crawling APIs let you write your own application to start an analysis against your Web site. That application can be integrated with the Windows Task Scheduler, your own scripts, or your build system to allow for continuous integration.

    Once the report is saved locally it can be opened in IIS Manager for further analysis, just like any other report. This sample console application can be scheduled using the Windows Task Scheduler so that it runs every night or at any other time. Note that you could also write a few lines of PowerShell to automate it from the command line without writing any C# code, but that is left for another post.
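
    For the build or check-in scenario, a script usually needs a process exit code rather than console text. One possible approach, sketched here with a hypothetical BuildGate helper, is to turn the status codes of the internal URLs into 0 (success) or 1 (failure). The choice to treat every 4xx and 5xx status as a failure is an assumption; adjust it to your own policy.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;

    // Hypothetical helper: maps crawl results to a process exit code for a build script.
    public static class BuildGate
    {
        // Returns 0 when no internal URL failed, 1 when at least one returned
        // a client (4xx) or server (5xx) error status.
        public static int GetExitCode(IEnumerable<HttpStatusCode> internalStatusCodes)
        {
            bool anyFailure = internalStatusCodes.Any(code => (int)code >= 400);
            return anyFailure ? 1 : 0;
        }
    }
    ```

    In SEORunner the input could be gathered the same way LogBrokenLinks does it, by selecting url.StatusCode from report.GetUrls() where !url.IsExternal, and the result returned from a Main method declared as static int Main.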

  • CarlosAg Blog

    IIS SEO Toolkit - New Reports (Redirects and Link Depth)

    In the new version of the IIS SEO Toolkit we added two new reports that are very interesting, both from an SEO perspective and from a user experience and site organization perspective. These reports are located in the Links category of the reports.

    Redirects

    This report shows a summary of all the redirects that were found while crawling the Web site. The first column (Linking-URL) is the URL that was visited that resulted in redirection to the Linked-URL (second column). The third column (Linking-Status code) specifies what type of redirection happened based on the HTTP status code enumeration. The most common values will be MovedPermanently/Moved which is a 301, or Found/Redirect which is a 302. The last column shows the status code for the final URL so you can easily identify redirects that failed or that redirected to another redirect.

    image
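
    The last column is what exposes redirects that land on yet another redirect. As a rough illustration of that idea (not the toolkit's implementation), the sketch below follows a chain through a simple source-to-target map. RedirectChains and the map are hypothetical; the map is assumed to have been built from the first two columns of the report.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper: follows redirect chains in a source -> target map.
    public static class RedirectChains
    {
        // Returns every URL visited, starting with 'url'. Stops at the first URL
        // that does not redirect, when a URL repeats (a loop), or after maxHops hops.
        public static List<string> Follow(IDictionary<string, string> redirects,
                                          string url, int maxHops)
        {
            var chain = new List<string> { url };
            var seen = new HashSet<string>(StringComparer.Ordinal) { url };
            string next;
            while (chain.Count <= maxHops && redirects.TryGetValue(url, out next))
            {
                chain.Add(next);
                if (!seen.Add(next))
                {
                    break; // already visited: redirect loop
                }
                url = next;
            }
            return chain;
        }
    }
    ```

    A chain with more than two entries means the first redirect pointed at another redirect, which is usually worth collapsing into a single hop.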

    Why should you care

    This report is interesting because redirects might affect your search engine rankings and make your users perceive your site as slower. For more information on redirects see: Redirects, 301, 302 and IIS SEO Toolkit

    Link Depth

    This is probably one of my favorite reports since it is almost impossible to find this type of information in any other 'easy' way.

    The report basically tells you how hard it is for users that land in your home page to get to any of the pages in your site. For example in the image below it shows that it takes 5 clicks for a user to get from the home page of my site to the XGrid.htc component.

    image

    This is very valuable information because it lets you understand how deep your Web site is. In my case, if you were to walk the entire site and lay out its structure in a hierarchical diagram, it would be 5 levels deep. Remember, you want your site to be shallow so that it is easily discovered and crawled by search engines.

    Even more interesting, you can double-click any of the results and see the list of clicks that the user has to make to get to the page.

    image

    Note that it shows the URL, the title of the page, and the text of the link you need to click to get to the next URL (the one with a smaller index). So in my case the user needs to go to the home page and click the link with text "XGrid", which takes them to the /XGrid/ URL (index 3), where they then need to click the link with text "This is a new...", etc.

    Note that as you select the URLs in the list it will highlight in the markup the link that takes you to the next URL.
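
    Conceptually, click depth is a shortest-path problem over the link graph, and a breadth-first search is the textbook way to solve it. The sketch below shows that general technique on an in-memory graph. It is not how the toolkit computes the report; LinkDepth and the links map are hypothetical names used for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper: breadth-first search over a page -> linked pages map.
    public static class LinkDepth
    {
        // For every page reachable from 'start', records the page it was first
        // reached from ('start' itself maps to null).
        public static Dictionary<string, string> BuildRoutes(
            IDictionary<string, string[]> links, string start)
        {
            var parent = new Dictionary<string, string> { { start, null } };
            var queue = new Queue<string>();
            queue.Enqueue(start);
            while (queue.Count > 0)
            {
                string page = queue.Dequeue();
                string[] targets;
                if (!links.TryGetValue(page, out targets)) continue;
                foreach (string target in targets)
                {
                    if (parent.ContainsKey(target)) continue; // already reached by a route at least as short
                    parent[target] = page;
                    queue.Enqueue(target);
                }
            }
            return parent;
        }

        // Returns the pages on a shortest route from the start page to 'page',
        // or an empty list when 'page' is unreachable.
        public static List<string> RouteTo(Dictionary<string, string> parent, string page)
        {
            var route = new List<string>();
            if (!parent.ContainsKey(page)) return route;
            for (string current = page; current != null; current = parent[current])
            {
                route.Add(current);
            }
            route.Reverse();
            return route;
        }
    }
    ```

    The number of clicks is the route length minus one, and the number of entries returned by BuildRoutes is the number of pages a visitor can reach from the chosen start page.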

    The data of this report is powered by a new type of query we called Route Query. The reason this is interesting is because you can customize the report to add different filters, or change the start URL, or more.

    For example, let's say I want to figure out all the pages that users can get to when they land on a specific page of my site, say http://www.carlosag.net/Tools/XGrid/editsample.htm:

    In the Dashboard view of a Report, select the option 'Query->New Routes Query'. This will open a new Query tab where you can specify the Start URL that you are interested in.

    image

    As you can see, this report clearly shows that a user who visits my site and lands on this page is basically stuck and can only reach 8 pages of the entire site. This is a clear example of where a link to the home page would be beneficial.

    Another common scenario for this query infrastructure is finding ways to direct traffic from your most popular pages to your conversion pages. This report lets you figure out how difficult or easy it is to get from any page to your conversion pages.

  • CarlosAg Blog

    Announcing: IIS SEO Toolkit v1.0 release

    • 3 Comments

    Today we are announcing the final release of the IIS Search Engine Optimization (SEO) Toolkit v1.0. This version builds upon the Beta 1 and Beta 2 versions and is 100% compatible with those versions so any report you currently have continues to work in the new version. The new version includes a set of bug fixes and new features such as:

    1. Extensibility. In this version we are opening a new set of APIs to allow you to develop extensions for the crawling process, including the ability to augment the metadata in the report with your own, extend the set of tasks provided in the Site Analysis and Sitemaps User Interface, and more. More on this in an upcoming post.
    2. New Reports. Based on feedback we added a Redirects summary report in the Links section as well as a new Link Depth report that allows you to easily know which pages are the "most hidden pages" in your site, or in other words, if a user landed at your site's home page, "how many clicks do they need to make to reach a particular page".
    3. New Routes Query. We added a new type of Query called Routes. This is the underlying data that powers the "Link Depth" report mentioned above. It is also exposed as a new query type so that you can create your own queries to customize the Start page and apply other options such as filtering and grouping.
    4. New option to opt out of keeping a local cache of files. We added a new switch in the "Advanced Settings" of the New Analysis dialog to disable the option of keeping the files stored locally. This allows you to run a report that runs faster and consumes a lot less disk space than when keeping the files cached. The only side effect is that you will not be able to get the "Content" tab, the contextual position of the links, or the Word Analysis feature. Everything else continues to work just as in any other report.
    5. HTML Metadata is now stored in the Report. By leveraging the Extensibility mentioned in bullet 1, the HTML parser now stores the content of all the HTML META tags so that you can later use it to write your own queries, whether to filter, group data or just export it. This gives you a very interesting set of options if you have any metadata like Author, or any custom metadata.
    6. Several Bug Fixes:
      1. Internal URLs linked by External URLs now are also included in the crawling process.
      2. Groupings in queries should be case sensitive
      3. Show contextual information (link position) in Routes
      4. The Duplicate detection logic should only include valid responses (do not include 404 NOT Found, 401, etc)
      5. Canonical URLs should support sub-domains.
      6. Several Accessibility fixes. (High DPI, Truncation in small resolutions, Hotkeys, Keyboard navigation, etc).
      7. Several fixes for Right-To-Left languages. (Layout and UI)
      8. Help shortcuts enabled.
      9. New Context Menus for Copying content
      10. Add link position information for Canonical URLs
      11. Remove x-javascript validation for this release
      12. Robots algorithm should be case sensitive
      13. many more
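
    To give an idea of the kind of query that the stored META tags (item 5 above) make possible, here is a small sketch that collects the META tags of a few pages and groups the pages by Author. It uses Python's standard html.parser rather than the SEO Toolkit's own parser or query engine, and the pages, the MetaCollector class and the collect_meta function are hypothetical.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name and "content" in attributes:
            # META names are compared case-insensitively here.
            self.meta[name.lower()] = attributes["content"]

def collect_meta(pages):
    """`pages` maps URL -> HTML markup; returns URL -> {meta name: content}."""
    result = {}
    for url, html in pages.items():
        parser = MetaCollector()
        parser.feed(html)
        result[url] = parser.meta
    return result

# Hypothetical pages, for illustration only.
pages = {
    "/a.htm": '<html><head><meta name="Author" content="Carlos"><title>A</title></head></html>',
    "/b.htm": '<html><head><meta name="author" content="Carlos"><meta name="keywords" content="seo"></head></html>',
    "/c.htm": "<html><head><title>C</title></head></html>",
}

# Group the URLs by their Author META tag.
by_author = {}
for url, meta in collect_meta(pages).items():
    by_author.setdefault(meta.get("author", "(none)"), []).append(url)

print(by_author)
```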

    This version can upgrade both the Beta 1 and Beta 2 versions, so go ahead and try it, and PLEASE provide us with feedback and any additional things you would like to see for the next version at the SEO Forum in the IIS Web site.

    Click here to install the IIS SEO Toolkit.

  • CarlosAg Blog

    Presenting at ASP.NET Connections in Las Vegas

    • 0 Comments

    Next week I will be presenting at the ASP.NET Connections event in Las Vegas the following topics:

    1. AMS04: Boost Your Site’s Search Ranking with the IIS Search Engine Optimization Toolkit: Search engines are just robots, and you have to play by their rules if you want to see your site in the top search results. In this session, you will learn how to leverage the IIS Search Engine Optimizer and other tools to improve your Web site for search engine and user traffic. You will leave this session with a set of tips and tricks that will boost the search rank, performance and consistency of your Web site. Tuesday 10:00 am.
    2. AMS10: Developing and Deploying for the Windows Web App Gallery: Come hear how the Microsoft Web Platform fosters a powerful development ecosystem for Web applications, and how the latest wave of IIS extensions enable Web applications to move seamlessly from a development environment to a production datacenter. You will also learn how to package a Web application for the Windows Web App Gallery to make it available to millions of users. Thursday 8:15 am.

    I will also be participating in a session called: "Q&A session with Scott Guthrie and the ASP.NET and VWD teams at DevConnections" on Wednesday.

    It should be fun. If you are around, stop by the Microsoft Web Platform booth, where I will be hanging around the rest of the time trying to answer any questions and getting a chance to learn more about how you use IIS or any problems you might be facing.

  • CarlosAg Blog

    IIS SEO Toolkit – Report Comparison

    • 0 Comments

    One of my favorite features in the IIS Search Engine Optimization (SEO) Toolkit is what we call Report Comparison. Report Comparison allows you to compare two different versions of the results of crawling the same site to see what changed in between. This is a really convenient way not only to track changes in terms of SEO violations but also to compare any attributes of the pages such as Title, Heading, Description, Links, Violations, etc.

    How to access the feature

    There are a couple of ways to get to this feature.

    1) Use the Compare Reports task. While in the Site Analysis Reports listing you can select two reports by using Ctrl+Click, and if both reports are compatible (e.g. they use the same Start URL) the task "Compare Reports" will be shown. Just clicking on that will get you the comparison.

    CompareReportsTask

    2) Use the Compare to another report menu item. While in the Dashboard view of a Report you can use the "Report->Compare To Another Report" menu item which will show a dialog where you can either select an existing report or even start a new analysis to compare with.

    CompareReportsMenu

    Report Comparison Page

    In both cases you will get the Report Comparison Page displaying the results as shown in the next image.

    CompareResults

    The Report Comparison page includes a couple of "sections" with data. At the very top it includes links showing the Name and the Date when the reports were run. If you click on them it will open the report directly, just as if you had used the Site Analysis report listing view.

    The next section shows a lot of interesting built-in data such as:

    Total # of URLs: The total number of URLs found in each version. Clicking a link shows the listing of URLs for the version of the report you choose.
    New and Removed: The number of URLs that were added in the new version or removed from the old version. Clicking the added link shows the listing of URLs based on the new version of the report; clicking the removed link shows the listing based on the old version.
    Changed and Unchanged: The number of URLs that were modified or not modified. These are calculated by comparing the hashes of the files in both versions. Clicking the links shows a query that displays a comparison of both versions of the URLs, including their content length. (See below)
    Total # of Violations: The total number of violations found in each version.
    New in existing pages and Fixed in existing pages: The number of violations introduced or removed on URLs that exist in both reports. Clicking the added link shows the listing of violations based on the new version of the report; clicking the removed link shows the listing based on the old version.
    Introduced in new pages: The number of violations introduced on URLs that are found only in the new report. Clicking the link shows the listing of violations based on the new version of the report.
    Fixed by page removal: The number of violations that were removed because their URLs were no longer found in the new report. Clicking the link shows the listing of violations based on the old version of the report.
    Others: There are a number of additional reports that compare different attributes of URLs found in both reports, such as Time Taken, Content Length, Status Code and # of Links. Clicking the links shows a query that displays a comparison of both versions of the reports with the relevant fields. (See below)
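
    The counts above boil down to a handful of set operations. The following sketch shows one way to derive them, assuming each report is reduced to a dictionary of URL to content and a set of (URL, violation) pairs. The function names, sample pages and violation names are hypothetical, and SHA-256 is only a stand-in since the hash the SEO Toolkit uses is not stated here.

```python
import hashlib

def digest(content):
    """Stand-in content hash (assumption: the toolkit's own hash may differ)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def compare_urls(old, new):
    """`old` and `new` map URL -> content hash for each report."""
    old_urls, new_urls = set(old), set(new)
    common = old_urls & new_urls
    return {
        "new": sorted(new_urls - old_urls),
        "removed": sorted(old_urls - new_urls),
        "changed": sorted(u for u in common if old[u] != new[u]),
        "unchanged": sorted(u for u in common if old[u] == new[u]),
    }

def compare_violations(old_urls, new_urls, old_violations, new_violations):
    """Violations are sets of (url, violation_name) pairs."""
    added = new_violations - old_violations
    gone = old_violations - new_violations
    return {
        "new_in_existing_pages": sorted(v for v in added if v[0] in old_urls),
        "introduced_in_new_pages": sorted(v for v in added if v[0] not in old_urls),
        "fixed_in_existing_pages": sorted(v for v in gone if v[0] in new_urls),
        "fixed_by_page_removal": sorted(v for v in gone if v[0] not in new_urls),
    }

# Hypothetical reports, for illustration only.
old_pages = {"/": "home v1", "/about.htm": "about", "/old.htm": "old"}
new_pages = {"/": "home v2", "/about.htm": "about", "/new.htm": "new"}
old_hashes = {url: digest(body) for url, body in old_pages.items()}
new_hashes = {url: digest(body) for url, body in new_pages.items()}

old_violations = {("/", "TitleMissing"), ("/about.htm", "DescriptionMissing"), ("/old.htm", "BrokenLink")}
new_violations = {("/", "H1Missing"), ("/about.htm", "DescriptionMissing"), ("/new.htm", "TitleMissing")}

url_diff = compare_urls(old_hashes, new_hashes)
violation_diff = compare_violations(set(old_pages), set(new_pages), old_violations, new_violations)
print(url_diff)
print(violation_diff)
```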

    Whenever you click the links you get a query dialog that you can customize just as any Query in the Query builder, where you can Add/Remove columns, add filters, etc.

    My favorite one is the "Modified URLs" source, where you can actually add filters that compare URLs coming from the two different reports.

    QueryDialog

    Note that when you double click or "right-click –> Compare Details" any of the rows you get a side-by-side comparison of everything in the URL:

    SideBySideDialog

    Again, you can use any of the tabs to see side-by-side things like the Content of the pages or the Links both versions have or the violations, or pretty much everything that you can see for just one.

    SideBySideDialog2

    Finally, you can also right click on the Query dialog and choose "Compare Contents". This will launch whatever File Comparison tool you have configured using the "Edit Feature Settings". In this case I have configured WinDiff.exe which shows something like:

    SideBySideContents

    Summary

    As you can see, Report Comparison is a powerful feature that allows you to keep track of changes between two different reports. This makes it easy to understand over time how your site has been affected by changes. For Site managers it provides a way to query and maintain a history of all the changes. You can imagine an automated build process that runs an IIS SEO Toolkit crawl whenever a build is made, stores the report somewhere and potentially annotates it with the build number. With that in place you could even correlate changes in code with the Web site crawling results.

  • CarlosAg Blog

    IIS SEO Toolkit Presentation at DevConnections

    • 0 Comments

    Yesterday I presented the session "AMS04: Boost Your Site’s Search Ranking with the IIS Search Engine Optimization Toolkit" at ASP.NET Connections. It was fun to talk to a few attendees who had several questions about the tool and SEO in general. It is always really interesting to learn about all the unique environments and types of applications that are being built and how the SEO Toolkit can help them.

    Here are the IIS SEO Toolkit slides that I used.

    Here you can find the IIS SEO Toolkit download.

    And by far the easiest way to get it installed is using the Microsoft Web Platform Installer.

    Please send any questions and feedback to the IIS SEO Toolkit Forums.

    And by the way, stay tuned for the RTW version of IIS SEO Toolkit coming SOON.

  • CarlosAg Blog

    Slides for IIS – Web Application Gallery presentation at DevConnections

    • 0 Comments

    Two weeks ago I presented the talk "AMS10: Developing and Deploying for the Windows Web App Gallery" at DevConnections. Here are the slides.

    Download the Web Application Gallery Talk slides here.

     

    A few final links:

    Microsoft Web Platform: http://www.microsoft.com/web/

    Download Web PI: http://www.microsoft.com/web/downloads/platform.aspx

    Submit your Applications at: http://www.microsoft.com/web/gallery/developer.aspx

  • CarlosAg Blog

    IIS SEO Toolkit Extensibility

    • 0 Comments

    The IIS SEO Toolkit includes a lot of built-in functionality, such as violation rules, processing of different content types (like HTML, CSS, RSS, etc.) and more. However, it might not do all the things that you need it to do. For example, it might not process a set of documents that you use, or it might not gather all the information that you are interested in while processing a document. The good news is that it includes enough extensibility to let you build on top of its rich capabilities and provide additional ones easily using .NET.

    There are three main extensibility points in this first release:

    1. Crawler Module. This extensibility point allows you to provide your own code to hook to the process of crawling a Web site in the Site Analysis process. Using this extensibility point you can extend the built in set of violation rules with your own. You can also gather additional information such as links or any metadata of a resource, whether directly extracted from the content or from an external system.
    2. Site Analyzer Extension. This feature allows you to provide your own set of tasks to be exposed in the Site Analysis user interface. These tasks will be displayed in the main menu bar in the report dashboard.
    3. Sitemap Extension. This class allows you to provide your own set of tasks to be exposed in the Sitemaps, Sitemap and Sitemap Index user interface.

    This is the first of a series of extensibility blog entries for the IIS SEO Toolkit where I will cover all of the extensibility points mentioned above.
