In the first post of this series I proposed a different timeout handling mechanism to remove the typical tier-dependency performance problems. In order to keep this post under a reasonable size, I recommend you read the first one to fully understand the problem and the proposed solution to it.

To sum it up real quickly, here is what we want to achieve:

·        If calls to a specific service are timing out we want to track that information in our web application

·        After a specific number of consecutive timeouts, we want to mark the service as offline

·        Further requests that require a call to that particular service will not attempt to contact the service

Basically, we want to avoid having all worker threads in ASP.NET blocked while waiting for a service that will probably time out anyway. This gives us better thread pool management and keeps our application healthy and responding quickly to requests.

First off, let’s see the typical scenario (the one without resilience mechanisms in place):

I created a new ASP.NET website and changed the default.aspx page to include the following markup:

<asp:Content ID="BodyContent" runat="server" ContentPlaceHolderID="MainContent">
    <h2>
        Resilience in nTier applications
    </h2>
    <p>
        <asp:Button ID="NoResilienceButton" Text="Call Service with no Resilience" width="300px" runat="server" OnClick="CallWithNoResilience"/>
    </p>
    <p>
        <asp:Button ID="ResilienceButton" Text="Call Service with Resilience" width="300px" runat="server" OnClick="CallWithResilience" />
    </p>
    <p>
        <asp:Label ID="Result" runat="server" />
    </p>
</asp:Content>


In my application I included a WCF service with the following code:

[ServiceBehavior(IncludeExceptionDetailInFaults = true)]
public class TheOcasionallySlowService : ITheOcasionallySlowService
{
    public string DoWork()
    {
        //For now we fail immediately instead of actually being slow
        throw new TimeoutException();
    }
}


For quick testing purposes, the service simply throws an exception immediately, but we will later add some code to simulate a slow response and a little switch to turn that behavior on or off, so that we can further test our solution.
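To give an idea of what such a switch might look like, here is a minimal sketch. It is not the code used later in the series: the class name SlowServiceSimulator, the SimulateSlowness flag and the Delay value are all assumptions made for illustration, and the WCF attributes are left out so the class can run on its own.

```csharp
using System;
using System.Threading;

//Hypothetical stand-in for the WCF service, without the service attributes
public class SlowServiceSimulator
{
    //Hypothetical switch: true simulates a hung backend, false answers right away
    public static volatile bool SimulateSlowness = true;

    //Hypothetical delay: it should be longer than the client timeout
    //if we want the caller to see a real timeout
    public static TimeSpan Delay = TimeSpan.FromSeconds(90);

    public string DoWork()
    {
        if (SimulateSlowness)
        {
            //Let's be slow: block this call for the configured delay
            Thread.Sleep(Delay);
        }

        return "Work done";
    }
}
```

Flipping SimulateSlowness to false makes DoWork return immediately, which is handy when we want to check that the application behaves normally once the service is healthy again.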

It’s time to write the CallWithNoResilience event handler in the code-behind file of the default.aspx page. After adding the service reference to my WCF service, I wrote the following code:

protected void CallWithNoResilience(object sender, EventArgs e)
{
    string result = "Result: ";

    myService.TheOcasionallySlowServiceClient proxy = new myService.TheOcasionallySlowServiceClient();
    try
    {
        result += proxy.DoWork();
        proxy.Close();
    }
    catch (Exception ex)
    {
        //Close() throws on a faulted channel (e.g. after a timeout), so abort instead
        proxy.Abort();
        result += ex.Message;
    }

    Result.Text = result;
}


Really simple implementation here: we are basically calling our service and displaying the result in the Result label in our page. If an exception is generated we write out the error message. In the catch block we would log the exception somewhere, but that is not relevant for our code.

This is pretty much the traditional way of talking to services and handling errors. If we start our application and click on the “Call Service with no Resilience” button, we will see the error message displayed. And if we click it 10 times in a row, we always see the same behavior.

There is no performance problem here because our service just throws an exception immediately. But imagine this was a true timeout: with the default WCF sendTimeout of one minute, our 10 requests would be left hanging for a full minute and we would have 10 fewer worker threads available to us.

Let’s imagine we only have 10 threads in the pool. If another request came in asking for the Default.aspx page (a simple request with no service calls involved), it would be placed in the queue until one of the service calls timed out and its worker thread could complete the request and return to the pool.
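To put a number on that wait, here is a small back-of-the-envelope sketch. It is a deliberately simplified model and not how ASP.NET schedules requests: it assumes every blocked request holds its thread from the moment it starts until the timeout expires. The class name StarvationEstimate and all the values in the usage note are illustrative assumptions.

```csharp
using System;
using System.Linq;

public static class StarvationEstimate
{
    //Returns how many seconds a request arriving at 'arrival' has to wait for a free thread.
    //Simplified model: each blocked request holds one thread from its start time
    //until start + timeoutSeconds, and there are never more blocked requests than threads.
    public static double SecondsQueued(int poolSize, double timeoutSeconds, double[] blockedStarts, double arrival)
    {
        if (blockedStarts.Length > poolSize)
        {
            throw new ArgumentException("This model only covers up to poolSize blocked requests");
        }

        //Moments at which the threads that are still blocked will be released
        double[] pendingReleases = blockedStarts
            .Select(start => start + timeoutSeconds)
            .Where(release => release > arrival)
            .ToArray();

        //At least one thread is free, so the request is served right away
        if (pendingReleases.Length < poolSize)
        {
            return 0;
        }

        //Every thread is blocked: wait for the first one to time out
        return pendingReleases.Min() - arrival;
    }
}
```

For example, with 10 threads, a 60 second timeout and 10 service calls started one per second from second 0 to second 9, a simple page request arriving at second 15 waits 45 seconds before any thread is free to serve it.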

Our objective is to avoid this behavior by adding some intelligence to the application and making it less stubborn: if the service is failing repeatedly, let’s stop trying to call it.

Let’s see how to implement this:

First, I created a class to represent the information we need to gather:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

public enum ServiceStatus { ONLINE, OFFLINE };

public class ResilientService
{
    public string ServiceName { get; set; }
    public ServiceStatus Status { get; set; }
    public int RetryCount { get; set; }
    public int RetryLimit { get; set; }
}


This class also uses an enumeration called ServiceStatus to make our coding tasks easier. So, for each WCF service we call in our application, we keep track of the status, the current retry counter and the number of consecutive failed calls we allow before marking it as offline.

Because we want to keep our code simple, I decided to encapsulate all the interaction with this information in a single class. But because we also want to be able to change the underlying implementation (mainly, where we actually store the information), I decided to define an interface with the following code:

using System;
using System.Collections.Generic;

public interface IResilienceHelper
{
    ResilientService GetService(string serviceName);
    ServiceStatus GetStateOfService(string serviceName);
    void IncrementServiceRetryCount(string serviceName);
    void InitializeServices(List<ResilientService> serviceList);
    void ResetServiceRetryCount(string serviceName);
    void UpdateServiceStatus(string serviceName, ServiceStatus status);
}


This interface defines all the methods we need in order to keep track of, and change, the information about a service.

Let’s move on to the actual implementation:

using System.Collections.Generic;
using System.Web;

public class HttpApplicationResilienceHelper : IResilienceHelper
{
    private HttpApplicationState app = HttpContext.Current.Application;

    //Receives a list of ResilientService and places them in the HttpContext.Current.Application object
    public void InitializeServices(List<ResilientService> serviceList)
    {
        foreach (ResilientService service in serviceList)
        {
            app[service.ServiceName] = service;
        }
    }

    //Retrieves a service from the application state (null if it was never registered)
    public ResilientService GetService(string serviceName)
    {
        return app[serviceName] as ResilientService;
    }

    //Returns the current status of a service
    public ServiceStatus GetStateOfService(string serviceName)
    {
        return GetService(serviceName).Status;
    }

    //Increments the retry count of a service and marks it offline if limit reached
    public void IncrementServiceRetryCount(string serviceName)
    {
        ResilientService service = GetService(serviceName);
        int currentRetryCount = service.RetryCount;

        currentRetryCount++;

        if (currentRetryCount >= service.RetryLimit)
        {
            UpdateServiceStatus(serviceName, ServiceStatus.OFFLINE);
        }
        else
        {
            service.RetryCount = currentRetryCount;
        }
    }

    //Resets the current retry count of a service
    public void ResetServiceRetryCount(string serviceName)
    {
        GetService(serviceName).RetryCount = 0;
    }

    //Updates the status of a service
    public void UpdateServiceStatus(string serviceName, ServiceStatus status)
    {
        GetService(serviceName).Status = status;
    }
}

 

So, for my application, I decided to store the information in application state (the HttpApplicationState object exposed as HttpContext.Current.Application). Notice that, because we are implementing an interface, I could choose to create a different implementation and store the information in a local database or on the file system. We will discuss this subject further ahead.
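To illustrate how swappable the storage is, here is a sketch of one possible alternative that keeps everything in a private dictionary guarded by a lock. InMemoryResilienceHelper is a hypothetical name and is not used anywhere else in this series. The enum, the ResilientService class and the interface are repeated from above only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public enum ServiceStatus { ONLINE, OFFLINE };

public class ResilientService
{
    public string ServiceName { get; set; }
    public ServiceStatus Status { get; set; }
    public int RetryCount { get; set; }
    public int RetryLimit { get; set; }
}

public interface IResilienceHelper
{
    ResilientService GetService(string serviceName);
    ServiceStatus GetStateOfService(string serviceName);
    void IncrementServiceRetryCount(string serviceName);
    void InitializeServices(List<ResilientService> serviceList);
    void ResetServiceRetryCount(string serviceName);
    void UpdateServiceStatus(string serviceName, ServiceStatus status);
}

//Hypothetical alternative implementation: same rules, different storage
public class InMemoryResilienceHelper : IResilienceHelper
{
    private readonly Dictionary<string, ResilientService> services = new Dictionary<string, ResilientService>();
    private readonly object sync = new object();

    public void InitializeServices(List<ResilientService> serviceList)
    {
        lock (sync)
        {
            foreach (ResilientService service in serviceList)
            {
                services[service.ServiceName] = service;
            }
        }
    }

    //Returns null when the service was never registered
    public ResilientService GetService(string serviceName)
    {
        lock (sync)
        {
            ResilientService service;
            return services.TryGetValue(serviceName, out service) ? service : null;
        }
    }

    public ServiceStatus GetStateOfService(string serviceName)
    {
        lock (sync) { return GetService(serviceName).Status; }
    }

    //Same rule as the application state version, but the whole
    //read-increment-write sequence runs under the lock
    public void IncrementServiceRetryCount(string serviceName)
    {
        lock (sync)
        {
            ResilientService service = GetService(serviceName);
            int currentRetryCount = service.RetryCount + 1;

            if (currentRetryCount >= service.RetryLimit)
            {
                service.Status = ServiceStatus.OFFLINE;
            }
            else
            {
                service.RetryCount = currentRetryCount;
            }
        }
    }

    public void ResetServiceRetryCount(string serviceName)
    {
        lock (sync) { GetService(serviceName).RetryCount = 0; }
    }

    public void UpdateServiceStatus(string serviceName, ServiceStatus status)
    {
        lock (sync) { GetService(serviceName).Status = status; }
    }
}
```

Because the calling code only depends on IResilienceHelper, switching to a class like this one would be a one-line change where the helper is created.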

Now that we have a system in place, we need to actually use it. I created a new static class called ServicesInitializer:

using System.Collections.Generic;

public static class ServicesInitializer
{
    //Initializes the resilience helper for this website as an HttpApplicationResilienceHelper
    public static IResilienceHelper ResilienceHelper = new HttpApplicationResilienceHelper();

    public static void Initialize()
    {
        //Simple initialization...
        //could read the list of services and other configuration from the web.config
        ResilientService service = new ResilientService();
        service.ServiceName = "myService";
        service.RetryCount = 0;
        service.RetryLimit = 3;
        service.Status = ServiceStatus.ONLINE;

        List<ResilientService> serviceList = new List<ResilientService>();
        serviceList.Add(service);

        ResilienceHelper.InitializeServices(serviceList);
    }
}


For simplicity’s sake, I’m just hardcoding the information about my service and using the InitializeServices method defined in my interface to store that data. I also expose the ResilienceHelper object to be used in my application code to encapsulate the manipulation of that information.
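If we did want to move the hardcoded values into configuration, one option would be to keep a single string setting and parse it at startup. The sketch below assumes a made-up format of "name=limit" pairs separated by semicolons. Both the format and the ServiceListParser class are hypothetical, and the string could come from anywhere, for instance an appSettings entry read through ConfigurationManager.AppSettings. The enum and the ResilientService class are repeated so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public enum ServiceStatus { ONLINE, OFFLINE };

public class ResilientService
{
    public string ServiceName { get; set; }
    public ServiceStatus Status { get; set; }
    public int RetryCount { get; set; }
    public int RetryLimit { get; set; }
}

public static class ServiceListParser
{
    //Parses a hypothetical setting such as "myService=3;otherService=5"
    //into a list of services that all start online with a retry count of 0
    public static List<ResilientService> Parse(string setting)
    {
        List<ResilientService> serviceList = new List<ResilientService>();

        string[] entries = setting.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string entry in entries)
        {
            string[] parts = entry.Split('=');
            if (parts.Length != 2)
            {
                throw new FormatException("Expected name=limit but found: " + entry);
            }

            ResilientService service = new ResilientService();
            service.ServiceName = parts[0].Trim();
            service.RetryLimit = int.Parse(parts[1].Trim());
            service.RetryCount = 0;
            service.Status = ServiceStatus.ONLINE;
            serviceList.Add(service);
        }

        return serviceList;
    }
}
```

The resulting list can be handed straight to InitializeServices, so the rest of the code would not need to change.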

As you can probably guess, I’m going to call the static Initialize method from the global.asax file:

void Application_Start(object sender, EventArgs e)
{
    ServicesInitializer.Initialize();
}


So, when the application starts we will use our HttpApplicationResilienceHelper to store information about our service in application state. All we need to do now is implement the CallWithResilience event handler, using this little framework we created:

protected void CallWithResilience(object sender, EventArgs e)
{
    string result = "Result: ";

    if (ServicesInitializer.ResilienceHelper.GetStateOfService("myService") == ServiceStatus.OFFLINE)
    {
        result += "Service unavailable at this time, please try again later";
    }
    else
    {
        myService.TheOcasionallySlowServiceClient proxy = new myService.TheOcasionallySlowServiceClient();
        //Service is online, let's try to contact it
        try
        {
            result += proxy.DoWork();
            proxy.Close();
            //if no exception, we can reset the retrycounter for this service
            ServicesInitializer.ResilienceHelper.ResetServiceRetryCount("myService");
        }
        catch (Exception ex)
        {
            //Close() throws on a faulted channel (e.g. after a timeout), so abort instead
            proxy.Abort();
            //log the result, return error and increment retrycount
            result += ex.Message;
            ServicesInitializer.ResilienceHelper.IncrementServiceRetryCount("myService");
        }
    }
    Result.Text = result;
}


So, what changed? Before we try to contact the service we check if its status is offline. In that case, we will not attempt to call it. If, however, the service is marked as being online, we make the call the usual way. If the service returns correctly we reset the retry counter for that service to 0. If it fails with an exception, we increment the counter. The idea is that once the counter reaches the configured limit, the service is marked as offline and we avoid further futile tries.

If we now test our application and click the button 3 times, we receive the timeout exception every time. But if we click it a 4th time, we are presented with the friendly message “Service unavailable at this time, please try again later”, which means we did not attempt to contact the service at all.
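The same check-call-count flow can be reproduced outside of ASP.NET with a few lines of code, which makes it easy to verify the 3 failures followed by a short-circuit. The following is only a sketch: ServiceTracker and ResilientCaller are hypothetical names that stand in for the helper and the event handler, a delegate replaces the WCF proxy, and nothing here is thread safe.

```csharp
using System;

public enum ServiceStatus { ONLINE, OFFLINE };

//Minimal stand-in for the resilience helper, tracking a single service
public class ServiceTracker
{
    public ServiceStatus Status = ServiceStatus.ONLINE;
    public int RetryCount = 0;
    public int RetryLimit = 3;
}

public static class ResilientCaller
{
    public const string UnavailableMessage = "Service unavailable at this time, please try again later";

    //Calls the service only while it is marked online and keeps the failure count up to date
    public static string Call(ServiceTracker tracker, Func<string> serviceCall)
    {
        if (tracker.Status == ServiceStatus.OFFLINE)
        {
            //Short-circuit: the service is never contacted
            return UnavailableMessage;
        }

        try
        {
            string response = serviceCall();
            //A successful call resets the count of consecutive failures
            tracker.RetryCount = 0;
            return response;
        }
        catch (Exception ex)
        {
            tracker.RetryCount++;
            if (tracker.RetryCount >= tracker.RetryLimit)
            {
                tracker.Status = ServiceStatus.OFFLINE;
            }
            return ex.Message;
        }
    }
}
```

Passing a delegate that always throws and calling ResilientCaller.Call four times with the same tracker invokes the delegate only three times. The fourth call returns the unavailable message without touching it.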

What is important is that we are now safeguarding the performance of the web application by not keeping threads waiting for a service that we already know is having problems. All requests that demand a call to this service will be answered immediately, so the application will not display hang symptoms and we avoid the dreaded “Server Too Busy” error that signifies there are no threads available to serve our request.

But, this is only half of the problem. The other half is making sure our application can recover nicely. We don’t want to prevent calls to the service for all eternity so we need a mechanism to make sure we can detect the service is back online. We will see possible ways of achieving this in the next post in the series.

For now, we can discuss if keeping the information in the HttpApplication object is a good thing. Other alternatives would be storing it out-of-process somewhere like in a local database or the file system.

Using the HttpApplication as we did here has both advantages and disadvantages:

·        It’s fast and easy to get access to it

·        It self-recovers with every application restart (might be good or bad)

·        It’s shared memory, so we might run into concurrency problems (dirty reads)

·        It consumes memory inside our process (albeit not much)

·        It requires an in-process mechanism for recovery
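On the concurrency point: incrementing the retry count is a read, an add and a write, so two requests failing at the same time can both read the same value and one of the increments is lost. One way to avoid that for a plain counter is the Interlocked class from System.Threading. The sketch below is only an illustration of the idea, with AtomicRetryCounter being a hypothetical class that is not part of the code in this post.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

//Hypothetical counter whose increments cannot be lost under concurrent failures
public class AtomicRetryCounter
{
    private int retryCount;
    private readonly int retryLimit;

    public AtomicRetryCounter(int retryLimit)
    {
        this.retryLimit = retryLimit;
    }

    //Atomically adds one failure and reports whether the limit has been reached
    public bool IncrementAndCheck()
    {
        return Interlocked.Increment(ref retryCount) >= retryLimit;
    }

    //Atomically sets the counter back to zero after a successful call
    public void Reset()
    {
        Interlocked.Exchange(ref retryCount, 0);
    }

    public int Current
    {
        get { return Volatile.Read(ref retryCount); }
    }
}

public static class AtomicRetryCounterDemo
{
    //Simulates many requests failing at the same time and returns the final count
    public static int CountConcurrentFailures(int failures)
    {
        AtomicRetryCounter counter = new AtomicRetryCounter(3);
        Parallel.For(0, failures, i => { counter.IncrementAndCheck(); });
        return counter.Current;
    }
}
```

With this approach the final count always matches the number of failures reported, no matter how the threads interleave. Status changes that depend on several fields at once would still need a lock.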

I wouldn’t recommend using the file system because it’s slow and you will have more trouble with concurrency (locked files will throw an exception when you try and access them). Using a local SQL Server, here’s what you can expect:

·        A couple of extra database roundtrips for each service call

·        You’re forced to have a SQL Server Express instance running on your server

·        Concurrency is handled nicely by the database engine

·        Keeps state between application restarts (might be good or bad)

·        Can be updated outside of the application (easier status recovery)

You can opt to have a centralized SQL Server to store this information. However, if the problems are only occurring between a specific web server in your farm and the backend tier (network-related issues), you risk doing more harm than good by preventing all servers from accessing the service. Using a local SQL Server therefore allows more granular control.

The term recovery is used here to mean changing the status of the service back to online once we can establish that it is working correctly. That will be the topic of the next post, where we discuss different approaches to this.

Until then, have fun!