
… it means that .Net Services workflow will not ship as part of Version 1 of the Azure Services Platform.

For more details, and the reasons why, see:

http://blogs.msdn.com/netservicesannounce/archive/2009/06/12/upcoming-important-changes-to-net-workflow-service.aspx

Note that this change only affects people hosting their workflows in the cloud on the Azure Services Platform; it doesn’t affect you if you are hosting the workflow engine in your own application.

Neil.

Ok, so I didn’t get to go to MIX, but we watched the keynote in the MTC boardroom this afternoon – it was almost like being there – no jet lag, no hangover, nice comfy leather seats, coffee on tap – ok, no point me trying to hide it, I'm jealous of those that are in Vegas ;-)

If you haven't seen the keynote, it is well worth a watch – a 2 hour session which is packed full of announcements and new product details from Blend to Windows Azure (sorry, that was as close to A-Z as I could get). I was already aware of most of the announcements, but it is great to see how all the pieces fit together. Sessions like this really allow you to take stock of where the technology is today and where it is moving to. In some respects it is kind of scary; I already feel left behind in the design space and we seem to be accelerating our level of innovation there. And we keep making it easier and easier to do complex things – hard won skills become commonplace with each new release.

I’m sure that most Microsoft blogs will contain summaries of the session, so I’ll keep this brief.

Windows Azure: PHP, FastCGI (ok, I’m unashamedly a .Net guy, but I like the idea that more devs will be able to enjoy Azure), Full Trust for .Net and Geo Location. For more details:

  http://blogs.msdn.com/windowsazure/archive/2009/03/18/windows-azure-delivers-new-ctp-capabilities.aspx

Get details and the download link to the new SDK here:

http://blogs.msdn.com/jnak/archive/2009/03/18/now-available-march-ctp-of-the-windows-azure-tools-and-sdk.aspx

Blend 3: I was beyond Super Excited and almost reached a state of hyper-jazzed by Jon Harris’ SketchFlow demo. A major part of what we do in the MTC is around user experience and this is really going to change the way we work.

http://channel9.msdn.com/shows/Continuum/First-Look-at-Expression-Blend-3/

Silverlight 3: Silverlight just gets cooler and cooler. The new streaming support and the free availability of IIS Media Services (streaming support for IIS) are really going to lower the bar to entry to online video and raise the bar in terms of user experience. Out of browser support for Windows and Mac will add a new dimension to some POCs – guess we are going to need a Mac to demo on.

Expression Web 3: SuperPreview looks to be a very well thought out and needed feature of Expression Web.

Oh and as for DeepZoom and Playboy – well I guess it was only a matter of time.

For more details there is a write up of the keynote at:

http://visitmix.com/Opinions/MIX09-Live-Blog-2-Advancing-User-Experiences-Scott-Guthrie

But the actual video is well worth a watch.

Neil.

I’ve just found out about an attempt to start a UK based Windows Azure community. The initial meeting is on 31st March in London. Details are …

We would like to invite you to a session titled “Windows Azure Update, What it is & What it isn’t” including a Q&A session with 3 x members of the Azure team from Microsoft Corp along with a session that introduces “UK AzureNET” which is the name of what we hope evolves into UK Azure User Group / community!

Please see the agenda below, along with the speakers’ bios and registration details:

Agenda

6.15pm - 6.30pm              Arrive and Registration
6.30pm - 7.30pm              Azure Update “What it is & What it isn’t” + Q&A
7.30pm - 8.00pm              UKAzurenet “Introduction”
8.00pm - 8.30pm              Food / drinks / Network

Matt Rogers (Senior Field Marketing Manager for Windows Azure)

In this role he is responsible for global go-to-market strategy, understanding customer needs for the new service, and preparing Microsoft's global sales force and partner community for the upcoming launch.

James Conard (Senior Director, Azure Services, .NET, and Visual Studio Evangelism)

James leads a global team of senior Microsoft evangelists who help customers realize the full potential of their IT investments using Microsoft’s Azure Services Platform, .NET Framework and Visual Studio.  James and his team have worked with customers and Microsoft partners around the world in understanding how these Microsoft offerings reduce IT costs and project timelines while delivering innovation solutions to market.

Michael Maggs, Senior Director, Windows Azure Partner Marketing

Michael is responsible for the global partner ecosystem of Windows Azure.  He and his team work with leading system integrators, software development firms and IT consultancies to enable enterprises to adopt Microsoft’s cloud computing platform.

How do I register?

If you would like to attend this event please go to http://azureupdate.eventbrite.com/, sign up and if you have 2mins introduce yourself and say why you would like to attend.

If you have questions about the platform and future plans, who better to ask than the Product Group? Looks like it should be a good session; I’ll certainly be trying to get there.

Neil.

The SQL Data Services team are making some big changes – ACE (Authority, Container, Entity) is no longer going to be supported and they are moving to expose TDS – TDS is the protocol that you are using to talk to your current SQL Servers!

I know that The Product Group have had to make some hard decisions around this and if you have already committed a lot of time and effort to the ACE model, I hope that this change doesn’t hurt you too much. I also hope that the “new” functionality will prove this a worthwhile change.

Personally I am very excited about this, as much more of the SQL that we know and love is being exposed – Stored Procs, Views, Indexes, ADO.Net Compatibility … for more information:

  http://blogs.msdn.com/ssds/archive/2009/03/10/9469228.aspx

Neil

Simon has just started a long overdue blog. He is an Architect Evangelist – yes I know we’ve heard all the Evangelist jokes and if I could change the job title I would – in DPE in the UK and specialises in our latest and greatest emerging technologies. As he can walk on water and heal the sick, this will be well worth keeping an eye on.

Neil

For a while now I have been using a class that wraps and adds extra functionality to the queue in the StorageLib sample in the Windows Azure SDK. There are a few benefits that this wrapper provides, so I thought it might be time to share:

  • Strongly typed access to the queue.
  • Warning when you forget to remove a message from the queue.
  • Automatic serialisation/deserialisation of the message content.
  • Hooks to provide poison message detection and handling

So let’s take a look at each of these; if I’m going too slow and you can’t wait to get to the code then there is a link at the bottom of this post.

Strongly Typed Access

If you are only storing one type of message on your queue, then using Generics we can get rid of all the casting code that you would normally have to write. If we define a .Net type to represent the message that we are going to place on the queue, then we can create an API that becomes cleaner, type safe and more self describing.

Creating our queue:

  TypedQueue<NewUserRequest> newUserQueue = new TypedQueue<NewUserRequest>("new-users");

Writing to the queue

  NewUserRequest nur = new NewUserRequest( ... );
  newUserQueue.PushMessage(nur);

and reading from the queue

  DequeuedMessage<NewUserRequest> deNur = newUserQueue.PopMessage ();
  ProcessNewUserRequest (deNur.MessageContents);
  newUserQueue.RemoveMessage (deNur);

Ok, there is quite a bit of important code missing from these snippets, but hopefully you get the idea. The use of generics when defining the TypedQueue type allows for us to have a very clean way of working with our queues.
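
As a rough sketch of the idea (this is not the attached code), the wrapper below shows how generics remove the casting. To keep it runnable on its own, the StorageLib queue is replaced by an in-memory Queue<string>; the names SketchTypedQueue, Push and Pop, and the properties on NewUserRequest, are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Illustrative only: an in-memory stand-in for a StorageLib-backed typed queue.
public class SketchTypedQueue<T> where T : class, new()
{
    // Stands in for the real Azure queue; each entry is the serialised message body.
    private readonly Queue<string> store = new Queue<string>();
    private readonly XmlSerializer serializer = new XmlSerializer(typeof(T));

    // Serialise the strongly typed message and place it on the queue.
    public void Push(T message)
    {
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, message);
            store.Enqueue(writer.ToString());
        }
    }

    // Returns null when the queue is empty, otherwise the deserialised message.
    public T Pop()
    {
        if (store.Count == 0)
        {
            return null;
        }
        using (StringReader reader = new StringReader(store.Dequeue()))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}

// Hypothetical message type; the real NewUserRequest is not shown in this post.
public class NewUserRequest
{
    public string UserName { get; set; }
    public string Email { get; set; }
}
```

Callers only ever see NewUserRequest objects; the string form stays inside the wrapper, which is where the real class talks to the queue instead.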

Warning when you forget to Remove a Message

It is all too easy to forget to call RemoveMessage when you have finished processing the message you just read from the queue. This is where the DequeuedMessage<> type comes in useful. It is used to wrap the Message that comes back from the queue, but in debug builds it has a Finalizer method that will generate an Assert if the message was never passed to RemoveMessage. So any missing calls to RemoveMessage should show up during development and testing. This code won’t make it into a release build so there is no perf hit when you get to production.
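
The debug-only finalizer trick can be sketched roughly as follows. This is an assumption about the shape of the technique, not the attached DequeuedMessage<> class: the names SketchDequeuedMessage, Removed and MarkRemoved are hypothetical. Bear in mind that finalizers run whenever the garbage collector gets round to them, and how a failed assert surfaces depends on the runtime and the configured trace listeners.

```csharp
using System;
using System.Diagnostics;

// Illustrative wrapper around a message that has been read from a queue.
public class SketchDequeuedMessage<T>
{
    public T MessageContents { get; private set; }

    // True once the queue wrapper has deleted the underlying message.
    public bool Removed { get; private set; }

    public SketchDequeuedMessage(T contents)
    {
        MessageContents = contents;
    }

    // Hypothetical hook: the queue wrapper would call this from RemoveMessage.
    public void MarkRemoved()
    {
        Removed = true;
    }

#if DEBUG
    // Compiled into debug builds only, so release builds pay no finalizer cost.
    // Asserts if the object is collected without ever having been removed.
    ~SketchDequeuedMessage()
    {
        Debug.Assert(Removed, "A dequeued message was never passed to RemoveMessage");
    }
#endif
}
```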

Automatic Serialisation/Deserialisation

The TypedQueue class uses the XmlSerializer to turn the .Net object into a form that can be written to the queue. This does mean that all the properties you want serialised need to be public and that your type has to have a default constructor. This isn’t too much of a restriction, and because the current CTP runs code in partial trust, we can’t use the binary formatter anyway. One thing to be aware of is that we only have 8 KB to serialise our message into, so if you are writing largish objects to the queue, you might want to use the XmlAttributeAttribute to help control the serialised form of your type.

I know people that are using WCF’s DataContractSerializer but I have to confess that I have a much better understanding of how the XmlSerializer works, hence I used the thing I know – if the only tool you have is a hammer, everything starts to look like a nail ;-) If you don’t like my old school serialisation it should be an easy change.
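
To show how XmlAttributeAttribute can help with the size limit, here is a small sketch that serialises the same data as child elements and as attributes and compares the results. The types VerboseRequest and CompactRequest and the helper SketchMessageSize are hypothetical, and the byte count is only an estimate based on UTF-8 encoding of the XML text; it does not account for whatever encoding the queue itself applies.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical message type serialised with child elements (the XmlSerializer default).
public class VerboseRequest
{
    public string UserName { get; set; }
    public string Email { get; set; }
}

// The same data, with each property written as an XML attribute instead.
public class CompactRequest
{
    [XmlAttribute] public string UserName { get; set; }
    [XmlAttribute] public string Email { get; set; }
}

public static class SketchMessageSize
{
    // The 8 KB limit mentioned above.
    public const int MaxMessageBytes = 8 * 1024;

    // Serialise any XmlSerializer-friendly object to its XML text.
    public static string ToXml(object message)
    {
        XmlSerializer serializer = new XmlSerializer(message.GetType());
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, message);
            return writer.ToString();
        }
    }

    // Estimated size of the XML text if it were encoded as UTF-8.
    public static int SizeInBytes(object message)
    {
        return Encoding.UTF8.GetByteCount(ToXml(message));
    }

    public static bool FitsOnQueue(object message)
    {
        return SizeInBytes(message) <= MaxMessageBytes;
    }
}
```

An attribute costs the property name once, whereas an element repeats it in the opening and closing tags, so the saving grows with the number of properties.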

Hooks for poison message detection

The TypedQueue class allows you to specify a couple of delegates that will be used to detect and process poison messages. I have provided a couple of implementations of these, but nothing more sophisticated than I blogged about in a previous post.

If you define a PoisonMessgeCheck function, the TypedQueue class will use this to determine if a message is potentially poisonous before it tries to deserialise it. If this check says that the message is bad, the PoisonMessageHandler delegate is called. I’ve only written very trivial implementations (look in PoisonMessageHelpers.cs) but it wouldn’t be too hard to write one that wrote the message contents to a poison message table. After this call, the TypedQueue will remove the message from the queue, effectively deleting it.

To set up the TypedQueue to look for messages that are regarded as poisonous if they have been on the queue for 10 times the visibility timeout, your code would look like:

TimeSpan poisonTime = TimeSpan.FromSeconds (10 * newUserQueue.MessageVisabilityTimeOut);
newUserQueue.PoisonMessgeCheck = 
      PoisonMessageHelpers.PoisonIfOnQueueForLongerThan (poisonTime); 
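
For a feel of what such a check could look like inside, here is a minimal sketch. The delegate shape is an assumption: the real PoisonMessgeCheck signature lives in the attached code, whereas this version simply takes the time the message was put on the queue. SketchPoisonHelpers and the injectable clock parameter are hypothetical names, the clock being there only to make the check easy to test.

```csharp
using System;

public static class SketchPoisonHelpers
{
    // Builds a check that flags a message as poisonous once it has been
    // on the queue for longer than maxAge, measured against the UTC clock.
    public static Func<DateTime, bool> PoisonIfOnQueueForLongerThan(TimeSpan maxAge)
    {
        return PoisonIfOnQueueForLongerThan(maxAge, () => DateTime.UtcNow);
    }

    // Same check, but with the clock passed in so it can be unit tested.
    public static Func<DateTime, bool> PoisonIfOnQueueForLongerThan(
        TimeSpan maxAge, Func<DateTime> clock)
    {
        return insertionTimeUtc => clock() - insertionTimeUtc > maxAge;
    }
}
```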

This check is performed from the PopMessage function. If a message is poisonous, the API will behave exactly the same as if there had been no message on the queue – i.e. return a null. This means we don’t need to include any poison message code in our main queue processing loop, allowing it to look like:

  while (true) {
      DequeuedMessage<NewUserRequest> deNur = newUserQueue.PopMessage ();
      if (deNur != null) {
          ProcessNewUserRequest (deNur.MessageContents);
          newUserQueue.RemoveMessage (deNur);
      } else {
          Thread.Sleep (NO_MESSAGE_SLEEP_TIME);
      }
  }

If you haven’t seen my earlier posts and the Thread.Sleep looks a little strange, best have a quick read of this.

The same pattern can be applied if you want to pull messages off the queue in batches; just pass the max number of messages you want to PopMessages. Any messages that are poisonous will be removed from the queue before any valid messages are returned in a List.

Disclaimer

The code I’ve attached has been ripped out of a couple of Proof of Concept projects; as a result, you shouldn’t regard this as well tested. The concepts have been proved to work, but this implementation hasn’t.

Oh, and the TypedQueue should be as thread safe as the StorageLib.Queue it is wrapping – no, that wasn’t a statement that it is thread safe, so it is probably best to create a separate instance per thread ;-)

The code is here and bonus points if you notice where the file is stored ;-)

Hope you find this as useful as I have.

Neil.

Ok, last update on this subject, but we can now confirm that we will be in Cambridge on 10th Feb delivering the Azure Briefing. For more details, see:

http://blogs.msdn.com/ukisvdev/archive/2009/01/14/new-dates-for-azure-technical-briefing-announced.aspx

Neil.

I am pleased to say that we have managed to arrange a couple more dates/locations for the Azure Technical Briefing. They are 11th Feb in Edinburgh and 13th Feb in Bradford.

For more details, please see:

http://blogs.msdn.com/ukisvdev/archive/2009/01/14/new-dates-for-azure-technical-briefing-announced.aspx

We are hoping that we can also get to Cambridge that week, but haven’t yet got confirmation on the availability of the facilities, so watch this space. Right I’m off to try and persuade the powers-that-be that I need one of these to help me travel round the UK:

http://news.bbc.co.uk/2/hi/africa/7821979.stm

Hope to see you there,

Neil.

We are running a couple of briefings on the Microsoft cloud technology stack for ISVs in the next few weeks. I’ll be doing a bit on Windows Azure but the Live Platform and SQL Services will also be on the agenda. If you are interested in finding out about all this new cloud magic shenanigans technology then come along; listen to the presentations and join in the discussion afterwards.

The first one will be in Reading on 20th Jan and the second one in London on 22nd Jan in our offices in Victoria.

More information and details of how to register are at:

  http://blogs.msdn.com/ukisvdev/archive/2008/12/18/azure-technical-briefings-for-jan-2009-just-announced.aspx

Now we are very aware that these sessions are very much “in the south” and as none of the presenters are from “the south”, we are considering redelivering this at other locations in the UK, so if you aren’t able to make it to Reading or London then please do let me know.

Hope to see you there.

Neil

On a recent POC we found a bit of a problem with the time it was taking to deploy our application to Azure’s staging environment then deploy it into production. Now I must point out that we had a token that allowed us to have quite a lot of machine instances and those machines were being built and deployed far faster than I could build them – well actually faster than I could find my Windows 2008 DVD! However this time lag didn’t fit very well with the development cycle of our Silverlight UI. The application was basically a big XAP that talked to WCF services hosted in an Azure web role. As we approached the end of the POC, we were making infrequent changes to our service layer, but the UI code was being updated very frequently – mostly small cosmetic fixes.

When deploying an Azure web role, you need to package up the contents of your site - including all the static content, i.e. any Silverlight XAPs. This means that for every UI change that we wanted to preview/test against our Azure services we needed to deploy the whole application. To solve this problem we changed our app so that our XAPs were stored in the Azure blob store; allowing us to upload them whenever we wanted with almost zero turn round time – in fact our last build of the UI was less than 60 seconds before the demo!

To do this required the following changes:

  1. An ASP.Net page to upload an XAP into the blob store – not strictly necessary as you could use a tool like SpaceBlock to do this for you.
  2. An ASP.Net handler to serve the requests for XAPs
  3. Changing the embedded Silverlight tags to point at the handler rather than the static file (normally in ClientBin)

The upload page

Ok this won’t win any coding awards, nor should you put this into production, but it worked for our POC and is enough to get you started:

    protected void Button1_Click(object sender, EventArgs e)
    {
        StoreXap(FileUpload1.FileName, this.FileUpload1.FileBytes);
    }

    protected void StoreXap(String name, byte[] data)
    {
        BlobContainer container = SLDownloader.GetBlobContainerForSLXaps();

        BlobProperties blobProps = new BlobProperties(name);
        blobProps.ContentType = "application/x-silverlight-2";

        BlobContents blobConts = new BlobContents(data);
        container.CreateBlob(blobProps, blobConts, true);
    }

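    // Lives in the SLDownloader handler class (shown below). It must not be private,
    // because the upload page above calls it as SLDownloader.GetBlobContainerForSLXaps().
    public static BlobContainer GetBlobContainerForSLXaps()
    {
        StorageAccountInfo sai = StorageAccountInfo.GetDefaultBlobStorageAccountFromConfiguration();
        BlobStorage blobStorage = BlobStorage.Create(sai);

        // Blob container names must be lower case.
        BlobContainer container = blobStorage.GetBlobContainer("xap");
        container.CreateContainer();

        return container;
    }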

Where FileUpload1 is just an ASP.Net upload control:

    <form id="form1" runat="server">
    <div>
        <asp:FileUpload ID="FileUpload1" runat="server" />
    </div>
    <asp:Button ID="Button1" runat="server" onclick="Button1_Click" Text="Button" />
    </form>

The ASP.Net Handler

The ASP.Net handler looks like:

[WebService(Namespace = "http://tempuri.org/")]
[WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
public class SLDownloader : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        WriteImageToHttpResponse(context.Request.QueryString["xap"], context.Response);
    }

    public static void WriteImageToHttpResponse(String xapName, HttpResponse response)
    {
        byte[] xapData = null;
        String contentType;
        BlobContainer container = GetBlobContainerForSLXaps();

        MemoryStream ms = new MemoryStream();
        BlobContents blobConts = new BlobContents(ms);
        BlobProperties blobProps = null;
        try
        {
            blobProps = container.GetBlob(xapName, blobConts, false);
            // ToArray copies only the bytes written; GetBuffer would also return unused capacity.
            xapData = ms.ToArray();
            response.ContentType = blobProps.ContentType;
            response.OutputStream.Write(xapData, 0, xapData.Length);
        }
        catch (StorageClientException sce) 
        {
            // Ok, the XAP we wanted isn't there, return a 404?
            // ...
        }
    }

    public bool IsReusable
    {
        get { return false; }
    }
}

Changing the tag

And finally fix up the Silverlight control so that it gets the XAP from the handler rather than from the more conventional static file in ClientBin:

    protected void Page_Load(object sender, EventArgs e)
    {
        Xaml1.Source = "~/SLDownloader.ashx?xap=MySilverlightApp.xap";
    }

We could have just made the blob container public and pointed the Silverlight control straight at the XAP blob, but in this case there were a few reasons for using the handler:

  1. I knew that using the handler would work – I needed to get it working quickly
  2. Wouldn’t cause any issues with Silverlight’s web service policy 
  3. We were using LiveId for authentication and I didn’t want to bypass any checks we were doing and download the XAP to just anyone and
  4. I modified the handler to return the file from the ClientBin directory if the blob didn’t exist – i.e. the blob version overrode the one that was shipped – it was a demo app and I was worried that we might forget to upload the XAP to storage ;-)
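
The fallback described in point 4 could be sketched along these lines. This is not the handler from the POC: to keep the snippet self-contained, the blob lookup is passed in as a delegate that returns null when the blob does not exist, and SketchXapResolver, Resolve and their parameters are hypothetical names.

```csharp
using System;
using System.IO;

public static class SketchXapResolver
{
    // Returns the XAP bytes from blob storage if present, otherwise from the
    // ClientBin directory shipped with the site, otherwise null (a 404 case).
    // tryLoadFromBlob is assumed to return null when the blob does not exist.
    public static byte[] Resolve(
        string xapName,
        Func<string, byte[]> tryLoadFromBlob,
        string clientBinDirectory)
    {
        byte[] data = tryLoadFromBlob(xapName);
        if (data != null)
        {
            // The uploaded version overrides the one in the deployment package.
            return data;
        }

        // Drop any directory part of the requested name before looking on disk.
        string localPath = Path.Combine(clientBinDirectory, Path.GetFileName(xapName));
        return File.Exists(localPath) ? File.ReadAllBytes(localPath) : null;
    }
}
```

In the real handler the delegate would wrap the GetBlob call and its StorageClientException handling, and clientBinDirectory would come from mapping ~/ClientBin to a physical path.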

Easy and quite a powerful technique to decouple your XAPs – well any static content - from your Azure deployment package.

Neil.

One thing that you need to consider when designing your Azure application is "what is the minimum deployment"? If your application has 5 different queues with a worker role to read each queue, we are going to want at least 2 instances per role (for some redundancy) giving you a minimum deployment of 5 packages and 10 instances. Hmmm, if you have very little traffic that seems a little extravagant; so let’s take a look at how we can scale down our requirements and potentially make deployment easier. If you aren’t used to multi-threaded programming, the first thing that probably springs to mind is to round robin all of our queues in one big loop:

    public override void Start()
    {
        while (true)
        {
            ProcessQueue1();
            ProcessQueue2();
            // ....
            ProcessQueueN();
        }
    }

Assume that queues 1 to N have different messages and require different processing – otherwise we would put them all in the same queue – right?

While this would work, we can do better. There are a few things we should look to improve on:

  1. Idle CPU time: A lot of the time we are going to do something along the lines of: read from queue, write to table/blob/queue/SDS then delete the message from the queue. Now we might end up doing some processing of our message, but there is a significant amount of network traffic going on. Every time we read/write to the storage system we are making an off machine call. This means that our application will block until the network call is completed - i.e. our node is doing nothing.
  2. Fault tolerance: If there is an error with one of the processors - say a poison message on Queue 2, then ProcessQueueN is never going to get checked.
  3. Simplicity: We might have a high priority queue that we want to check very frequently and a low priority queue that we don't check that often. Any logic we add to achieve this is going to get complex and won't be very easy to change through configuration.

By multi-threading our worker role, we should be able to solve all these issues. One thing to keep in mind is that there is only 1 CPU in the virtual machine that the CTP of Azure builds for you; so multithreading isn't going to help if we are doing lots of CPU bound processing. Oh, and one proc means that the parallel extensions won't help us here as they will only ever give us one thread - thanks to Daniel for confirming that over a beer at TechEd.

There are 2 types of people who don’t write multi-threaded code; those that don’t understand it and have never tried and those who really understand it and know how difficult it is to get right. Having said that, here are a few steps that will multi-thread our application without us having to step too far into the multi-threading minefield.

If you have followed Part 3, you will know that we currently have a nice class with an instance method that performs all our processing; this positions us very nicely to multi-thread our app. The biggest problem people get into when writing concurrent applications is protecting (or not protecting) access to shared resources (variables, files, queues …). If multiple threads access the same thing at the same time, there could be problems – a bit like two kids grabbing for the ketchup at the same time – there will be tears and ketchup everywhere. Moving our MessageQueue instance into our processor class and making it private should ensure that only that processor instance can access it. If we create one processor per thread and ensure that all the resources that the processor needs are contained in that processor object, we no longer have any shared resources – every kid gets their own bottle of ketchup. This neatly lets us step over the first mine.

For every different queue/message type that we are going to be monitoring, we need to create a new IProcessMessages implementation. If we just want several threads monitoring the same queue, then we would just create several instances of our single processor type and execute each one on a separate thread.

Because we are going to have several processors executing, we are going to need to keep track of them, so let’s create a type to do that:

class ProcessorRecord {

    /// <summary>
    /// The last time this thread completed some work
    /// </summary>
    public DateTime LastThreadTest { get; set; }

    /// <summary>
    /// The type that represents this processor
    /// </summary>
    private Type processorType;

    /// <summary>
    /// The instance of the processor type that we are wrapping
    /// </summary>
    public IProcessMessages Processor { get; private set; }

    public ProcessorRecord(Type type)
    {
        processorType = type;
        ResetProcessor();
        LastThreadTest = DateTime.Now;
    }

    /// <summary>
    /// Recreate the processor instance - used in the event of an error.
    /// The time stamp is deliberately left alone here, so a processor that
    /// keeps failing will eventually be reported as unhealthy.
    /// </summary>
    public void ResetProcessor()
    {
        Processor = (IProcessMessages)Activator.CreateInstance(processorType);
    }
}

That is all reasonably straightforward. We use the type class rather than an instance of a processor because we will want to re-create the instance if an error occurs – see part 3 for more details.

Now we come to modifying our main loop. One change that is going to happen here is that we are going to use the main thread to periodically check the status of all our processors. This is better done here rather than in the GetHealthStatus function simply because we have more control over when it gets called, e.g. if GetHealthStatus gets called every 5 seconds and we don’t expect to hear from our threads for 20 seconds then we are executing the health check logic far too often – this wasn't a problem with our single threaded implementation, but now we have several time stamps to check. So the code for GetHealthStatus gets even simpler; it just reports on the contents of a class variable:

private RoleStatus status;

public override RoleStatus GetHealthStatus()
{
    return status;
}

Our Start function is modified so that it starts a thread per processor and then spends the rest of the app’s lifetime checking their last reporting times:

public override void Start()
{
    status = RoleStatus.Healthy;

    ProcessorRecord [] processors = new ProcessorRecord[] { 
        new ProcessorRecord (typeof(ProcessQueue1)),
        new ProcessorRecord (typeof(ProcessQueue2)),
    };


    foreach (ProcessorRecord pr in processors) {
        Thread th = new Thread (RunProcessor);
        th.Start (pr);
    }

    // Now just monitor the health of the threads that we have created.
    //
    while (true)
    {
        Thread.Sleep(HEALTH_CHECK_SLEEP);
        RoleStatus localStatus = RoleStatus.Healthy;

        lock (dateTimeLock)
        {
            foreach (ProcessorRecord pr in processors)
            {
                if (pr.LastThreadTest.AddSeconds(DEFAULT_REPORT_TIMEOUT) < DateTime.Now)
                {
                    localStatus = RoleStatus.Unhealthy;
                    // No need to continue looking.
                    break;
                }
            }
        }
        status = localStatus;
    }
}

And finally the code that was in our Start method at the end of part 3, makes it into the RunProcessor method; this is the start point for our new thread.

public void RunProcessor (Object state) {
    ProcessorRecord pr = (ProcessorRecord)state;
    int sleepTimeIndex = 0;

    while (true)
    {
        try
        {
            if (pr.Processor.Process() == false)
            {
                Thread.Sleep(DEFAULT_SLEEP_TIME);
            }

            // Update the date time only if no error occurred
            lock (dateTimeLock)
            {
                pr.LastThreadTest = DateTime.Now;
            }

            // reset the sleep time
            sleepTimeIndex = 0;
        }
        catch (Exception ex)
        {
            RoleManager.WriteToLog("Error",
                "Error occurred, trying again: "
                + ex.ToString());
            pr.ResetProcessor();

            // Sleep using our back off, then move one step along the table
            // (sleepTimeIndex++ inside Math.Min would never advance the index)
            Thread.Sleep(SLEEP_TIMES[sleepTimeIndex]);
            sleepTimeIndex = Math.Min(sleepTimeIndex + 1, SLEEP_TIMES.Length - 1);
        }
    }
}

That leaves us with the same reliability that we built into part 3, but enables us to run multiple processors on multiple threads. It isn’t too much of a stretch of the imagination to consider storing the information that the ProcessorRecord class needs in table storage. This would be the name of the processor and potentially sleep time and how long the thread is allowed between report times before it goes unhealthy. This would allow us to configure at runtime the number and types of processors running on our worker roles – very useful if we start to see much more traffic of a particular message type than another – add some more threads rather than some more machine instances. More interestingly, it also means that we only have to build and deploy one package to Azure. This model might not suit everyone, but it is a great starting point – especially if you have a CTP token that only allows you to have one type of worker role!
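
A rough sketch of that configuration-driven idea follows. Reading the settings from table storage is left out; the sketch only shows turning a list of settings into processor instances. ProcessorSetting, SketchProcessorFactory and NoOpProcessor are hypothetical names, and IProcessMessages is repeated from Part 3 only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Repeated from Part 3 so the sketch is self-contained.
public interface IProcessMessages
{
    bool Process();
}

// Hypothetical settings row: which processor type to run and on how many threads.
public class ProcessorSetting
{
    public string TypeName { get; set; }
    public int ThreadCount { get; set; }
}

// Trivial processor used purely for illustration; it never finds a message.
public class NoOpProcessor : IProcessMessages
{
    public bool Process() { return false; }
}

public static class SketchProcessorFactory
{
    // Creates one processor instance per requested thread, so that no
    // instance (and therefore no queue object) is shared between threads.
    public static List<IProcessMessages> Create(IEnumerable<ProcessorSetting> settings)
    {
        List<IProcessMessages> processors = new List<IProcessMessages>();
        foreach (ProcessorSetting setting in settings)
        {
            // Throws if the type name cannot be resolved.
            Type type = Type.GetType(setting.TypeName, true);
            if (!typeof(IProcessMessages).IsAssignableFrom(type))
            {
                throw new ArgumentException(
                    setting.TypeName + " does not implement IProcessMessages");
            }
            for (int i = 0; i < setting.ThreadCount; i++)
            {
                processors.Add((IProcessMessages)Activator.CreateInstance(type));
            }
        }
        return processors;
    }
}
```

Each instance returned would then be handed to its own thread in much the same way as the ProcessorRecord array in Start.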

Neil.

After the last 2 blog entries, we have our worker process's main loop feed back into Azure's health checking system. If we go unhealthy, Azure will notice and will eventually restart the worker role. However, this is a bit heavy handed; what if the failure was a temporary network error, a poison message or another bug - do we really want our node to die so easily? By refactoring our code slightly we can make our process far more robust and harder to kill than Chuck Norris!

Step 1, we need an interface - remember: every computer science problem can be solved by adding a new level of abstraction; but beware: a new level of abstraction will introduce a new problem.

Let’s call it IProcessMessages - I prefer interface names that are grammatically correct - IDoThings, IHateThings, IDontGetOutMuch:

    public interface IProcessMessages 
    {
        bool Process ();
    }

Where Process will return true if it found a message and false if the queue was empty; we need this to determine if the node should have a quick sleep before checking the queue again.

We can now create some processor types:

    public class QueueProcessor : IProcessMessages {

        MessageQueue queue;

        public QueueProcessor ()
        {
            QueueStorage qs = QueueStorage.Create (
                      StorageAccountInfo.GetDefaultQueueStorageAccountFromConfiguration());
            queue = qs.GetQueue ("myqueue");
        }

        public bool Process()
        {
            Message msg = queue.GetMessage ();
            if (msg != null) {
                ProcessMessage (msg);
            }
            return msg != null;
        }
    }

We've created an instance type and the initialisation of the queue is moved into the constructor. The creation of a queue is an expensive operation, as GetQueue involves a trip to the storage system, so we've moved it out of our processing loop. We can now modify our main loop to look like:

    public override void Start()
    {
        QueueProcessor p = new QueueProcessor ();

        while (true)
        {            
            // Updating the date time in the main loop
            lock (dateTimeLock)
            {
                lastThreadTest = DateTime.Now;
            }            
            
            if (p.Process () == false) {
                Thread.Sleep(DEFAULT_SLEEP_TIME);
            }
        }
    }

Functionally we are now pretty much back to what we had at the end of Part 2. Time for our bulletproof vest ... we can now update our main loop to catch any exceptions thrown from the Process method. If we catch an exception, we can destroy the current processor, create a new one and carry on as though nothing happened. Now this isn't going to work for all errors and normally I wouldn't recommend catching exceptions that you can't explicitly deal with, as some of them could be unrecoverable. However, we have Azure's status check watching our back. If we don't update our lastThreadTest time stamp in the event of an exception, we will eventually go unhealthy and Azure will restart us. So our main loop looks like:

    public override void Start()
    {
        QueueProcessor p = new QueueProcessor ();

        while (true)
        {            
            try {
                if (p.Process () == false) {
                    Thread.Sleep(DEFAULT_SLEEP_TIME);
                }

                // Update the date time only if no error occurred
                lock (dateTimeLock)
                {
                    lastThreadTest = DateTime.Now;
                }            

            } catch (Exception ex) {
                RoleManager.WriteToLog("Error",
                    "Error occurred, trying again: "
                    + ex.ToString());
                p = new QueueProcessor ();
            }
        }
    }

We might want to adjust the catch block to perform a sleep, just so we don't spin at 100% CPU in the event of repeated errors. This could even use a back-off algorithm, so the first couple of retries have no sleep, then 100ms, 500ms, ... just so that longer lasting errors (network?) won't keep us too busy throwing exceptions. The easiest way of doing a back-off is to declare an array of sleep times:

    // An array cannot be declared const in C#, so use static readonly instead
    private static readonly int [] SLEEP_TIMES = { 0, 0, 100, 500, 1000, 2000, 5000, 10000 };

and have a local variable keep track of the current sleep time index:

        int sleepTimeIndex = 0;

        while (true)
        {
            try {
                if (p.Process () == false) {
                    Thread.Sleep(DEFAULT_SLEEP_TIME);
                }

                // Update the date time only if no error occurred
                lock (dateTimeLock)
                {
                    lastThreadTest = DateTime.Now;
                }

                // reset the sleep time
                sleepTimeIndex = 0;

            } catch (Exception ex) {
                RoleManager.WriteToLog("Error",
                    "Error occurred, trying again: "
                    + ex.ToString());
                p = new QueueProcessor ();
                // Sleep using our back off, then move on to the next (longer) sleep time
                Thread.Sleep (SLEEP_TIMES [sleepTimeIndex]);
                sleepTimeIndex = Math.Min (sleepTimeIndex + 1, SLEEP_TIMES.Length - 1);
            }
        }

We can now survive errors with a more appropriate response and only rely on a restart in the event of repeated errors.
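If you want to check the back-off arithmetic without spinning up a worker role, the index bookkeeping can be pulled out into a small class. This is only a sketch: the BackOff class and its method names are my own invention, not part of the Azure SDK, and it reuses the same sleep times as SLEEP_TIMES. The delay is read before the index is advanced, so the first two failures really do get a zero sleep.

```csharp
using System;

// Hypothetical helper (not part of the Azure SDK): tracks how long to
// sleep after each consecutive failure.
public sealed class BackOff
{
    // Same schedule as SLEEP_TIMES, in milliseconds.
    private static readonly int[] SleepTimes = { 0, 0, 100, 500, 1000, 2000, 5000, 10000 };

    private int index;

    // Returns the delay for the current failure, then advances towards
    // the last (longest) entry, where it stays.
    public int NextDelay()
    {
        int delay = SleepTimes[index];
        index = Math.Min(index + 1, SleepTimes.Length - 1);
        return delay;
    }

    // Call after a successful pass through the main loop.
    public void Reset()
    {
        index = 0;
    }
}
```

In the main loop the catch block would then call Thread.Sleep with the value from NextDelay, and the try block would call Reset after a clean pass.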

Neil

Last week, for an Azure POC, we implemented something similar to the pattern shown in Part 1. One revision that I asked to be made was to surround the DateTime access code with a lock statement; I was worried that updating a DateTime struct would not be thread safe - i.e. the thread querying lastThreadTest would be reading its value while the main thread was part way through modifying it. This could lead to the health check comparing against an incorrect or invalid DateTime and unnecessarily marking the role as unhealthy. The code to perform the locking would look like:

    // Declaring the date time and our lock
    private DateTime lastThreadTest;

    private Object dateTimeLock = new Object();
    public override void Start()
    {
        // This is a sample worker implementation. Replace with your logic.
        RoleManager.WriteToLog("Information", "Worker Process entry point called");

        while (true)
        {
            // Updating the date time in the main loop
            lock (dateTimeLock)
            {
                lastThreadTest = DateTime.Now;
            }
            // Do some work
        }
    }
    public override RoleStatus GetHealthStatus()
    {
        // Querying the date time
        lock (dateTimeLock)
        {
            if (lastThreadTest.AddSeconds(30) < DateTime.Now)
            {
                return RoleStatus.Unhealthy;
            }
            return RoleStatus.Healthy;
        }
    }

Having researched whether the lock is needed, I can categorically state that the answer is ... "it depends". The DateTime struct in the .Net framework uses an unsigned 64-bit integer to store its value. Assignment of a 64-bit value on a 32-bit machine is not atomic (it takes 2 machine instructions to write 64 bits); on a 64-bit machine, however, it is atomic. As Windows Azure instances are 64-bit machines, the question of whether to lock is probably going to come down to what your dev machine is. However, taking out an uncontested lock in .Net is a very inexpensive operation, so my recommendation would be to put the lock in.
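If you would rather not take a lock at all, one alternative is to store the timestamp as ticks in a long and go through the Interlocked class, which reads and writes 64-bit values atomically even in a 32-bit process. This is a minimal sketch rather than what we used in the POC; the Heartbeat class and its member names are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical helper: a heartbeat timestamp that can be written by the
// main loop and read by the health check without a lock.
public sealed class Heartbeat
{
    // DateTime ticks held in a long so that Interlocked can operate on it.
    private long lastTicks = DateTime.UtcNow.Ticks;

    // Called by the main loop each time round.
    public void Report()
    {
        Interlocked.Exchange(ref lastTicks, DateTime.UtcNow.Ticks);
    }

    // Called by the health check; Interlocked.Read avoids a torn read
    // of the 64-bit value on 32-bit machines.
    public DateTime LastReport
    {
        get { return new DateTime(Interlocked.Read(ref lastTicks), DateTimeKind.Utc); }
    }
}
```

The lock is still the simpler option to read and reason about; this version just shows that the torn read can be avoided another way.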

If you are interested, the specs of an Azure machine for the Tech Preview are as follows - this is the virtual machine, not the actual physical servers!

  • Platform: 64-bit Windows Server 2008
  • CPU: 1.5-1.7 GHz x64 equivalent
  • Memory: 1.7 GB
  • Network: 100 Mbps
  • Transient local storage: 250 GB

And for more information on thread safety and torn reads, see Joe Duffy's posting.

Neil

Building a decoupled, queue based system will give you the ability to scale and the opportunity to create a highly available application. By dispatching work to multiple back end worker roles we are building a system that can survive unfortunate events like bugs, exceptions, hardware failure, fire, flood, pestilence and the other horsemen of the developer apocalypse - ok, maybe I'm getting carried away and our application won't need to survive all that, but you get the idea. If one of our worker roles dies, another can take its place. Due to the reliable nature of the queues, anything it was working on will be reprocessed by another node and all is well.

Azure includes a mechanism where it can check on the health of your nodes. ASP.Net already has this functionality built into it, but for a worker role we need to roll this ourselves. All worker roles derive from RoleEntryPoint, which defines a method that Azure will call periodically to determine if your role is working. The default implementation looks like:

    public override RoleStatus GetHealthStatus()
    {
        // This is a sample worker implementation. Replace with your logic.
        return RoleStatus.Healthy;
    }

That's all well and good if our app is perfect and bug free, but in reality we are going to want to make this a little bit smarter. There are a variety of things that we can return from this function, but for the most part all we will be interested in is Healthy or Unhealthy. A sensible implementation would be to get our main worker process to periodically update a timestamp to show that it is working correctly and to check this timestamp in GetHealthStatus. So let's build ourselves a main processing loop for a worker role.

    while (true)
    {
        // Report in
        lastHealthyReport = DateTime.Now;

        // Get the next message to work on
        Message msg = queue.GetMessage();

        if (msg != null)
        {
            // process message
            // ...

        }
        else
        {
            // no messages waiting, so sleep for a while
            Thread.Sleep(DEFAULT_SLEEP_TIME);
        }
    }

There are a couple of things going on here:

  1. At the start of each loop we record the current time in a DateTime member variable. This is effectively a record of the last time the main loop was known to be in a healthy state.
  2. If there are no messages waiting on the queue, we sleep - this ensures that we aren't constantly burning up our CPU looking for work when there isn't any. This reduces the power consumption of our app and will presumably reduce the cost of your Azure deployment - I say presumably because any costing details are yet to be announced.

Now when we come to write our GetHealthStatus method, all we have to do is query lastHealthyReport to ensure that the main thread reported in within an acceptable time.

    public override RoleStatus GetHealthStatus()
    {
        if (lastHealthyReport.AddSeconds(60) < DateTime.Now)
        {
            RoleManager.WriteToLog("Error", 
                "Node going unhealthy, not heard from main loop since " 
                + lastHealthyReport.ToUniversalTime().ToString());
            return RoleStatus.Unhealthy;
        }
        return RoleStatus.Healthy;
    }

The above code assumes that each message should take no longer than 60 seconds to process. If a message does take longer than that, the role will report that the node is unhealthy, allowing Azure to tear down the node and build us another one. Oh, and logging the error in Universal Time (or Greenwich Mean Time if you are British ;-) means that it will be easier to look in any other log files to see what was happening at about the time the node went down.
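The timestamp comparison is the one piece of logic here that is easy to get backwards, so it can be worth pulling it out into a function that takes the clock as a parameter and can be tested on its own. A minimal sketch, assuming a helper class of my own naming (HealthCheck.IsHealthy is not an SDK method):

```csharp
using System;

// Hypothetical helper: the same comparison GetHealthStatus performs,
// with "now" passed in so it can be tested without waiting.
public static class HealthCheck
{
    public static bool IsHealthy(DateTime lastReport, DateTime now, TimeSpan maxSilence)
    {
        // Healthy while the last report is no older than maxSilence,
        // i.e. the inverse of lastReport.Add(maxSilence) < now.
        return now - lastReport <= maxSilence;
    }
}
```

GetHealthStatus would then return RoleStatus.Healthy or RoleStatus.Unhealthy depending on the result. Remember that a pass through the loop with no work includes the DEFAULT_SLEEP_TIME sleep, so the limit needs to be comfortably longer than that.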

Neil.

If you are developing a queue based system in Windows Azure - and let's face it, if you want a highly scalable and reliable application, you're going to be using queues - you are going to have to deal with poison messages. A poison message is a message that your application logic can't deal with. For example, let's assume that we have just placed a message on a queue that contains some incorrectly formed data - hey, these things happen in even the best designed and tested apps. When we come to read the message, the bad data causes our message parser to throw an exception and the message processor will die - hence the name poison message.

In this situation, life actually gets a little worse in Windows Azure. Azure queues ensure that a message will be processed at least once. So, after our visibility timeout expires, the poison message will come back to life and get picked up by another of our processors - causing this to fall over as well. Eventually, with enough poison messages in the queue all our processing nodes will only ever get poison messages, fall over, restart, ... and we will stop doing any real work.

Currently we can't find out how many times a message has been read off a queue, so the only way to check for a poison message is to see what time it was placed on the queue. If the message has been on the queue for 30 mins and your visibility timeout is 60 seconds, it is probably poisonous. Ok, so this doesn't allow for periods of down time longer than 30 mins, but it is moving us in the right direction.

What to do with a poison message is probably a trickier issue. A generic solution would be to write the message to a poison message table for later manual inspection.

Code to do this is going to look like:

    // Get message with 60 sec visibility
    Message msg = queue.GetMessage(60);

    // Poison check
    if (msg.InsertionTime.AddMinutes(30) < DateTime.Now)
    {
      // Treat as poisonous - but don't look inside the message!
      byte [] msgBody = msg.ContentAsBytes();
      PoisonMessageTable.Insert(msgBody);
      
      // Stop it repeating on us
      queue.DeleteMessage (msg);
    } else {
      // process normally
      // ...
    }
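To see why 30 mins against a 60 second visibility timeout is a reasonable threshold, it helps to work out how many visibility timeouts a message of a given age has lived through. The sketch below uses helper names of my own (they are not part of the storage client library) and takes the current time as a parameter so that it can be tested.

```csharp
using System;

// Hypothetical helpers for reasoning about poison messages.
public static class PoisonCheck
{
    // Same test as the poison check: true when the message has been on
    // the queue for longer than maxAge.
    public static bool LooksPoisonous(DateTime insertionTime, DateTime now, TimeSpan maxAge)
    {
        return insertionTime + maxAge < now;
    }

    // Number of whole visibility timeouts that fit into the message's age.
    // Each one is a chance the message was handed out and came back.
    public static long VisibilityTimeoutsElapsed(TimeSpan age, TimeSpan visibilityTimeout)
    {
        return age.Ticks / visibilityTimeout.Ticks;
    }
}
```

With a 30 minute age and a 60 second timeout, VisibilityTimeoutsElapsed returns 30, so a message that old could already have been picked up and abandoned around thirty times, assuming the workers were up and polling the whole time.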

If you aren't interested in seeing the poison message, you can achieve this more easily by setting the message expiration time to 30 mins. This means that Azure will delete the message after it has been on the queue for 30 mins; you have kept your system up and running, but you have lost whatever was in the message and the opportunity of fixing the bug that is causing the problem.

Neil
