Sunday, December 07, 2008 6:33 PM
neilk
Poison in Windows Azure
If you are developing a queue-based system in Windows Azure - and let's face it, if you want a highly scalable and reliable application, you are going to be using queues - you are going to have to deal with poison messages. A poison message is a message that your application logic can't deal with. For example, let's assume that we have just placed a message on a queue that contains some incorrectly formed data - hey, these things happen in even the best designed and tested apps. When we come to read the message, the bad data causes our message parser to throw an exception and the message processor dies - hence the name poison message.
In this situation, life actually gets a little worse in Windows Azure. Azure queues ensure that a message will be processed at least once: a message stays on the queue until a processor explicitly deletes it. Our processor died before it could delete the poison message, so once the visibility timeout expires the message comes back to life and gets picked up by another of our processors - causing that one to fall over as well. Eventually, with enough poison messages in the queue, all our processing nodes will only ever get poison messages, fall over, restart, ... and we will stop doing any real work.
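To make the redelivery loop concrete, here is a minimal in-memory sketch. ToyQueue and RedeliveryDemo are made-up names for illustration only and are not part of any Azure library; the toy queue just mimics the "hide for a while, then show again unless deleted" behaviour described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a queue with a visibility timeout (not an Azure API).
public class ToyQueue
{
    private class Entry { public string Body; public DateTime VisibleAt; }
    private readonly List<Entry> entries = new List<Entry>();

    public void Put(string body, DateTime now)
    {
        entries.Add(new Entry { Body = body, VisibleAt = now });
    }

    // Hands out the first visible message and hides it until now + timeout.
    // Returns null when nothing is visible.
    public string Get(DateTime now, TimeSpan visibilityTimeout)
    {
        foreach (Entry e in entries)
        {
            if (e.VisibleAt <= now)
            {
                e.VisibleAt = now + visibilityTimeout;
                return e.Body;
            }
        }
        return null;
    }

    // Permanently removes every message with this body.
    public void Delete(string body)
    {
        entries.RemoveAll(e => e.Body == body);
    }
}

public static class RedeliveryDemo
{
    // Counts how often a worker "dies" on the same poison message
    // when it polls once per visibility timeout.
    public static int CountCrashes(int rounds)
    {
        var queue = new ToyQueue();
        var timeout = TimeSpan.FromSeconds(60);
        var now = new DateTime(2008, 12, 7, 0, 0, 0, DateTimeKind.Utc);
        queue.Put("not-a-number", now);

        int crashes = 0;
        for (int i = 0; i < rounds; i++)
        {
            string body = queue.Get(now, timeout);
            if (body != null)
            {
                try
                {
                    int.Parse(body);    // the "parser" that chokes on bad data
                    queue.Delete(body); // only reached when parsing succeeds
                }
                catch (FormatException)
                {
                    crashes++;          // the worker dies before it can delete
                }
            }
            now += timeout;             // a restarted worker polls again later
        }
        return crashes;
    }
}
```

Because the delete is never reached, every poll after the timeout gets the same message back, so the crash count simply grows with the number of rounds.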
Currently we can't find out how many times a message has been read off a queue, so the only way to check for a poison message is to see what time it was placed on the queue. If the message has been on the queue for 30 minutes and your visibility timeout is 60 seconds, it has had time to be picked up around 30 times, so it is probably poisonous. OK, so this doesn't allow for periods of downtime longer than 30 minutes, but it is moving us in the right direction.
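The age check can be pulled out into a small helper so the threshold is easy to test and tune. This is only a sketch: PoisonHeuristic and both of its methods are hypothetical names, and the 30 minute threshold is just the example figure from above.

```csharp
using System;

// Hypothetical helper, not part of any Azure SDK.
public static class PoisonHeuristic
{
    // True when the message has been on the queue for longer than maxAge.
    public static bool IsProbablyPoison(DateTime insertionTime, DateTime now, TimeSpan maxAge)
    {
        return now - insertionTime > maxAge;
    }

    // Rough upper bound on how many times a message of this age could have
    // been handed out, assuming a worker picked it up the instant it became
    // visible each time. Real counts will usually be lower.
    public static int MaxDeliveries(TimeSpan age, TimeSpan visibilityTimeout)
    {
        if (visibilityTimeout <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException("visibilityTimeout");
        }
        return (int)(age.Ticks / visibilityTimeout.Ticks) + 1;
    }
}
```

Pass both timestamps in the same time zone (both UTC or both local); mixing the two will skew the age by your UTC offset.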
What to do with a poison message is probably a trickier issue. A generic solution would be to write the message to a poison message table for later manual inspection.
Code to do this is going to look like:
// Get message with 60 sec visibility timeout
Message msg = queue.GetMessage(60);
// Poison check: has the message been on the queue for more than 30 minutes?
if (msg.InsertionTime.AddMinutes(30) < DateTime.Now)
{
    // Treat as poisonous - but don't look inside the message!
    byte[] msgBody = msg.ContentAsBytes();
    PoisonMessageTable.Insert(msgBody);
    // Stop it repeating on us
    queue.DeleteMessage(msg);
}
else
{
    // process normally
    // ...
}
If you aren't interested in seeing the poison message, you can achieve this more easily by setting the message expiration time to 30 minutes. This means that Azure will delete the message after it has been on the queue for 30 minutes. You have kept your system up and running, but you have lost whatever was in the message, and with it the opportunity to fix the bug that is causing the problem.
Neil