Saw an interesting problem today which I thought was worth sharing.

A customer was trying to send some large messages from one machine to another but, even though the destination had not yet reached its storage quota and there should have been enough capacity, nothing was getting delivered. The visible symptom was an outgoing queue with a status of "waiting to connect" whilst other outgoing queues to the same destination machine were working fine.

The root cause lay in the way MSMQ stores messages. The Storage directory (under windows\system32\msmq) holds a number of 4MB files that contain the messages. Each file holds as many messages as will fit, which works quite well to start with. If a message is too big for the free space remaining in a storage file, though, a fresh file is created. There can only be as many storage files as fit within the storage quota, and once this limit is reached no more can be created. For example, a 2GB quota could accommodate roughly 500 4MB storage files.

This does not mean that it will store 2GB of messages. In an MSMQ system where messages come in a range of sizes, there may be a significant amount of free space, but it is scattered throughout a few hundred storage files. This becomes more apparent over time as messages are removed and replaced, leading to fragmentation of the free space. In the example with roughly 500 storage files, if none of them has a large enough vacancy for an incoming message then delivery is rejected, because the quota has been reached and no new storage files can be created. The sending machine will keep retrying the message until the receiver has processed some messages and freed up enough space to stop rejecting new deliveries.

Note that MSMQ storage works differently from file storage - there is no linked list stitching together parts of a message. An MSMQ storage file has to have inside it a contiguous block of free space that is large enough to accommodate the whole message. MSMQ messages cannot span storage files either - if the message is greater than 4MB then MSMQ cannot store it.
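To make the fragmentation effect concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not MSMQ's actual allocator: the ToyStore class, its first-fit search, and the 8MB quota are all assumptions made for illustration.

```python
MB = 1024 * 1024
FILE_SIZE = 4 * MB  # storage file size described in the post


class ToyStore:
    """Hypothetical model: fixed-size files, one contiguous gap per message."""

    def __init__(self, quota_bytes):
        self.max_files = quota_bytes // FILE_SIZE
        # Each file is a list of (offset, length, msg_id), sorted by offset.
        self.files = []

    def _find_gap(self, used, size):
        """Return the first offset with `size` contiguous free bytes, or None."""
        cursor = 0
        for offset, length, _ in used:
            if offset - cursor >= size:
                return cursor
            cursor = offset + length
        return cursor if FILE_SIZE - cursor >= size else None

    def store(self, msg_id, size):
        """Place a message; return False if delivery would be rejected."""
        if size > FILE_SIZE:
            return False  # a message cannot span storage files
        for used in self.files:
            gap = self._find_gap(used, size)
            if gap is not None:
                used.append((gap, size, msg_id))
                used.sort(key=lambda entry: entry[0])
                return True
        if len(self.files) < self.max_files:
            self.files.append([(0, size, msg_id)])
            return True
        return False  # quota reached and no gap is big enough

    def remove(self, msg_id):
        for used in self.files:
            used[:] = [entry for entry in used if entry[2] != msg_id]

    def free_bytes(self):
        used = sum(length for f in self.files for _, length, _ in f)
        return self.max_files * FILE_SIZE - used


store = ToyStore(quota_bytes=8 * MB)  # assumed quota: room for two files
for i in range(8):
    store.store(i, 1 * MB)            # fill both files with 1MB messages
for i in (0, 2, 4, 6):
    store.remove(i)                   # free 4MB, but only as 1MB gaps

free_mb = store.free_bytes() // MB
accepted_large = store.store("big", 2 * MB)
accepted_small = store.store("small", 1 * MB)
print(free_mb, accepted_large, accepted_small)  # 4 False True
```

Half of the quota is free, yet the 2MB message is rejected because no single file has a 2MB contiguous gap, while a 1MB message still gets in.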

This problem is made worse because MSMQ uses a pipeline for sending messages. If there is a large message at the front of the queue then it will block all the (possibly smaller) messages behind it until delivery becomes possible.

What can you do about this? Here are a few ideas - there are bound to be more:

  1. Wait it out - storage space should become available eventually.
  2. Purge the outgoing queue and resend the smaller messages first.
  3. Set a Time To Reach Queue on the messages - the large message will expire and be removed from the outgoing queue, allowing the smaller messages through.
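
The blocking behaviour, and the way an expiry timeout relieves it, can be sketched in a few lines of Python. This is a simplified simulation rather than MSMQ code: the drain function, the only_small receiver, and the timings are hypothetical, and the ttrq parameter merely stands in for the Time To Reach Queue setting.

```python
from collections import deque


def drain(outgoing, receiver_accepts, now, ttrq=None):
    """Send from the head of the queue in order; stop at the first rejection.

    outgoing: deque of (msg_id, size_mb, enqueued_at) tuples
    receiver_accepts: callable(size_mb) -> bool, a stand-in for the destination
    ttrq: optional timeout in seconds (None means messages never expire)
    """
    delivered, expired = [], []
    while outgoing:
        msg_id, size_mb, enqueued_at = outgoing[0]
        if ttrq is not None and now - enqueued_at > ttrq:
            expired.append(outgoing.popleft()[0])    # timed out, give up on it
        elif receiver_accepts(size_mb):
            delivered.append(outgoing.popleft()[0])
        else:
            break  # head message rejected, so everything behind it waits
    return delivered, expired


def only_small(size_mb):
    # Hypothetical receiver whose free space is fragmented into 1MB gaps.
    return size_mb <= 1


def make_queue():
    # One large message at the front, two small ones queued later.
    return deque([("big", 3, 0), ("a", 1, 90), ("b", 1, 100)])


blocked = drain(make_queue(), only_small, now=120)
unblocked = drain(make_queue(), only_small, now=120, ttrq=60)
print(blocked)    # ([], [])
print(unblocked)  # (['a', 'b'], ['big'])
```

Without a timeout the large message sits at the head and nothing moves. With a 60 second timeout it expires, and the two small messages behind it are delivered.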