ZDNet and GoodGearGuide both report that Rustock is responsible for 41% of the world’s botnet spam in August, up from 32% in April.  They are both quoting MessageLabs’s latest Intelligence Report.

Rustock is, of course, the largest botnet out there, but that depends on how you count it, as I have said in the past.  If you count by the number of unique IPs, it is the largest botnet by a large margin.  If you count by the number of email envelopes, it is still the largest by a large margin.  However, each email envelope can have multiple recipients (the addresses given in the RCPT TO).  If you count each recipient as one message, then Rustock is the second-largest botnet, trailing Lethic by a large margin.  This is because Lethic sends 5-6 times as many messages per connection as Rustock.
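To make the counting difference concrete, here is a small Python sketch. The traffic numbers are hypothetical and chosen only to illustrate the effect (one recipient per Rustock envelope, six per Lethic envelope, and many more Rustock envelopes); they are not measurements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    botnet: str
    recipients: int  # number of RCPT TO addresses in this SMTP transaction

def count(envelopes, by):
    """Total per botnet, counting one per envelope or one per recipient."""
    if by not in ("envelopes", "recipients"):
        raise ValueError(f"unknown unit: {by}")
    totals = {}
    for env in envelopes:
        weight = 1 if by == "envelopes" else env.recipients
        totals[env.botnet] = totals.get(env.botnet, 0) + weight
    return totals

def ranking(totals):
    """Botnet names ordered from largest to smallest total."""
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical traffic for illustration only, not measured data.
traffic = [Envelope("Rustock", 1)] * 1000 + [Envelope("Lethic", 6)] * 300

print(ranking(count(traffic, "envelopes")))   # ['Rustock', 'Lethic']
print(ranking(count(traffic, "recipients")))  # ['Lethic', 'Rustock']
```

Same traffic, two different "largest botnets", depending only on what you choose to count.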

You might be wondering why we would want to count total messages instead of total envelopes.  Don’t you want to reject a message as soon as you possibly can?

The answer is: it depends.  Or rather, you want to reject a message as soon as you can, but no sooner.  In our service, we reject messages after the RCPT TO, not on connect.  We do this because we are a hosted service and we have reporting requirements for each of our customers.  If our customer is microsoft.com and they want to know how many messages we blocked for them, the only way for us to tell is to count the recipients on the RCPT TO.  We add up how many are going to @microsoft.com and log that number.  This means that we cannot reject at connection time.  If we did, we would have a log of the sending IP, but not the MAIL FROM (which isn’t relevant here) and not the RCPT TO.  That would make it impossible for us to validate our SLAs and tell customers how much mail we blocked for them.
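A minimal sketch of that per-customer tally, assuming we already have the RCPT TO addresses from rejected transactions. The function name and the sample addresses (other than the microsoft.com domain from the example above) are hypothetical, and a real logging pipeline would carry far more fields than this.

```python
from collections import Counter

def tally_blocked_by_domain(rejected_rcpts):
    """Count blocked recipients per customer domain.

    Each RCPT TO address from a rejected transaction counts as one
    blocked message for the domain it was addressed to.
    """
    counts = Counter()
    for addr in rejected_rcpts:
        # RCPT TO arguments look like <user@example.com>; drop the brackets
        addr = addr.strip().strip("<>")
        _, sep, domain = addr.rpartition("@")
        if sep:  # skip anything without an @ rather than guess a domain
            counts[domain.lower()] += 1
    return counts

# Hypothetical rejected envelope with three recipients.
log = ["<alice@microsoft.com>", "<bob@Microsoft.com>", "<carol@contoso.com>"]
print(tally_blocked_by_domain(log))
```

Note that none of this is possible if the connection is dropped before the client ever sends RCPT TO.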

Reporting has its advantages, but this approach also has its drawbacks.  By holding the connection open longer, we reject more slowly and tie up more resources.  On the other hand, it’s not that big a deal, because we reject after the RCPT TO rather than at end-of-data, which would tie up significantly more bandwidth.  From a technical standpoint, we are imposing a cost upon ourselves, but it is one we pay in order to demonstrate our worth to the end user.
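The trade-off can be sketched as a toy model of the server side of an SMTP session. This is an illustration, not how our service is actually built: the function and the session are hypothetical, while the reply codes are standard SMTP (RFC 5321 allows either 503 or 554 on DATA when no recipient was accepted; this sketch uses 554).

```python
def smtp_replies(commands, sender_blocklisted):
    """Return the server's reply code for each client command in a session.

    Toy model of rejecting at the RCPT TO stage: every recipient address is
    still read (so it can be logged per customer), but a blocklisted sender
    gets 550 on each RCPT TO. With no accepted recipients, DATA is refused
    with 554, so the message body is never transferred. Command ordering is
    not enforced, and body lines after a 354 are not modeled.
    """
    accepted_rcpts = 0
    replies = []
    for cmd in commands:
        # "MAIL FROM:<...>" -> "MAIL", "EHLO host" -> "EHLO", "DATA" -> "DATA"
        verb = cmd.split(":", 1)[0].split()[0].upper()
        if verb in ("EHLO", "HELO", "MAIL"):
            replies.append(250)
        elif verb == "RCPT":
            if sender_blocklisted:
                replies.append(550)  # refused, but the address was seen and can be logged
            else:
                accepted_rcpts += 1
                replies.append(250)
        elif verb == "DATA":
            # 354 invites the body; 554 means there are no valid recipients
            replies.append(354 if accepted_rcpts else 554)
        elif verb == "QUIT":
            replies.append(221)
        else:
            replies.append(500)  # command not recognized
    return replies

session = [
    "EHLO bot.example",
    "MAIL FROM:<spammer@example.net>",
    "RCPT TO:<alice@microsoft.com>",
    "RCPT TO:<bob@microsoft.com>",
    "DATA",
    "QUIT",
]
print(smtp_replies(session, sender_blocklisted=True))  # [250, 250, 550, 550, 554, 221]
```

The connection lives a few round trips longer than a reject-on-connect would allow, but the body, which is the expensive part, never crosses the wire.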

The distinction between messages and envelopes matters for a second reason: botnets like Lethic, when they aren’t on a blocklist, cost us far more resources in content filtering.  All spam that comes to us first needs to get past the IP blocklist.  If it does, then it’s on to the content filter.  As I said earlier, Lethic sends lots of mail per connection.  It is like the guy who goes to the all-you-can-eat salad bar five times and pays $8.99 once.  When the content filter marks the message as spam, we have to split up all of those RCPT TOs and send a copy either to each user’s quarantine (in the cloud, where we have to store it) or to the customer’s mail server, where they sort it and store it on-premise (such as in an Exchange box’s junk mail folder).  That takes up a lot of resources in terms of bandwidth and disk storage.  The post-blocklist cost of Lethic is higher than Rustock’s because of the way it sends its spam.  For a filtering service, that matters, and it matters a lot.
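The fan-out can be sketched like this.  The Delivery record, the function names and the 10 KB message size are hypothetical; the point is only that one content-filter scan turns into one delivered or stored copy per RCPT TO.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delivery:
    recipient: str
    size_kb: int
    destination: str  # "cloud quarantine" or "customer mail server"

def split_spam_envelope(recipients, size_kb, cloud_quarantine):
    """Split one spam envelope, already scanned once by the content filter,
    into one copy per RCPT TO, sent either to the cloud quarantine (stored by
    the service) or to the customer's mail server (stored on-premise)."""
    destination = "cloud quarantine" if cloud_quarantine else "customer mail server"
    return [Delivery(rcpt, size_kb, destination) for rcpt in recipients]

def post_filter_cost(deliveries):
    """Return (copies, total KB transferred or stored) for a set of deliveries."""
    return len(deliveries), sum(d.size_kb for d in deliveries)

# Hypothetical: the same 10 KB spam, once with one recipient, once with six.
single = split_spam_envelope(["a@contoso.com"], 10, cloud_quarantine=True)
packed = split_spam_envelope([f"user{i}@contoso.com" for i in range(6)], 10, cloud_quarantine=True)
print(post_filter_cost(single))  # (1, 10)
print(post_filter_cost(packed))  # (6, 60)
```

Both envelopes cost one connection and one scan, but the packed one costs six times the bandwidth and storage on the way out.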

So yes, Rustock sends the most spam, but it depends on how you measure it.  It also depends on what you consider to be the greater impact.  Not only that, but if we’re talking about bandwidth and storage costs, then it’s not just about the number of messages but how big each message is in terms of kilobytes.  My research indicates that from March to June 2010, the average size of a Rustock spam message was 18kb, whereas Lethic’s was 3kb.  That roughly cancels out Lethic’s per-connection advantage: Lethic sends 5-6 times as many messages per connection, but each one is about one sixth the size, so per connection the two cost about the same number of kilobytes.  Counted this way, Rustock is the biggest botnet by a long shot, regaining its crown in spite of Lethic’s dominance in messages per envelope.  Rustock’s total share by this measure?  40%, which agrees with MessageLabs’s numbers.
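Here is the same arithmetic as a short Python sketch.  The 18kb and 3kb averages come from the paragraph above.  One recipient per Rustock envelope, six per Lethic envelope (the top of the 5-6 range), and the envelope counts are assumptions for illustration, so the output shows how the ranking flips, not the real 40% figure.

```python
# Average spam message sizes in KB, March to June 2010, as quoted above.
AVG_KB = {"Rustock": 18, "Lethic": 3}
# Assumption: 1 recipient per Rustock envelope, 6 per Lethic envelope.
RCPTS_PER_ENVELOPE = {"Rustock": 1, "Lethic": 6}
# Hypothetical envelope counts, for illustration only.
ENVELOPES = {"Rustock": 1000, "Lethic": 300}

def kb_per_envelope(botnet):
    """Kilobytes one envelope costs once it is split out per recipient."""
    return RCPTS_PER_ENVELOPE[botnet] * AVG_KB[botnet]

def volume(botnet, unit):
    """Total volume for a botnet, counted in envelopes, recipients, or KB."""
    if unit == "envelopes":
        return ENVELOPES[botnet]
    if unit == "recipients":
        return ENVELOPES[botnet] * RCPTS_PER_ENVELOPE[botnet]
    if unit == "kb":
        return ENVELOPES[botnet] * kb_per_envelope(botnet)
    raise ValueError(f"unknown unit: {unit}")

for unit in ("envelopes", "recipients", "kb"):
    totals = {b: volume(b, unit) for b in ENVELOPES}
    leader = max(totals, key=totals.get)
    print(unit, totals, "leader:", leader)
```

Per envelope, both botnets come out at 18 KB, so once size is taken into account the botnet with more envelopes, Rustock, leads again.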

Aren’t statistics great?