Terry Zink's Anti-spam Blog

Protecting your mail from the scum of the internet
Operating system security vulnerabilities

A few weeks ago, Microsoft released its 2008 Security Intelligence Report.  In it, they detail a number of interesting trends.  One is how the infection rate, measured by how often the Malicious Software Removal Tool has to clean a machine, varies by operating system.

image

The infection rate for Windows Vista is significantly lower than that of its predecessor,
Windows XP, in all configurations. Specifically:

  • Comparing the latest service packs for each version, the infection rate of
    Windows Vista SP1 is 48.8 percent less than that of Windows XP SP3.

  • Comparing the n-1 service packs for each version, the infection rate of the release
    to manufacturing (RTM) version of Windows Vista is 56.2 percent less than that
    of Windows XP SP2.

  • Comparing the RTM versions of these operating systems, the infection rate of the
    RTM version of Windows Vista is 85.4 percent less than that of the RTM version
    of Windows XP.

The higher the service pack level, the lower the rate of infection. This trend can be
observed consistently across client and server operating systems. There are two reasons
for this:

  • Service packs include fixes for all security vulnerabilities fixed in security updates
    at the time of issue. They can also include additional security features, mitigations,
    or changes to default settings to protect users.

  • Users who install service packs generally maintain their computers better than
    users who do not install service packs and therefore may also be more cautious in
    the way they browse the Internet, open attachments, and engage in other activities
    that can open computers to attack.

Dare I say that if users upgraded their operating systems, we'd see fewer botnets?  Maybe, maybe not.  But it seems to make sense.

Spamhaus lists Microsoft as a spam-friendly ISP - update

A week ago, the Washington Post printed an article saying that Spamhaus had listed Microsoft as the 5th worst spam-friendly ISP.  There was (and is) a link to the current top 10 worst spam-friendly ISPs, and while Microsoft is no longer on there, the point has been made.

Spammers have been abusing Microsoft's free web services for a long time, using a technique called Reputation Hijacking.  As I have posted before on this blog, botnets are used to sign up for free Hotmail (Windows Live Mail) accounts, create landing pages on Windows Live Spaces, and use the storage resources of Windows Live SkyDrive.  So Microsoft is not really an ISP; it merely provides a lot of free services to Internet users.

Until recently, Yahoo was on that list as well.  It illustrates the problem that the Big 4 (MAGY) have, and that is trying to provide rich content tools to end-users while simultaneously trying to avoid the problem of abuse.  A friend of mine in Photosynth has told me about the same problems - 12 hours after it went live, they had illegal x-rated images on there.

When I first saw the Spamhaus link, my heart skipped a beat.  For you see, a couple of months ago I saw that our services were responsible for emitting a whole pile of MAGY spam.  I was watching a presentation and I did a rough calculation in my head for how much spam we were sending out.  It was a lot.  To gauge my reaction, think of the Simpsons episode when Shelbyville steals Springfield's lemon tree.  Now, Homer and Bart, et al, are in Flanders' RV in the impound lot.  Bart is running back to the RV being chased by a dog.

Homer swings open the door and says "Don't worry boy!"  He then tosses out a bunch of sausages to distract the dog, but the dog swallows them whole and doesn't miss a step.  Homer gasps a little bit and his eyes go wide.  Homer's reaction was the same as mine.

Since that fateful day when I learned about our outbound spam problem, we have clamped down quite a bit and have way better monitoring now (and it is still improving).  Thus, when I read that Microsoft was listed as a spam-friendly ISP, I was secretly very uneasy.  I clicked on the link and read through the offenders, and thankfully Exchange Hosted Services was not listed.  That was a relief; it demonstrated that maybe our monitoring was working.

As I said, Microsoft is no longer listed on the Top 10 list.  I do know that Microsoft shuts these things down as soon as they find them, and as time passes there will be more and more defenses in place to mitigate this kind of abuse.  However, it is actually resource-intensive and it takes a while to build a solution that actually scales to the level that Microsoft needs it to.

Trends from 2008

I have commented that one of the major trends that I have seen this year is a steady decline in the amount of spam that we see compared to 2007.  This was certainly accelerated after McColo was taken offline, but that was also true even before that.

However, while spam has been down by at least 50% (at least for us), the number of viruses that we have seen this year has increased substantially, by at least a factor of 5.  I don't have the numbers in front of me or know them off the top of my head, but I wouldn't be surprised if it was more than that.

Do I have some theories about why we are seeing so many viruses this year?  Well, it all comes back to a post I made this year about spam bots diversifying:

  1. Spammers want to seed their botnets so they send out spam with links to malware, or they send messages with malware attached.

  2. Recipient opens message with malware (or clicks on link and tries to view Paris Hilton on her newest video... or something) and gets their PC infected.

  3. Spammers use botnets in different ways; they are not just for sending spam anymore.  The bots are used for reputation hijacking.  Whereas before they sent spam, now they build landing pages on Windows Live and Google's Blogspot.  They also break into Hotmail, Yahoo Mail and Gmail and create bogus accounts from which to send spam.

  4. This part is pure speculation on my part.  More spam is emitted from MAGY to MAGY (Microsoft, AOL, Gmail and Yahoo).  It could be that our customer base does not fit the target profile for these spammers' recipients.  Indeed, we don't content filter mail any differently if email comes from MAGY.  Note I said content filter, as opposed to reputation filter.

Now that McColo is taken offline, there seems to be a consensus in the spam community that spammers need to rebuild their botnet so they will be sending out piles of viruses.  That may be true, but we were seeing piles of viruses even before McColo went down.  As for me, I'm not so sure.  My guess is that McColo going down is going to be more inconvenient to the spammers than we probably think and it will take them longer to rebuild their infrastructure than we pessimistically assume. 

Do I even bother trying to save the world?

In one of my other posts, I lamented that some of our outbound mail from customers was being sent by people who put non-resolvable domains as the envelope sender.  As a result, an ISP (among others) was throttling our mail because the sender's domain had no A-record.

Stuff like this is difficult to take action on.  On the one hand, we could work with the ISP to get an understanding like the following: "Look, this is one of your customers who is using us as their work account to forward to their home account.  Stop throttling us... and them."  We could call them up, explain the situation, work out a deal and the problem goes away.

But then it comes back up.  Because eventually somebody else starts blocking our email because of the same thing.  So, we call up those guys and cut a similar deal.  And the problem goes away... for a while.  And then the same thing happens again and again.  In other words, no matter how many times we work out a deal with someone, another case arises.

It's not just outbound mail.  Many customers or customer senders have broken SPF records.  Do we reach them all and try to fix those, too?  What about the ones who have broken HTML links, broken senders, broken mailers that look like spammers, and so forth?  My point is that no matter how many broken things we fix, there will always be more.  I'm not sure that it's worthwhile saving the world because it's an uphill battle.

At least, not compared to the alternative.  Saving the world is one thing, but what if we accepted that people have broken mailers and just live with it?  We don't have to score SPF records super-high in the content filter, nor auto-reject on broken headers, nor reject mail on no A-records in the sender's domain, and so forth.  In other words, we can mitigate almost everything by being a little conservative in our spam scoring engine while allowing most of the legitimate mail to get through.  While it is true that we are not getting people to fix stuff that's broken, at the same time, we are causing fewer headaches for ourselves.

Saving the world is one thing, but it's very time consuming.  And, we have other fish to fry.

CBL's take on McColo being taken offline

It's been over a week since the plug was pulled on McColo's operations, and our spam volumes are still way down (I still haven't figured out a way to take credit for that).  On average, they are down by around 40-50%.  The last couple of days have seen some slight upticks but not a lot.

One of the stats that has surprised me is our inbound spam/non-spam traffic.  It's generally about 65/35 spam/non-spam.  However, on November 12, the non-spam part ticked above 50% for the first time ever (ie, since I've been keeping track of the stats).  Our mail servers were actually being used mostly for legitimate mail.  I think that's a Christmas miracle.

Anyhow, the CBL has posted an article about the McColo take down.  In case you don't know, CBL stands for Composite Blacklist and they collect information about zombies around the Internet that are sending spam. They first posted it this past Monday, on November 17.  Here are some excerpts that I think are worth quoting:

On the eve of the McColo disconnection, "named BOT" detections represented about half of the total IP addresses listed by the CBL. At that time, we measured that the named BOTs were responsible for about 68% of all of the spam the CBL detects.

The "named BOTs" are the BOTNETs that most researchers talk about, such as Srizbi, Cutwail/Pushdo, Ozdok/Mega-D, Bobax/Kraken, Rustock, Asprox, Storm, Warezov and others. Srizbi was by far the largest, running around 35% of all spam that's caught in our spam traps. Cutwail second (at around 18%), most of the others in the 5-10% range.

The following major BOTNETs showed immediate effects when McColo was disconnected: Srizbi, Rustock, Asprox, Bobax, and Ozdok/Mega-D by a sudden precipitous drop in CBL detections.

Ozdok/Mega-D went virtually silent within an hour. Bobax had a big chunk (about half) taken out of it within a few hours. Srizbi, Rustock and Asprox dropped off by more than 95% of normal levels within hours. Eg: Srizbi dropped from 170,000-190,000 detections per day to about 3500. Cutwail/Pushdo lost about 15% over the first 24 hours of McColo outage.

This represents an incredible drop in traffic.  McColo really was responsible for sending out piles and piles of spam.  It makes you wonder why these guys weren't cut off in the first place.

Far be it from me to spread rumour but a colleague of mine asked and answered that same question.  This is completely unsubstantiated and I haven't Google'd or Live Searched this, but I guess the owner of McColo had ties to Russian organized crime.  He was some young kid (late teens or early twenties) who died in a car crash in Moscow either this year or last year.  If, indeed, parts of this story are true, then I guess it would be difficult for US law enforcement officials to track down these guys in a foreign country.  I suppose it would be up to Interpol to do that.

Yet ironically, it was not law enforcement that took these guys down, it was a policy decision by the people who owned the network hardware.  Sometimes all it takes is the political will to shut down a paying (?) customer.

BTW, anyone know if the story I related above is true?

The antispam accuracy of sender verification

Three simple techniques that are used as inputs for filtering spam are the following:

  1. Check to see if the sending domain in the SMTP MAIL FROM has an MX record
  2. Check to see if the sending domain in the SMTP MAIL FROM has an A-record
  3. Check to see if the sending IP has a reverse DNS

The point of the first two is to see if the sending domain exists.  Spammers don't care about receiving answers to their messages (except in the case of 419 spam) so the theory is that if a sender does not have a domain that exists, it is probably a spammer.  In the third case, spammers will often hijack IPs with no reverse DNS so as to avoid reputation filters, so no reverse DNS = suspicious.
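As a rough illustration, the three checks can be written as a function that returns signals rather than a verdict.  This is only a sketch: the names SenderSignals and check_sender are made up for this example, and the DNS lookups are passed in as plain functions (backed here by made-up zone data) so the code runs without touching the network.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SenderSignals:
    has_mx: bool
    has_a_record: bool
    has_reverse_dns: bool


def check_sender(
    mail_from: str,
    sending_ip: str,
    lookup_mx: Callable[[str], List[str]],
    lookup_a: Callable[[str], List[str]],
    lookup_ptr: Callable[[str], List[str]],
) -> SenderSignals:
    """Run the three checks and report them as signals, not as a block decision."""
    # The sending domain is whatever follows the last "@" in SMTP MAIL FROM.
    domain = mail_from.rsplit("@", 1)[-1].strip("<>").lower()
    return SenderSignals(
        has_mx=bool(lookup_mx(domain)),
        has_a_record=bool(lookup_a(domain)),
        has_reverse_dns=bool(lookup_ptr(sending_ip)),
    )


# Stand-in DNS data for illustration only (hypothetical zone contents).
FAKE_MX = {"example.com": ["mx1.example.com"]}
FAKE_A = {"example.com": ["192.0.2.10"]}
FAKE_PTR = {"192.0.2.10": ["mail.example.com"]}

signals = check_sender(
    "<reports@example.domain.local>",
    "192.0.2.77",
    lookup_mx=lambda d: FAKE_MX.get(d, []),
    lookup_a=lambda d: FAKE_A.get(d, []),
    lookup_ptr=lambda ip: FAKE_PTR.get(ip, []),
)
print(signals)
```

In a real filter the three lookup functions would be backed by a DNS resolver; what matters here is that the result feeds a score instead of triggering a reject.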

Customers have often asked why we do not have outright blocks on mail that meets any of these criteria.  My answer is always the same: these techniques are not reliable enough to block mail on.

There are plenty of examples I can name where someone might legitimately do this.  People sometimes misconfigure mail servers.  People send automated reports.  Companies that are small might not know enough to set up their reverse DNS, and so forth.  No matter how many people you get to fix something, there will always be more.  Rather than attempting to save the world by fixing everyone else's settings, my philosophy is to avoid being overzealous in spam filtering.  In other words, I acknowledge that people out there do silly things, and I avoid being overly harsh when I encounter them.  The FP headaches are not worth the hassle.

To support the assertion that the above three techniques are not enough to block on, I turn to statistics.  Prior to the McColo outage, about 64% of all mail that hit our inbound filters (after IP rejects, which account for the bulk of all total mail) was marked as spam.  Here are the numbers for each of the above rules:

  1. No sending domain MX record - 17% spam rate
  2. No sending domain A-record - 16% spam rate
  3. No reverse DNS of sending IP - 29% spam rate

Spam rate means "When this rule hits a message, what percentage of the time do we mark it as spam?"  To interpret this, if spammers exclusively used a technique, we should see a higher spam rate.  For (2), we should see a 90-95% spam rate (the rest being false negatives and tiny corner cases).  If the technique were used no more by spammers than by misconfigured legitimate senders, the rule's spam rate should match the overall rate: a 64/36 split, or thereabouts.

But that's decidedly not what we see.  We mark almost 2/3 of inbound mail as spam, but when this rule fires, only 16% of the time is it marked as spam.  The fact that there is a nearly 50-point spread (64% versus 16%) makes this unlikely to have occurred by chance, noise, or false negatives.

This means that a disproportionately large share of the mail sent with no A-record for the sending domain is legitimate.  The conclusion?  Blocking mail from senders with no A-record will be prone to false positives.  The situation will be the same for the other techniques.

Even throttling on this technique is prone to false positives.  Throttling on misconfiguration is almost as big a problem as blocking on it.  If one user screws up and sends mail with no A-record, they're probably going to send a lot of mail.  Worse, if they script it, they're probably going to send a ton of it.  So, simply because a user has sent a lot of mail with no A-record, it doesn't mean they are spamming.  More analysis is required, like seeing if the domains are all different and who they are sending to.  Simple blocks on these three techniques are a bad idea.
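The arithmetic behind that argument is short enough to write down.  The sketch below uses only the percentages quoted above (64% overall, 17/16/29% per rule); the function names are made up for this example.

```python
BASELINE = 0.64  # overall share of inbound mail marked as spam, as quoted above

# Spam rate quoted above for each rule.
RULE_RATES = {
    "no MX record": 0.17,
    "no A-record": 0.16,
    "no reverse DNS": 0.29,
}


def spam_rate(spam_hits: int, total_hits: int) -> float:
    """Share of the messages hitting a rule that were marked as spam."""
    if total_hits == 0:
        raise ValueError("rule never fired")
    return spam_hits / total_hits


def gap_points(rate: float, baseline: float = BASELINE) -> int:
    """Percentage points by which a rule's spam rate falls below the baseline."""
    return round((baseline - rate) * 100)


for rule, rate in RULE_RATES.items():
    # Share of rule hits NOT marked as spam, i.e. what a hard block would mostly catch.
    not_spam = round((1 - rate) * 100)
    print(f"{rule}: {gap_points(rate)} points below baseline, "
          f"{not_spam}% of hits not marked as spam")
```

Read this way, a hard block on any one of the rules would land mostly on mail that the filter does not consider spam.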

Categories of problems in outbound spam

Being a hosted service, we have a number of customers who share an outbound IP range.  If one of those customers starts to misbehave, their actions can affect everyone else.

We've learned a lot about outbound spam this past year.  We've implemented a number of solutions and incrementally have started to tighten the screws on what we will allow customers to send out without any interference from our side.  We have discovered the following about outbound spam from customers:

  1. The techniques used for inbound filtering don't carry over quite as well for outbound mail scanning.  The false positives are higher for outbound than they are for inbound.  This remains a puzzle.

  2. The spam problem is in reverse; for inbound mail, it's mostly spam with some legit mail.  For outbound, it's mostly legit mail with some spam.  Detecting spikes in mail doesn't work very well because the day-to-day data is so noisy that a blip in traffic from one customer doesn't stand out in the overall scheme of things.
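To see why a single customer's blip gets lost, here is a small sketch with made-up numbers.  The daily totals and the size of the blip are invented for illustration; they are not our real traffic figures.

```python
from statistics import mean, stdev

# Made-up daily outbound totals for the whole service (messages per day).
history = [1_000_000, 1_100_000, 900_000, 1_050_000, 950_000, 1_020_000, 980_000]


def z_score(value, past):
    """How many standard deviations a value sits above the historical mean."""
    return (value - mean(past)) / stdev(past)


# One customer suddenly sends 8,000 extra messages on an otherwise average day.
today = mean(history) + 8_000
blip_z = z_score(today, history)
print(f"z-score of the blip in the aggregate: {blip_z:.2f}")
```

With day-to-day swings in the tens of thousands, an extra 8,000 messages moves the total by a small fraction of one standard deviation, so a threshold on the aggregate never fires.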

Going from the above, we've had to deconstruct the problem down into a series of smaller problems.  In roughly the following order of difficulty, here are the scenarios when dealing with outbound "spammers":

  1. Outbound spam that we detect from spoofed senders 

    When a customer sends spam from a domain that we don't know about (ie, *@yahoo.com, *@paypal.com, etc), we catch and handle this case.  It is permissible for customers to send mail as sending domains that they have not registered with us but we will treat that mail differently if they do and we detect it as spam.

  2. Outbound spam that we miss, from spoofed senders

    This is somewhat similar to the above, except that our normal filters miss the message and don't detect it as spam.  This doesn't occur often, but it happens enough to be a nuisance.  To that end, we decided to treat this mail differently as well and apply some heuristics to it.  Borderline mail gets nudged over the spam threshold if it's outbound and the sender isn't registered.  We don't block it, but we do detect it and treat it differently.

  3. Outbound spam that we detect from good senders

    Originally, we thought we'd give our customers a break.  If you are sending mail from a domain that is registered with us, we'll treat you well.  You're doing something you are supposed to do - send mail from presumably locked down accounts.  Well, as it turns out, it only takes one bad apple to ruin it for everyone in that domain.  Users get their accounts compromised all the time (and it's a different user each week).  So, outbound spam from supposedly well-behaved domains is a third case that must be handled.

  4. Outbound mail that isn't spam but is still getting us blacklisted

    This is the most difficult case.  When users do something that is legal according to SMTP but considered bad practice in the real world, that's a problem.  We recently had a user send out a bunch of mail using a domain that had no A-record (ie, test@example.domain.local).  It looked like an admin or programmer or something had an automated report sending a whole pile of mail to his home ISP account (hmm, how many of us have done that?).  Well, guess what?  That ISP detected that the sending domain didn't exist and throttled our outbound IPs.  Sending without an A-record isn't illegal, but it is bad form.

    Our filters did not say that the message was spam (and it wasn't).  But someone else's filters said that doing stuff like that is enough to block our IPs.  It ended up hurting us because our filters didn't detect it, and that's the case that, in my opinion, is the most difficult one to solve.  I avoid FPs like the plague, and I think that this was a case of an FP.
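The "nudge" described in scenario 2 can be sketched in a few lines.  The threshold, the width of the borderline band and the size of the nudge are hypothetical values chosen for the example, not the ones our filters use, and the function names are invented.

```python
SPAM_THRESHOLD = 5.0      # hypothetical score at which mail counts as spam
BORDERLINE_MARGIN = 1.0   # hypothetical width of the "borderline" band
UNREGISTERED_NUDGE = 1.0  # hypothetical penalty for an unregistered sender domain


def outbound_score(base_score, sender_domain, registered_domains):
    """Nudge borderline outbound mail whose sender domain is not registered."""
    if sender_domain.lower() in registered_domains:
        return base_score
    borderline = SPAM_THRESHOLD - BORDERLINE_MARGIN <= base_score < SPAM_THRESHOLD
    return base_score + UNREGISTERED_NUDGE if borderline else base_score


def is_outbound_spam(base_score, sender_domain, registered_domains):
    return outbound_score(base_score, sender_domain, registered_domains) >= SPAM_THRESHOLD


registered = {"customer.example"}
print(is_outbound_spam(4.5, "yahoo.com", registered))         # borderline, unregistered sender
print(is_outbound_spam(4.5, "customer.example", registered))  # borderline, registered sender
print(is_outbound_spam(1.0, "yahoo.com", registered))         # clearly clean, unregistered sender
```

Only the first message crosses the threshold.  Clean mail from an unregistered sender is left alone, which matches the point that sending as an unregistered domain is permitted.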

We started off with a liberal implementation of outbound spam filtering.  Over time we have slowly and incrementally started clamping down even more and I suspect that we will get to the point where we are very conservative in what we send out.  I don't particularly like that approach but I guess that's the reality of where it's headed.

Some cool techniques for image filtering

In 2006, spammers started in a big way to use image spam to try to push through all of their stuff.  While this technique is still used today, it isn't quite as effective because spam filters caught up.

One technique that Microsoft developed is called Shingling.  That's where the image is broken up into a series of smaller segments, called shingles.  The noise is removed and hashes are compared for those microsegments.  Given two images, it was possible to tell whether they were more or less the same.  Of course, they weren't exactly the same; they were slightly different, but all spammers were doing was inserting random noise, or rotating the image or phase shifting it.  By ignoring the noise one could compare and match two images.
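To give a feel for the idea, here is a toy sketch.  It is not Microsoft's algorithm: as a stand-in for noise removal it thresholds each tile against its own average brightness, it includes the tile position in the hash, and it only copes with added noise, not rotation or shifting.  The function names and test images are invented for the example.

```python
import hashlib


def shingle_hashes(pixels, tile=4):
    """Hash each tile x tile block of a grayscale image (rows of 0-255 ints)."""
    height, width = len(pixels), len(pixels[0])
    hashes = set()
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            block = [pixels[top + dy][left + dx]
                     for dy in range(tile) for dx in range(tile)]
            # Crude noise removal: keep only "brighter or darker than the block average".
            average = sum(block) / len(block)
            bits = "".join("1" if p >= average else "0" for p in block)
            key = f"{top},{left}:{bits}"
            hashes.add(hashlib.sha1(key.encode("ascii")).hexdigest())
    return hashes


def similarity(image_a, image_b):
    """Jaccard similarity of the two images' shingle hashes (0.0 to 1.0)."""
    a, b = shingle_hashes(image_a), shingle_hashes(image_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# An 8x8 checkerboard, the same image with small pixel noise, and an unrelated image.
original = [[200 if (x // 2 + y // 2) % 2 == 0 else 50 for x in range(8)]
            for y in range(8)]
noisy = [[p + ((x * 7 + y * 13) % 11) - 5 for x, p in enumerate(row)]
         for y, row in enumerate(original)]
stripes = [[200 if x % 2 == 0 else 50 for x in range(8)] for y in range(8)]

print(similarity(original, noisy))    # noise of a few brightness levels is ignored
print(similarity(original, stripes))  # a different picture shares no shingles
```

The noisy copy still matches the original on every shingle, while the unrelated image matches on none.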

Recently, I came across the Photosynth application from Microsoft Live Labs.  This is an application where you can upload your pictures from a trip and it will attempt to create a panoramic shot of all the images.  I would guess that some of the image shingling techniques are used by looking at things like edge detection.  While I was in China, I knew about this application so I took a few pictures to test this out.  I didn't completely succeed in getting all of the overlap to work, but some of it turned out all right.  Below is a shot of Shanghai, China:

Anyhow, it's a cool application.  From now on, for my future trips I shall take shots of some neat places with this app in mind.

Major spam operation goes offline, spam plummets

This has been picked up by a couple of other blogs (I'm almost never the first to report on these things) but I'm going to talk about it anyway.  The Washington Post reports that a Web hosting company out of San Jose that hosts spamming organizations was taken offline.  Some excerpts from the article:

Experts say the precipitous drop-off in spam comes from Internet providers unplugging McColo Corp., a hosting provider in Northern California that was the home base for machines responsible for coordinating the sending of roughly 75 percent of all spam each day.

In other words, McColo Corporation controlled a whole pile of spam and was cut off.  Continuing on in the article:

In an alert sent out Wednesday morning, e-mail security firm IronPort said:

In the afternoon of Tuesday 11/11, IronPort saw a drop of almost 2/3 of overall spam volume, correlating with a drop in IronPort's SenderBase queries. While we investigated what we thought might be a technical problem, a major spam network, McColo Corp., was shutdown, as reported by The Washington Post on Tuesday evening.

That is a huge decline in mail.  Our own numbers confirm this.  I checked some numbers really quickly and did the math and plotted the charts.  The amount of mail we saw hitting our inbound servers dropped by 40% in one day.  That's very unusual for a Tuesday.  However, the amount of mail we deliver to end users pretty much stayed the same.

While IronPort does expect the spam levels to return to normal (McColo will probably just plug their servers back in or move buildings), let the record show that I am going to partake in a little bit of schadenfreude at McColo's expense.

Diagnosing a spam run

The other day, we discovered one of our customers had been compromised and was relaying outbound spam through us.  The spammer was clever in this case and was using some fake headers to attempt to trick the recipient, whoever they were, about the source of the mail. 

Here's the method I used to discover that the message headers were fraudulent.  I have modified some of the headers to protect the identity of the customer who sent out the mail, and removed some of the unnecessary headers.

Received: by mail62-sin (MessageSwitch) id 1226335206884132_19415; Mon, 10 Nov 2008 16:40:06 +0000 (UCT)
Received: from webmail.dhalsim.co.fr (dsl-237-105-212-81.yoga.co.fr [287.105.212.81]) by mail62-sin.bigfish.com (Postfix) with ESMTP id E247126804E;   Mon, 10 Nov 2008 16:40:05 +0000 (UTC)
Received: from 237.105.212.81 ([125.110.123.13]) by
webmail.yoga.co.fr with Microsoft SMTPSVC(6.0.3790.3959);       Sun, 9 Nov 2008 12:26:25 +0000
Received: from u32.yahoo.com (u32.yahoo.com [131.128.46.80]) by  with SMTP; Thu, 13 Nov 2008 16:21:19 +0400
Message-ID: <ydiwpeunukojmeikxitdxk.0171156498611385053547065@yahoo.com>
Date: Thu, 13 Nov 2008 16:19:19 +0400
From: "?hRoss" <fzqvkbtkblxq@yahoo.com>
Reply-To: "?hLivingston" <fzqvkbtkblxq@yahoo.com>
To: <munged@mxic.com.tw>
Subject: ????i?S?k?D???????S???
MIME-Version: 1.0
Content-Type: multipart/alternative;
        boundary="--NextPart_qpb_4ppq_3a78p9pii5twwf9a"
X-OriginalArrivalTime: 09 Nov 2008 12:26:26.0648 (UTC) FILETIME=[62AEA580:01C94266]
Return-Path: qszdvjosgxxe@yahoo.com

This is some Chinese porn spam.  Click here to view it!!!

Much of this data is not all that useful.  I have replaced the contents of the message; it was actually a bunch of nonsensical text in Quoted Printable, but the essence is that this particular message was porn spam claiming to be from Yahoo.  Let's deconstruct it.

Our servers received a message from one of our outbound customers:

from webmail.dhalsim.co.fr (dsl-237-105-212-81.yoga.co.fr [287.105.212.81]) by mail62-sin.bigfish.com (Postfix) with ESMTP id E247126804E;   Mon, 10 Nov 2008 16:40:05 +0000 (UTC)
  • The IP that connected to us is 287.105.212.81

  • The reverse DNS of this IP is dsl-237-105-212-81.yoga.co.fr.  This is a DSL pool in France, meaning that our customer is using a French ISP to connect to us.

  • The machine HELO'd to us as webmail.dhalsim.co.fr.  Just by looking at this HELO and the reverse DNS, I'd guess that it was a small business that teaches yoga.  They don't have a dedicated IP so their ISP provides them with an IP as the way of connecting to the web and sending out mail. They use that IP to connect to us and relay their mail.

Right off the bat, I pretty much know that this computer is part of a botnet.  Why do I suspect this?  Well, I stripped out some headers that we use to tag this message as outbound spam.  That's my first clue.  The second is that this IP uses a DSL pool.  Large pools of non-dedicated IPs are generally prime candidates for zombie botnets.  This means that DSL and cable pools, and to a lesser extent dial-up pools, are commonly compromised.  I've seen this before and this fits the pattern.  Putting these together allows me to diagnose the problem.

The next two headers allow me to figure out who is being spoofed:

Received: from u32.yahoo.com (u32.yahoo.com [131.128.46.80]) by  with SMTP; Thu, 13 Nov 2008 16:21:19 +0400

Look at the above header.  It claims to come from yahoo.com.  Taken at face value, it says that the IP 131.128.46.80 has a reverse DNS of u32.yahoo.com and that the machine HELO'd as u32.yahoo.com.  The next header says:

Message-ID: <ydiwpeunukojmeikxitdxk.0171156498611385053547065@yahoo.com>

The Message-ID has a yahoo.com email address in it.  So, in effect, these headers say that the message originated from yahoo (check out the From and Reply-To addresses above and the fact it has an @yahoo.com in the Message-ID) and is going to a recipient in Taiwan.  The fact that the message is encoded in Quoted Printable leads me down the path of Chinese porn.  This spammer is targeting a specific country, that is, Chinese spam going to a "Chinese" recipient.

However, those headers are fake.  Here's how we can tell:

  • The reverse DNS of 131.128.46.80 is not u32.yahoo.com.  That IP is not part of any Yahoo subnet and in fact has no reverse DNS.

  • The domain u32.yahoo.com does not have an A-record.

An IP that claims to be from Yahoo but which does not have a Yahoo forward or reverse DNS is undoubtedly fake.  Yahoo is simply not that sloppy. 

Next, consider the sequence of events; this header says that Yahoo Mail generated an email, connected to webmail.yoga.co.fr and relayed the message from there.  That doesn't make sense.  Why?  Because Yahoo sends out email, it doesn't connect to a secondary web mail server and send out mail a second time; there's one too many Received headers in there for that to make sense.  And in the unlikely event that it did do just that, webmail.yoga.co.fr would rewrite the Message-ID to something that did not contain the @yahoo.com in it.

Thus, what we have here is an example of a spammer attempting to mask where his spam came from.  He faked the headers to make it look like it came from a Yahoo Mail source, but in fact, it came from a compromised host in the DSL pool used by Dhalsim's Yoga Factory.  Dhalsim's Yoga Factory is the source of this spam.
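The two DNS checks I ran by hand can be sketched as code.  This is an illustration only: the regular expression handles just the "from HELO (rDNS [IP])" shape seen in the headers above, the function names are invented, and the lookups are passed in as functions.  The stand-in lookups below return no records, mirroring what I found for this IP and hostname.

```python
import re

# Matches the "from HELO (reverse-dns [ip])" part of a Received header.
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?P<rdns>\S+)\s+\[(?P<ip>[0-9.]+)\]\)"
)


def parse_received(header):
    """Return the HELO name, claimed reverse DNS and IP, or None if unparseable."""
    match = RECEIVED_RE.search(header)
    return match.groupdict() if match else None


def forgery_reasons(header, lookup_ptr, lookup_a):
    """List the ways a Received header disagrees with what DNS actually says."""
    hop = parse_received(header)
    if hop is None:
        return ["could not parse the Received header"]
    reasons = []
    ptr_names = [name.lower() for name in lookup_ptr(hop["ip"])]
    if hop["rdns"].lower() not in ptr_names:
        reasons.append("IP does not reverse-resolve to the claimed host")
    if not lookup_a(hop["helo"]):
        reasons.append("claimed host has no A-record")
    return reasons


header = ("Received: from u32.yahoo.com (u32.yahoo.com [131.128.46.80]) "
          "by  with SMTP; Thu, 13 Nov 2008 16:21:19 +0400")

# Stand-in lookups that return nothing: no PTR for the IP, no A-record for the host.
reasons = forgery_reasons(header, lookup_ptr=lambda ip: [], lookup_a=lambda host: [])
for reason in reasons:
    print(reason)
```

Both checks fail for the supposed Yahoo hop.  A hop whose PTR record and A-record both agree with the header would come back with an empty list.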

Microsoft's Security Intelligence Report

Microsoft has recently released its Security Intelligence Report for January - June 2008.  The report contains a lot of data from Hotmail but also from us in Exchange Hosted Services.  The full report with supporting data can be found here, and then you need to download the file named Microsoft_Security_Intelligence_Report_v5.pdf (it's a 12 MB file).

Alternatively, you can download the summary, but I wouldn't recommend that.  The reason is that yours truly, ie, me, is a contributor to this report.  Most of it is Hotmail centric, but not everything.  Some of it applies to the enterprise, particularly those in Exchange Hosted Services.  Specifically, I provided the data on page 67.  Here is a breakdown of our spam distribution for the first half of the year categorized by type of spam:

image

Pharma-spam continues to dominate the spam landscape.  Sexually explicit spam is a minor player and that represents a reversal from when I first broke into the industry 4 years ago when it was a much larger player.

I would have expected stock spam to be a lot more dominant, but I guess the stock market woes along with the SEC clamping down more on stock scams have contributed to the decline in this type of spam.

Over the next few posts, maybe not consecutively, I will highlight and comment on some key findings from Microsoft's report.

Spam filtering and skill sets

When filtering spam from a client base that is world wide, you tend to pick up a skill set that you might not otherwise get a chance to obtain - learning foreign languages.

Now, I'm already fluent in six million forms of communication, but surprisingly there are a lot of common languages that evade me.  While filtering spam will never make me fluent in any language, I have discovered that given only a few words or sentences (sentences are way easier), I can often tell what language an email message is written in despite not actually being able to speak the language.

Well, fast forward to the last couple of days.  I decided I was going to update the sensitive word list in one of the languages that we support - German.  This is a group of words that customers can optionally enable.  If the custom spam filter option is enabled and a message contains one of the words, the message automatically gets junked.  By a series of events, the task of updating the list fell to me.  So, I went and did some research (don't ask me for details) and came up with a list of over 400 candidates.

I had to whittle that down by getting rid of duplicates as well as the ones that simply wouldn't work.  By way of example, in English, the word slut has a negative connotation; if you speak English then you know what it means.  However, in Swedish, it means "ends, end, or finish."  Even if you obfuscate the word to look for instances of something like s'lut (as spammers often do), blocking on that doesn't work either.  In French, the word salut means hi.  However, sometimes French speakers abbreviate that to s'lut.  So, the ability to think laterally, linguistically, is a real advantage.

I had learned those two pieces of trivia some time ago.  These past couple of days I learned some more.  As I was paring down the list of words we couldn't use, I started to learn them.  I had them all in an Excel spreadsheet with the English translation next to the German one.  Some German terms had multiple translations so I got rid of some of them by right-clicking, selecting Delete and picking the "Shift cells up" option.  This moves all of the cells up by one, deleting the current cell.  However, after doing this a few times, I started to look at the translation of sensitive words.

"Wait a second," I said.  "This German word does not mean what the English translation says..."  It turns out I should have been doing some other copying-and-pasting and deleting the row, rather than deleting the cell.  However, the point is that I actually learned a bunch of words in German that 48 hours earlier I did not know.  I was actually reading the German word, translating it in my head and then confirming that the English translation was incorrect. 

You see, when you're exposed to a lot of spam and have to work with foreign languages, it's actually quite amazing how quickly you can start to pick up bits and pieces of that language.  It's a skill set that comes in handy when you want to travel abroad.

Now, I'm not sure that the vocabulary that I acquired is going to be very useful, but the point remains - fighting spam does enable you to pick up the oddest skill sets.

Why socialism is bad

About a week or two ago, I saw then-Senator Barack Obama doing an interview on The Daily Show with Jon Stewart.  He was responding to some criticisms from the McCain campaign that arose in regards to his comments that we should "share the wealth."  He joked that the McCain campaign was calling him a socialist because in kindergarten he shared his toys with other kids.  I thought that was quite funny.

Now, contrary to what you might think, this blog post is not about politics.  I'm simply using it as an intriguing post title, but I'm going to come back to Obama's point about sharing toys.

In a hosted service, or indeed, in any service that has shared IP space, sharing that IP space often leads to major headaches.  For an outbound email service, many different businesses and organizations use us for outbound mail and they get routed through a single IP (or rather, a smaller subset of IPs).  If one of those customers starts doing something bad, such as getting a box infected into a botnet and spewing out spam, that one customer can ruin it for everyone.  If our one outbound IP gets listed on a 3rd party blocklist, then other customers who are sending to people who use that blocklist can get their mail bounced.  They didn't do anything to deserve it, but actions of someone sharing the IP space can hurt them.
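
Here is a small Python sketch of why the damage spreads.  A DNS-based blocklist is queried by IP address alone (the octets are reversed and prepended to the list's zone), so the receiving mail server never learns which customer sent the message.  The zone name, the customer names and the IP below are made up for illustration.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name a receiver looks up to check an IPv4 address against a blocklist."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

# Hypothetical customers, all routed through the same outbound IP.
CUSTOMER_OUTBOUND_IP = {
    "customer-a": "192.0.2.10",
    "customer-b": "192.0.2.10",
    "customer-c": "192.0.2.10",
}

def customers_affected(listed_ip, outbound_map):
    """Every customer behind a listed IP is affected, not just the one that misbehaved."""
    return sorted(name for name, ip in outbound_map.items() if ip == listed_ip)
```

With these made-up values, dnsbl_query_name("192.0.2.10", "dnsbl.example.org") gives 10.2.0.192.dnsbl.example.org, and a listing for that one IP hits all three customers even if only one of them sent the spam.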

Similarly, if I were on a dynamic cable modem pool and I were a single user hidden behind a NAT, my outbound IP would be the same as that of all of my neighbors.  If I started doing something abusive, such as trying to hack into the Department of Defense, I could affect the access of users who are sharing that same IP space if the DOD decided to ban me, and other web portals decided to list me and ban me as well.  All of my neighbors would be unable to access their favorite web sites either.

The sharing (socialism) of IP space is a real headache.  We each only get so many IPs so customers have to share them.  Yet, a single customer can (and does) ruin the experience for everybody.  When we first started tracking outbound spam incidents closely, we found that we had an incident about once per day.  It's less now, maybe 2-3 per week, but the point is that left unchecked these problems will re-assert themselves time and again.

Postini's new features

Over on the Google Enterprise Blog, they recently posted the following with regards to some new features:


(1) Our spam protection continues to evolve, this time with NDR (non-delivery receipt) filter improvements. Administrators can now more precisely deal with NDR attacks which includes the ability to distinguish between legitimate and spam NDR messages and set rules that bypass the NDR filter.

(2) Customers who route their outbound mail through our datacenters greatly benefit from this next enhancement. We've observed that customers' mail servers can send volumes of junk messages, which in most cases are generated when servers are inadvertently configured as an open relay and used by spammers. This creates a number of problems, including the DNS "blacklisting" of the outbound server. Our outbound mail processing now includes spam scanning. This reduces DNS blocking issues and helps raise awareness of possible mail server security issues.

With regards to point (1), backscatter has become one of the big issues that we have had to deal with this year.  It's such a big problem that I wrote an 18-part series on it earlier this summer.  Google's blog post is a little ambiguous about what technique they are using to detect NDRs.  The best technique would be Bounce Address Tag Validation (BATV), but it doesn't look like they are doing that.  More likely, they are using something like "Check to see if you sent it in the first place."  The reason I say this is that with BATV, you wouldn't need to bypass the NDR filter; I suspect that they have some global policy (or custom spam) rules that reject all mail for NDRs.  In order to use either of the techniques I mention, you have to use the hosted service for outbound mail.  If you do use it for outbound, then smart NDR blocking comes into play.
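
For anyone who hasn't run into BATV, the idea is to sign the envelope sender on the way out, so that a real bounce comes back addressed to a tagged address that can be verified, while backscatter aimed at the plain address fails the check.  The Python sketch below is only loosely modelled on BATV's prvs tag; it is a simplification and not the exact wire format, and the secret key, the seven-day lifetime and the function names are my own assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-secret"  # assumption: known only to the outbound service
TAG_LIFETIME_DAYS = 7                # assumption: bounces older than this are rejected

def _signature(expiry_day, address, key):
    """First six hex digits of an HMAC-SHA1 over the expiry day and the original address."""
    message = f"{expiry_day:03d}{address}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha1).hexdigest()[:6]

def sign_sender(address, today, key=SECRET_KEY):
    """Tag an outbound envelope sender; today is a day number (days since the epoch)."""
    expiry = (today + TAG_LIFETIME_DAYS) % 1000
    return f"prvs={expiry:03d}{_signature(expiry, address, key)}={address}"

def is_valid_bounce_recipient(recipient, today, key=SECRET_KEY):
    """Accept an NDR only if its recipient carries an unexpired tag that we generated."""
    if not recipient.startswith("prvs="):
        return False  # bounce sent to an untagged address: we never sent the original
    tag, sep, address = recipient[len("prvs="):].partition("=")
    if not sep or len(tag) != 9:
        return False
    try:
        expiry = int(tag[:3])
    except ValueError:
        return False
    # Day numbers wrap at 1000, so compare them modulo 1000.
    if (expiry - today) % 1000 > TAG_LIFETIME_DAYS:
        return False
    expected = _signature(expiry, address, key)
    return hmac.compare_digest(tag[3:].encode("utf-8"), expected.encode("utf-8"))
```

A bounce addressed to the output of sign_sender("alice@example.com", today) validates for the next week; a bounce addressed to plain alice@example.com does not.  Notice that none of this works unless the outbound mail went through the service that holds the key.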

With regards to point (2), outbound spam filtering is a feature that we first started looking into a year ago.  Outbound spam has since become the bane of my existence.  I notice that Postini is simply scanning outbound mail for spam; they don't say in the blog post what they are doing with it.

I have consistently taken the position that you cannot treat inbound mail the same as outbound mail.  In a hosted service, the most common option for inbound spam is to quarantine it.  For outbound spam, what do you do with it?  If the message really is spam, not delivering it is no big deal, but if it's a false positive, you definitely want to avoid non-delivery of legitimate mail.  Ultimately, you do have to do something with outbound spam; you don't want to deliver it, because if you do, you can get blocklisted by third parties.  Dealing with that is a headache of its own.  But at least Postini is doing something, even though I don't know what it is.

Oh, how the mighty have fallen

A few months ago, Yahoo rebuffed Microsoft's attempt to purchase it.  Now, this morning, I come across the following story:

Now that quasi-white knight Google  is out of the picture, Yahoo co-founder and CEO Jerry Yang has some advice for Microsoft: “To this day, I believe the best thing for Microsoft to do is to buy Yahoo.” Yang was the evening headliner for Web 2.0 in San Francisco, interviewed by John Battelle...

You may recall that Yang and Microsoft CEO Steve Ballmer still disagree about how their talks fell apart the last time, with Ballmer saying he withdrew at $33 when Yahoo and Yang said they wanted $37 per share. Both numbers seem incredibly remote given today’s close of $13.92.

You had better believe that both numbers seem incredibly remote given that today's close for Yahoo is $13.92.

Do you remember that episode of The Simpsons where Homer quits his job to become a pin jockey at a bowling alley?  And on the way out the door from the nuclear power plant, he takes Mr. Burns on a ride around the plant, tapping on his head like a bongo drum?  And then later on, Homer has to literally get down on his hands and knees through a little door flap and ask Mr. Burns for his job back?  And Burns says "So, come crawling back, eh?"

If I were Steve Ballmer, I'd make Jerry Yang do the same thing.
