You couldn't be more right. Measure everything. Without data, one is blind.
One very important metric you're missing is the other side of the coin from "spam in the inbox": user-reported non-spam in the spam folder. That's quite possibly the most critical metric to drive to zero, because when you get several thousand spams per month, there's just no way you're going to actually look through everything in your spam folder. Losing a real mail is the most dangerous thing you can do. It's far worse than the occasional spam in the inbox.
Terry, I think this is the best post you've ever made, and it's something I've been saying for years.
A lot of customers/prospects don't understand that everyone has false positives; how you deal with them is what is absolutely key. That is something one of our competitors just doesn't seem to "get", according to various people I have heard speak on the matter.
Thanks for the kind words, Al.
A non-spam in the spam folder would be a false positive (which I actually mentioned, but thanks for pointing it out). In terms of measuring spam %, there are two ways to do it: overall spam % and spam-in-the-inbox, which is a metric that Hotmail uses. I'll go into greater detail in another post.
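To make the terms in this thread concrete, here is a minimal Python sketch that derives the four percentages from raw message counts. The definitions are assumptions for illustration (the thread does not spell them out): "overall spam %" is taken as spam's share of all mail, and "spam-in-the-inbox" as spam's share of delivered mail. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FilterCounts:
    """Message counts for one reporting period (names are illustrative)."""
    spam_caught: int    # spam correctly filtered
    spam_missed: int    # spam delivered to the inbox (false negatives)
    ham_delivered: int  # legitimate mail delivered to the inbox
    ham_blocked: int    # legitimate mail filtered as spam (false positives)


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def summarize(c: FilterCounts) -> dict:
    total_spam = c.spam_caught + c.spam_missed
    total_ham = c.ham_delivered + c.ham_blocked
    inbox = c.spam_missed + c.ham_delivered
    return {
        # Assumed definition: share of all mail that is spam.
        "overall_spam_pct": pct(total_spam, total_spam + total_ham),
        # Assumed definition: share of inbox mail that is spam.
        "spam_in_inbox_pct": pct(c.spam_missed, inbox),
        # Share of spam that got through.
        "false_negative_pct": pct(c.spam_missed, total_spam),
        # Share of legitimate mail that was filtered.
        "false_positive_pct": pct(c.ham_blocked, total_ham),
    }
```

Note that the false positive percentage here relies on knowing how much legitimate mail was blocked, which in practice depends on user reports and on how reliable those reports are.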
Thanks for the feedback. What I have found among customers is that people dislike spam coming through (false negatives) but they really *hate* false positives. I used to be in charge of FP processing over here and I strove to turn them around as quickly as I possibly could.
Good idea. As a sysadmin, one of the things I have trouble with is effectively determining some of these metrics from the mail and spam logs. I haven't found any good tools that give me one line per message, so that I can pass the output through programs like sort, uniq, grep, etc. So I've been writing one. It's for the Postfix/SpamAssassin combination, but mostly for Postfix. I'll release it on Freshmeat when it's ready.
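As a rough illustration of the "one line per message" idea (not the tool described above, which is not shown in this thread), the sketch below groups Postfix log lines by queue ID and emits one tab-separated row per message. It assumes the default short, uppercase-hex queue IDs and the usual `from=<...>`, `to=<...>`, and `status=...` fields; the function name and output layout are hypothetical, and only the first recipient of each message is kept.

```python
import re
from collections import defaultdict

# Queue ID plus the rest of a Postfix log line, e.g.
# "... postfix/qmgr[123]: 4F9D1A0B2C: from=<a@example.com>, size=100"
# Assumes short (uppercase hex) queue IDs, so "warning:" and "NOQUEUE:"
# lines are skipped.
LINE_RE = re.compile(r"postfix/\w+\[\d+\]: ([0-9A-F]{6,}): (.*)")
# \b keeps "orig_to=" from being read as "to=".
FIELD_RE = re.compile(r"\b(from|to|status)=<?([^>,\s]*)>?")


def messages(lines):
    """Group log lines by queue ID; yield one tab-separated row per message."""
    by_id = defaultdict(dict)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        queue_id, rest = m.groups()
        for key, value in FIELD_RE.findall(rest):
            # Keep the first value seen; "<>" stands for the null sender.
            by_id[queue_id].setdefault(key, value or "<>")
    for queue_id, f in by_id.items():
        yield "\t".join(
            [queue_id, f.get("from", "-"), f.get("to", "-"), f.get("status", "-")]
        )
```

Printing each row from `messages(open(path))`, with `path` pointing at a mail log, gives output that can be passed to sort, uniq, or grep.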
Hey, Terry, have you seen this Register article?
Stuart: Driving to zero the user-reported non-spams in the spam folder may not be a realistic goal. From what I've seen discussed on the Planet Antispam aggregator over the past year or so, users will report spam as non-spam almost as often as the other way around (and anyone who runs a mailing list knows there's a lot of the other way around). I seem to recall that John Graham-Cumming's SpamOrHam experiment demonstrated that most people are surprisingly bad at classifying messages.
As the one who used to be in charge of combing through false positives, I can confirm that. When I first started here, about 80-90% of false positive submissions (non-spam that we filtered as spam) were not legitimate; that is, they were actually spam. That's dropped somewhat over time, but even so, most of the time spent processing them goes into separating the wheat from the chaff.
I read through the article. I've actually done a lot of work with Smartscreen, the technology Hotmail uses to do their spam filtering. It's actually a rather clever algorithm. Like all spam filters, it has its strengths and weaknesses, and some users find it too sensitive (but its sensitivity can be adjusted to produce a desired false positive rate).
Over here in Exchange Hosted Services, we started using it a few weeks ago to supplement our own spam filtering but we use it differently than Hotmail does. The biggest advantage that Smartscreen has is that it's based on seeing spam from hundreds of thousands of Hotmail users and millions of Junk Mail reports; thus, it is able to incorporate a lot of different types of mail into its classification system.
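The point about adjusting sensitivity to produce a desired false positive rate can be illustrated generically. The sketch below is not Smartscreen's algorithm; it assumes a hypothetical filter that assigns each message a numeric spam score and flags it when the score exceeds a threshold, and it picks that threshold from the scores of known legitimate mail.

```python
import math


def threshold_for_fp_rate(ham_scores, target_fp_rate):
    """Return the lowest threshold whose false positive rate on ham_scores
    does not exceed target_fp_rate.

    A message is flagged as spam when its score is strictly greater than
    the threshold. ham_scores are scores of known legitimate messages.
    """
    if not ham_scores:
        raise ValueError("need at least one scored legitimate message")
    ranked = sorted(ham_scores)
    n = len(ranked)
    # Number of legitimate messages we are allowed to misflag.
    allowed = math.floor(target_fp_rate * n)
    if allowed >= n:
        return float("-inf")  # a 100% false positive rate is acceptable
    # Only the top `allowed` scores (at most) lie strictly above this value.
    return ranked[n - allowed - 1]
```

Lowering the target rate raises the threshold, which lets more spam through; that is the trade-off between false positives and false negatives discussed earlier in the thread.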