# Terry Zink: Security Talk

Discussing Internet security in (mostly) plain English

# Accurate metrics

This past week, I started coming up with some new metrics on how to measure our effectiveness, specifically, our spam effectiveness.

Hotmail measures this with a metric called Spam-in-the-inbox, or SITI for short.  It is the proportion of mail in a person's inbox that is spam; it measures the effect of spam on the user experience.  It's calculated the following way:

SITI = spam-in-inbox / (spam + non-spam) x 100%

So, if a person has 14 messages in their inbox and 4 of them are spam, then we have the following:

SITI = 4 / (4 + 10) = 4 / 14 ≈ 29%.  To put it another way, about 29% of the mail in the person's inbox is spam.
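The arithmetic above can be sketched as a small helper (a hypothetical function for illustration, not part of any Hotmail tooling):

```python
def siti(spam_in_inbox: int, nonspam_in_inbox: int) -> float:
    """Spam-in-the-inbox: percentage of inbox mail that is spam."""
    total = spam_in_inbox + nonspam_in_inbox
    if total == 0:
        return 0.0
    return spam_in_inbox / total * 100

# The example from the post: 4 spam messages in an inbox of 14.
print(round(siti(4, 10)))  # 29
```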

Hotmail gets this data by means of a feedback loop: a random sample of users is selected and asked to classify their mail as spam or non-spam.  The user classifications are then compared against the action the spam filter would have taken, and a SITI value is computed from the result.

In Exchange Hosted Services, we don't control the end-user inbox the way Hotmail does, so there is no feedback loop for us.  This makes it difficult to estimate the amount of non-spam on our network.  We know how much we block and how much we deliver.  Of what we deliver, most is non-spam, but some is spam false negatives; knowing which is which is more difficult.

This past week, I was playing around with numbers and came up with a baseline model.  It wasn't entirely my idea; I borrowed part of it from our dev manager and combined it with the SITI metric that Hotmail invented.  What I did was look back over the past twelve weeks for our best day.  The amount of mail we deliver fluctuates on a daily basis, but for the most part, large increases in messages received do not correspond to large increases in messages delivered.  For example, if total message traffic increases by 15% day-over-day, our delivery count increases maybe 5%.  Similarly, if total message traffic decreases by 15% day-over-day, our delivery count decreases by 2-3%.

Anyhow, I went back and looked for our best day for messages delivered; it corresponded to a 20% decrease in average message traffic but only a small decrease in messages delivered.  I took that as a baseline for legitimate messages per day.  I then assumed that each weekday has the same amount of legitimate traffic.  This is not quite accurate, but small day-to-day variations are negligible in the SITI calculation.  Weekends are taken to be 1/3 of a weekday's legitimate traffic.  Using this as our baseline, we can determine our total weekly legitimate traffic (best day x 5 weekdays + 1/3 x best day x 2 weekend days).  We also have our total delivery count.

SITI = (total delivery - total baseline legit) / total delivery x 100%

Using this formula, we can estimate our spam-in-the-inbox ratio, which is another way of measuring our spam effectiveness and the effect of spam on our users' experience.  Going forward, we will attempt to drive our effectiveness using this value as a baseline metric.  It is more sensitive, I believe, than simply calculating our false negative rate (spam missed / spam received x 100%).
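The baseline model above can be sketched as follows.  The function names and the sample numbers are hypothetical, chosen only to illustrate the arithmetic:

```python
def weekly_baseline_legit(best_day: float) -> float:
    """Estimated legitimate messages per week: 5 weekdays at the
    best-day rate, plus 2 weekend days at 1/3 of that rate."""
    return best_day * 5 + best_day / 3 * 2

def estimated_siti(total_delivered: float, best_day: float) -> float:
    """Estimated spam-in-the-inbox: whatever was delivered beyond
    the legitimate baseline is assumed to be spam false negatives."""
    baseline = weekly_baseline_legit(best_day)
    return (total_delivered - baseline) / total_delivered * 100

# Illustrative numbers (not from the post): best day of 10M
# legitimate messages, 60M total messages delivered that week.
print(round(estimated_siti(60e6, 10e6), 1))  # 5.6
```

Note that this estimate is only as good as the assumption that the best day represents purely legitimate traffic; any spam delivered on the best day inflates the baseline and understates SITI.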

• "The way Hotmail does it is use a metric called Spam-in-the-inbox, or SITI for short."

Not good enough.  That's somewhere around half of what needs to be measured, and I'm not sure if that's the small half or the big half.

Microsoft isn't much better or worse than Yahoo in terms of sending spams and mishandling reports from victims.  Both take turns adding each other to blacklists.  But some mail from Microsoft is legitimate, and sometimes I have to report to Yahoo that some Microsoft message in my spam box wasn't spam.

A few days ago, out of the corner of my eye, just after clicking a button to empty my spam box but before it got processed, I thought I noticed one more message that I should have checked first.  But it was too late.  Was it really from Microsoft or not, and was it really legitimate or not, I'll never know.

You have to count the opposite metric too, the amount of non-spam in spam boxes.

• I was talking specifically about measuring the effect of spam on the user experience.  The other metric is how to measure false positives, which will be the subject of a future post.

• Referring back to my previous post on accurate metrics referring to spam-in-the-inbox, spam is one side


• "I was talking specifically about measuring the effect of spam on the user experience."

I assure you that false positives (the other metric whose article I will read in a few minutes) DOES affect the user experience.  I'm not sure if it's the big half or the little half, but it sure isn't negligible.  Don't you think my example (which was real) shows how false positives affect the user experience?

A few days ago Yahoo found a better method.  Several times in the past, Yahoo put non-spam from Yahoo into the recipient's spam box, where it might go unnoticed by both the recipient and the sender.  But the latest time, Yahoo took non-spam from Yahoo and bounced it back to the sender in Yahoo, so at least the sender knew about it and could send it to a different e-mail address (if the recipient has a non-Yahoo address).

(I'm not sure if Yahoo has improved its stats on the amount of spam from Yahoo that goes into Yahoo recipients' inboxes.)

• I'm so good sometimes I amaze even myself. I like to play around with metrics and measurements.

