Continuing on from my previous post, I'd like to get into more of the considerations when it comes to measuring spam effectiveness. I'm going to combine topics in this post.
Measurement has to be automated and statistically relevant
When it comes to generating a spam feed, or measuring effectiveness, one of the most commonly used methods is a honeypot. A honeypot, with regard to spam, is an email account that is seeded so that it lands on spammers' lists, and all mail going into that account can be considered spam. The idea is that because the email address is never used in legitimate contexts, it is safe to assume that all mail going to it is illegitimate. Many third-party companies that measure spam effectiveness will do exactly this, except they might set up a real domain in DNS and have a few email accounts.
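To make the honeypot idea concrete, here is a minimal sketch of how such a feed might be built. The addresses, the `Message` type, and the `build_spam_feed` function are all hypothetical names made up for illustration. The sketch bakes in the assumption described above: anything addressed to a honeypot is labeled spam without further inspection.

```python
from dataclasses import dataclass

# Hypothetical honeypot addresses, assumed never to be used legitimately.
HONEYPOT_ADDRESSES = {"trap1@example.com", "trap2@example.com"}


@dataclass
class Message:
    recipient: str
    subject: str


def build_spam_feed(messages):
    """Return the messages sent to a honeypot address.

    Assumption: every message to a honeypot is spam, so no content
    check is performed here.
    """
    return [m for m in messages if m.recipient.lower() in HONEYPOT_ADDRESSES]
```

Note that the recipient address is the only signal used, so the feed is only as reliable as that assumption.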
When it comes to honeypots, I like them in theory but not in practice. They don't meet any of my criteria so far (ongoing, automated, and statistically relevant), but more than any of that, the operating assumption that all mail going to them is spam is simply untrue.
So, that's why I don't like honeypots. Take it for what it's worth. I want a spam feed to have as little human maintenance as possible, and it has to generate a lot of mail. Furthermore, it needs to be reliable. Honeypots don't qualify.
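One way to see why the feed has to generate a lot of mail is to look at how the uncertainty in a measured catch rate shrinks as the sample grows. The sketch below is only an illustration: the function name `catch_rate_margin` is made up, and it assumes the normal approximation to the binomial with a 95% confidence level (z = 1.96), which is a common textbook choice and not necessarily the method used for any real measurement.

```python
import math


def catch_rate_margin(caught, total, z=1.96):
    """Return (catch_rate, margin_of_error) for a sample of spam messages.

    Assumption: normal approximation to the binomial; z=1.96 corresponds
    to roughly a 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    rate = caught / total
    margin = z * math.sqrt(rate * (1 - rate) / total)
    return rate, margin


# Same 95% catch rate, measured on a small and a large sample.
small_rate, small_margin = catch_rate_margin(95, 100)
large_rate, large_margin = catch_rate_margin(9500, 10000)
```

With 100 messages the margin is about 4.3 percentage points; with 10,000 messages it drops to about 0.43. A feed that only produces a trickle of mail cannot tell a 95% filter from a 92% one.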
Where is this email coming from? Surely it isn't solicited? So how is it not spam?
Indeed: what Tony said. Perhaps you should do a post about what your working definition of spam is. It clearly differs from mine, and I also do anti-spam for a living.
This can be confusing in determining what spam is. Usually those types of emails slip through the cracks. Those are not exactly solicited.