6. Grey Mail

For all of our discussion of spam and non-spam, there is still the issue of grey mail. What is grey mail? Do we include it in our spam corpus, include it in the non-spam corpus, or omit it altogether? To begin with, let’s define what we mean by grey mail: it is mail that some users want and other users do not. There are various kinds of grey mail:
Using this definition of grey mail, we can get a representative sample of what we will consider to be good and bad mail. As long as everyone uses the same corpus and the definitions make sense, the numbers will be meaningful.
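To make the definition concrete, here is a minimal sketch of how a corpus builder might apply it. It assumes each message comes with a list of per-user verdicts (wanted or not wanted). The names `label_message`, `build_corpus`, and `grey_policy` are hypothetical and do not come from this paper. The three policies mirror the three options raised above: count grey mail as spam, count it as non-spam, or omit it.

```python
def label_message(votes):
    """Label one message from per-user verdicts.

    votes: list of booleans, True if that user wanted the message.
    Returns "non-spam" if every user wanted it, "spam" if no user did,
    and "grey" if users disagree (the definition used in this section).
    """
    if not votes:
        raise ValueError("need at least one user verdict")
    wanted = sum(votes)
    if wanted == len(votes):
        return "non-spam"
    if wanted == 0:
        return "spam"
    return "grey"


def build_corpus(messages, grey_policy="omit"):
    """Split messages into spam and non-spam corpora.

    messages: dict mapping a message id to its list of user verdicts.
    grey_policy: "omit", "spam", or "non-spam" (hypothetical parameter);
    decides where grey mail ends up.
    """
    if grey_policy not in ("omit", "spam", "non-spam"):
        raise ValueError("unknown grey_policy: %r" % grey_policy)
    corpus = {"spam": [], "non-spam": []}
    for message_id, votes in messages.items():
        label = label_message(votes)
        if label == "grey":
            if grey_policy == "omit":
                continue  # grey mail is left out of both corpora
            label = grey_policy
        corpus[label].append(message_id)
    return corpus


# Illustrative data only: three messages, two users each.
sample = {
    "newsletter": [True, False],   # users disagree, so this is grey
    "invoice": [True, True],
    "pill-ad": [False, False],
}
print(build_corpus(sample))
print(build_corpus(sample, grey_policy="spam"))
```

Whichever policy is chosen, the point made above still holds: the measurements are only comparable if everyone applies the same policy to the same corpus.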
This paper has defined these sets of metrics and given guidelines for how to sample mail in order to measure effectiveness. Agreement on common metrics will drive meaningful cross-competitive analysis toward the goal of improving the end user’s anti-spam experience.
Non-sequitur of the week: "2% of the mail is assumed genuine ... [because] mass email has a 2% response rate, implying that 2% of it is legitimate"
Just because I don't click through from a legitimate bulk message doesn't mean it's spam.