Why do you suppose a computer HAS to classify spam as a human does? The problem of spam is not one of classification, it is one of AUTHENTICATION. The problem is that you simply cannot tell FOR SURE who sent you a piece of email, except by examining its contents. That, in my opinion, is the fundamental problem of spam -- it's not some sort of "AI" problem, it is a simple authentication problem.
The SMTP protocol is, in my opinion, fundamentally flawed because it DOES NOT include any inherent authentication mechanism. The sender can put any old address he likes in the "From" line.
The "solution" to spam, therefore, is not to make "smarter AI." It is to drop SMTP and come up with a new, more secure protocol. Of course, coming up with such a protocol is fairly easy. Making it as ubiquitous as SMTP is the hard part!
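As a rough illustration of the "From" problem, here is a minimal Python sketch using only the standard library's email.message module. The addresses and the build_message helper are made up for the example, and nothing is actually sent; the point is only that the header holds whatever the sender types.

```python
from email.message import EmailMessage


def build_message(claimed_sender: str, recipient: str, body: str) -> EmailMessage:
    """Build a message whose From header is whatever the caller claims."""
    msg = EmailMessage()
    msg["From"] = claimed_sender  # free-form text, not checked against anything
    msg["To"] = recipient
    msg["Subject"] = "Hello"
    msg.set_content(body)
    return msg


# Hypothetical addresses: nothing ties the author of this script to example.com.
forged = build_message("ceo@example.com", "you@example.org",
                       "Nothing verified this From line.")
```

Nothing in the message format itself objects to the claimed address, so any check has to come from somewhere outside it.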
I think Dean is right: e-mail needs some authentication.
Another way would be to charge for emails to make bulk mailing unattractive.
And even humans make mistakes regarding spam: valid newsletters might be regarded as being spam if you don't remember subscribing to them.
Can a computer denote permission? It's certainly an important measure of what makes something spam or not. Problem is, I have yet to find a Bayesian filter that can tell whether or not I willingly gave my email address to Macy's or Minnesota Public Radio.
Hello, Dean,
> Why do you suppose a computer HAS to classify spam
> as a human does? The problem of spam is not one of
> classification, it is one of AUTHENTICATION.
I tend to view authentication as a shortcut towards classification.
For example, if we know that a sender has a bad reputation, we don't need to authenticate; we can reject mail outright.
Even if a sender is authenticated, it still doesn't tell me whether or not the content is legitimate.
Take a gray-hat mass emailer. I might know the mail is coming from them, and it may even be authenticated, but that doesn't mean I want to see it.
> It is to drop SMTP and come up with a new, more
> secure protocol. Of course, coming up with such a
> protocol is fairly easy. Making it as ubiquitous as
> SMTP is the hard part!
I agree a more secure protocol is important, but ultimately the judge of what I do and do not want to see in my inbox is me, or rather, a computer that is very good at figuring out what I do and do not want to see.
> Can a computer denote permission? It's certainly an
> important measure of what makes something spam or
> not. Problem is, I have yet to find a bayesian
> filter that can tell whether or not I willingly
> gave my email address to Macy's or Minnesota Public
> Radio.
Perhaps a custom Bayesian filter that looks at your historical opt-in lists could estimate the probability that you willingly signed up.
It could sort of be like the movie "Click". It would remember previous settings.
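To make the "assign a probability" idea a bit more concrete, here is a small Python sketch. It is only one possible reading of it: a Beta-Bernoulli estimate per sender domain, updated from whether I kept or rejected earlier mail. The class name, the uniform prior, and the ".example" domains are all my own assumptions, not anything an existing filter does.

```python
class OptInEstimator:
    """Per-domain estimate of P(I opted in), from past keep/reject decisions."""

    def __init__(self, prior_yes: float = 1.0, prior_no: float = 1.0):
        # Beta(1, 1) prior by default: an unseen domain starts at 0.5.
        self.prior_yes = prior_yes
        self.prior_no = prior_no
        self.counts = {}  # domain -> [kept, rejected]

    def record(self, domain: str, wanted: bool) -> None:
        """Remember one past decision about mail from this domain."""
        kept_rejected = self.counts.setdefault(domain, [0, 0])
        kept_rejected[0 if wanted else 1] += 1

    def probability(self, domain: str) -> float:
        """Posterior mean of the opt-in probability for this domain."""
        kept, rejected = self.counts.get(domain, (0, 0))
        total = kept + rejected + self.prior_yes + self.prior_no
        return (kept + self.prior_yes) / total


# Hypothetical history: three newsletters kept, one rejected.
est = OptInEstimator()
for wanted in (True, True, True, False):
    est.record("macys.example", wanted)
```

With that history, est.probability("macys.example") works out to (3 + 1) / (4 + 2), about 0.67, while a domain I have never seen stays at the prior of 0.5. It still can't know what I actually signed up for; it only remembers how I reacted before.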
> Take a gray-hat mass emailer.
With proper authentication, there would be no such thing as "gray-hat" mass emailers. You either subscribe to a mailing list, or you do not. If you did not subscribe, then you don't want to see it.
(Note: Yes, you still need to allow "unsolicited" email, but it is possible to tell the difference between unsolicited *one-off* email and unsolicited *bulk* email WITHOUT looking at the contents. Once you can do that, filtering based on content is not important any more.)
I think you're still thinking in terms of the way email works right now -- I'm saying that the way email works right now is fundamentally wrong.
> With proper authentication, there would be no such thing as "gray-hat" mass emailers.
Don't we already have authentication right now in the form of SPF/SenderID/DomainKeys?
Perhaps you are saying that rather than having these authentication procedures optional, they will be mandatory?
> Perhaps you are saying that rather than having these authentication procedures
> optional, they will be mandatory?
Correct. For example, if SenderID were mandatory, botnet spam houses would not be able to work -- you can't add 10,000 entries to your DNS record for the 10,000 IP addresses you "own". Besides, there's a nice paper trail for the FBI to shut you down ;)
Anyway, SenderID isn't all that complicated and there are plenty more authentication schemes that can help.
And as I said above, the problem is not coming up with such mechanisms, it's actually MAKING them mandatory ;)
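For anyone who hasn't looked at how this kind of check works, here is a heavily simplified Python sketch of the idea behind SPF/SenderID: the domain publishes which IPs may send for it, and the receiver compares the connecting IP against that list. The PUBLISHED_RECORDS dict is a hypothetical stand-in for a DNS TXT lookup, only "ip4:" terms are handled, and the "no record means reject" rule is the mandatory policy I'm arguing for, not how these schemes are deployed today.

```python
import ipaddress

# Hypothetical stand-in for DNS TXT lookups (documentation address ranges).
PUBLISHED_RECORDS = {
    "example.com": "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all",
}


def is_authorized(domain: str, connecting_ip: str) -> bool:
    """Return True if connecting_ip is listed in the domain's published record."""
    record = PUBLISHED_RECORDS.get(domain)
    if record is None:
        return False  # assumed mandatory policy: no record, no delivery
    ip = ipaddress.ip_address(connecting_ip)
    for term in record.split():
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[len("ip4:"):], strict=False)
            if ip in network:
                return True
    return False  # not listed, so treat as unauthorized
```

A botnet node at, say, 203.0.113.9 claiming to be example.com fails this check unless the domain owner has listed it, which is exactly the 10,000-entries problem.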