I have a bit more on my previous post about Click Fraud.
To me, Click Fraud is much riskier for the perpetrator than spamming. Consider the differences: with spam, the spammer spreads abusive content all across the Internet, and each user only gets a small piece of the annoyance. It is difficult for individual users to stand together and go after a spammer, because organizing people in large groups is not an easy task. Certainly, ISPs and filter providers have an interest in seeing spam stopped, not to mention those who pay for the backbone infrastructure of the Internet, but the point is that spammers are sending to an audience that is essentially decentralized.
Click Fraud is different. If you want to manipulate pay-per-click ads on the Internet, you pretty much have to abuse the big search providers: Microsoft, Yahoo and Google. You need to push up your search rank and get click-throughs, or forge them yourself. The problem is that a “spammer” here is not abusing a decentralized group of users; they are abusing one of three companies. And these companies have a vested interest in keeping their services clean. Click fraud costs their customers (the advertisers) money, and if those customers are not making money with the services provided, they’ll go to a more secure competitor. If a customer has to pay for all these clicks and nothing comes of it, then the return on the investment is negative and the service is not worthwhile.
I don’t do any anti-abuse work in Microsoft’s Search department, but if I were to hazard a guess off the top of my head, here’s how I would detect abusive behavior. I’d keep track of which click-throughs were the most popular and where they were coming from. But I would also keep track of changes in clicks and in search popularity. In general, changes in behavior are more interesting than snapshots. By observing which types of ads were moving to the top quickly, I would be able to detect behavior that deviated from the norm.
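To make the "changes beat snapshots" idea concrete, here is a rough sketch of comparing click counts between two time windows. Everything in it is my own assumption for illustration: the function name `flag_surging_ads`, the thresholds, and the idea that clicks arrive as simple per-ad counts. A real system would certainly be more involved.

```python
def flag_surging_ads(previous, current, min_ratio=3.0, min_clicks=50):
    """Return (ad_id, growth_ratio) pairs for ads whose clicks jumped sharply.

    previous, current: dicts mapping a hypothetical ad id to its click
    count in the earlier and later time window.
    min_ratio and min_clicks are made-up thresholds, not tuned values.
    """
    flagged = []
    for ad_id, now in current.items():
        # Skip low-volume ads so tiny counts do not produce huge ratios.
        if now < min_clicks:
            continue
        before = previous.get(ad_id, 0)
        # Treat a missing or zero baseline as 1 to avoid dividing by zero.
        ratio = now / max(before, 1)
        if ratio >= min_ratio:
            flagged.append((ad_id, ratio))
    # Biggest movers first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

The point of the ratio is that an ad with steady, heavy traffic is left alone, while one that goes from almost nothing to a lot in a single window gets a second look.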
But more than that, I would break clicks into categories. Maybe Games, Products, Pharmacy, Stocks, Gambling… wait, this is starting to sound familiar… I’d keep an eye, at a high level, on which general categories were the most popular. Within each category I’d have subcategories and keep track of those. Perhaps under Pharmacy, I’d have ads for Ritalin, Vicodin, Viagra, Levitra… hold on a minute…
Anyhow, I’d have subcategories again. Using these I’d be able to keep track of who was moving up quickly. I’d develop algorithms that would alert me to things that are changing position, and also build a reputation table for patterns that have been abusive in the past. I’d also keep track of all of the IPs that were bumping up the search ranking, so that I’d be able to tell who was clicking on which advertisements.
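The bookkeeping described so far (categories, subcategories, who clicked what, and a reputation table) could be sketched roughly like this. The `ClickLedger` class and all of its method names are hypothetical, and the "pattern" key in the reputation table is just any hashable label I made up for the example.

```python
from collections import Counter, defaultdict


class ClickLedger:
    """Hypothetical in-memory tally of clicks; not any real system's design."""

    def __init__(self):
        self.by_subcategory = Counter()      # (category, subcategory) -> clicks
        self.ips_by_ad = defaultdict(set)    # ad id -> set of clicking IPs
        self.reputation = Counter()          # pattern label -> times seen abusive

    def record(self, ip, ad_id, category, subcategory):
        """Count one click and remember which IP made it."""
        self.by_subcategory[(category, subcategory)] += 1
        self.ips_by_ad[ad_id].add(ip)

    def category_totals(self):
        """Roll subcategory counts up into high-level category counts."""
        totals = Counter()
        for (category, _subcategory), clicks in self.by_subcategory.items():
            totals[category] += clicks
        return totals

    def mark_abusive(self, pattern):
        """Bump the reputation score of a pattern that turned out abusive."""
        self.reputation[pattern] += 1
```

Keeping the counts at the subcategory level and rolling them up on demand means the same data answers both the high-level question (which categories are hot) and the detailed one (which ads inside them are moving).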
I’d then start cross-referencing the IPs to see if there were any commonalities between search ads that moved up in rank across category groups. I’d also start building a database of abusive IP ranges and exclude them from contributing to search ranking. I may even attempt to push them into false-click scenarios, where it looks like they are getting a positive response but in reality all they are doing is wasting bandwidth.
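As a sketch of the cross-referencing and the abusive-range database, something like the following might do. The function names, the `(ip, category)` input shape, and the "two or more categories" threshold are all my own assumptions; the only real library piece is Python's standard `ipaddress` module, and the addresses used below come from the reserved documentation ranges.

```python
import ipaddress
from collections import defaultdict


def ips_spanning_categories(clicks, min_categories=2):
    """Find IPs that clicked on fast-rising ads in several categories.

    clicks: iterable of (ip, category) pairs for ads that moved up in rank.
    min_categories is an arbitrary illustrative threshold.
    """
    categories_by_ip = defaultdict(set)
    for ip, category in clicks:
        categories_by_ip[ip].add(category)
    return {ip for ip, cats in categories_by_ip.items()
            if len(cats) >= min_categories}


def drop_blocked_clicks(clicks, blocked_ranges):
    """Discard clicks from known-abusive ranges so they cannot affect ranking.

    blocked_ranges: CIDR strings such as "203.0.113.0/24".
    """
    networks = [ipaddress.ip_network(cidr) for cidr in blocked_ranges]
    kept = []
    for ip, category in clicks:
        address = ipaddress.ip_address(ip)
        if any(address in network for network in networks):
            continue  # Click still "happened" for the sender, but is ignored.
        kept.append((ip, category))
    return kept
```

Note that dropping the click silently, rather than rejecting it, is what keeps the abuser believing the clicks are landing.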
Man, this stuff is actually pretty easy. :P