A couple of weeks ago, I blogged that some outfit in Russia claimed to have broken Yahoo's CAPTCHA for creation of new email accounts. Someone posted a reply in the comments with a link to an article arguing that this was unlikely.
Yet, in the past couple of weeks, I have noticed something that would seem to confirm the theory that these CAPTCHAs have been broken. I have a Yahoo account, a Gmail account and my own Frontbridge account (I also have a Hotmail account but I check it rarely, and a Microsoft account which I exclude from this analysis). Over the past few weeks I have seen an increasing amount of spam from Yahoo, Gmail and Hotmail. I have also seen a few discussion threads talking about spam being relayed through Yahoo/Google/Hotmail's outbound servers. In other words, people are getting accounts through those services and then sending spam.
If a CAPTCHA really were broken, then this is the type of behavior I would expect to see. On the other hand, there are alternative explanations, such as systems being infected with malware that logs into people's POP3 accounts (using keystroke loggers or something) and sends spam out that way.
It will be interesting to see how this plays out. The bottom line is that the outbound spam filtering problem is affecting everyone, not just us.
Remember, it's not about "breaking" a captcha, just about getting a good success rate.
I'm fairly sure you can teach a neural network to crack all those captchas. The success rate would be fairly low, but good enough (even 10%) that you can just keep trying with enough botnets.
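To put rough numbers on the "even 10% is good enough" point, here is a minimal sketch. It assumes every attempt is independent and has the same chance of success, which is a simplification; the function names and the 100,000-attempt figure are made up for illustration and do not come from any real measurement.

```python
def expected_attempts(success_rate: float) -> float:
    """Average number of tries needed per solved CAPTCHA (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def prob_at_least_one(success_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - success_rate) ** attempts


def expected_accounts(success_rate: float, attempts: int) -> float:
    """Expected number of accounts created after `attempts` tries."""
    return success_rate * attempts


if __name__ == "__main__":
    rate = 0.10          # the 10% figure from the comment above
    tries = 100_000      # hypothetical number of sign-up attempts by a botnet
    print("tries per account:", expected_attempts(rate))
    print("P(success within 10 tries):", round(prob_at_least_one(rate, 10), 3))
    print("expected accounts:", expected_accounts(rate, tries))
```

Under these assumptions a bot needs about ten tries per account on average, so the low success rate only multiplies the attacker's work by a constant factor; it does not stop the sign-ups.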
We saw the same increase too, and my gut feeling is captcha cracking software running on botnets.
That's more what I was referring to, a success rate good enough to start pumping out more spam, not necessarily 100%. I should have clarified what I meant.
Wait, I think I have to take it back. If an automated technique can break even 10% of the CAPTCHAs, then that is basically the equivalent of breaking a CAPTCHA. From a security perspective it is 100% broken.
Research literature specifies that a CAPTCHA is deemed "broken" once someone can achieve a 70% success rate. I'm not saying that I agree with this figure, but it's what is generally accepted in the research community.
If you choose to say 10% recognition or better means a CAPTCHA is "broken", then that is just an arbitrary number. Again, I'm not saying that I have a better metric for evaluating the security of a given CAPTCHA, but any fixed number is inherently arbitrary.
I think the recognition threshold needs to depend on the application.
For example one time Google thought I was a bot because I was searching for MSDN pages about some Win32 API. Google displayed a page which refused to perform the search for me. At that time Google offered a feedback link for complaints, and they even replied to me, once. They explained the purpose of the Captcha on the refusal page. Of course I replied that there was no Captcha on the refusal page, and if there had been one then I would have used it. Of course Google didn't answer again after that. (Microsoft should take over Google too, peas in a pod.) Anyway, suppose there had been a Captcha on the refusal page. Suppose a bot had a 10% chance of breaking the Captcha. Then there's a 10% chance that the bot could find out which MSDN pages discuss some Win32 API. That's not a big problem. This hypothetical Captcha could be considered unbroken.
But suppose another bot has a chance of breaking a Captcha that will let it send spam out from Google's mail servers. Then Google has to reply to victims with statements that Google's mail servers aren't Google's mail servers. Of course, when victims follow up with ARIN Whois lookups, Google can't reply again. (Microsoft should take over Google too, peas in a pod.) Anyway, suppose a bot had a 10% chance of breaking this Captcha. Then there's a 10% chance that Google will help the bot send spam. That's a big problem. That Captcha is broken.
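The two examples above can be sketched as a simple comparison: the same 10% bot success rate counts as "broken" or "unbroken" depending on how much harm one successful bypass does. This is only an illustration of the argument. The function name, the harm scores and the tolerance value are all hypothetical and are not taken from any real system.

```python
def is_effectively_broken(bot_success_rate: float,
                          harm_per_success: float,
                          tolerable_harm_per_attempt: float) -> bool:
    """A CAPTCHA counts as broken for an application when the expected
    harm of one bot attempt exceeds what that application can tolerate."""
    expected_harm = bot_success_rate * harm_per_success
    return expected_harm > tolerable_harm_per_attempt


if __name__ == "__main__":
    rate = 0.10        # same bot success rate in both cases
    tolerance = 1.0    # hypothetical tolerance, arbitrary units

    # Hypothetical harm scores: one scraped search result is nearly
    # harmless, one spam-sending mail account is very costly.
    search_harm = 1.0
    mail_harm = 1000.0

    print("search captcha broken:", is_effectively_broken(rate, search_harm, tolerance))
    print("mail captcha broken:", is_effectively_broken(rate, mail_harm, tolerance))
```

With these made-up numbers the search CAPTCHA is still considered unbroken and the mail sign-up CAPTCHA is considered broken, even though the bot's recognition rate is identical.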
Captcha is broken: a free Yahoo breaker was released, Gmail was broken, and I myself even added a few extra nails in the coffin:
I've released some OCaml code which is capable of breaking a large subset of captchas out there. It can't do Yahoo/Google but it got close. It was a partial reimplementation of EZ-Gimpy.
Captchas limit everyone, from the star hacker who wants to automate his life to the blind person struggling to get around on a new, less friendly web. Accessibility is for everyone; there are other, potentially fairer ways to limit "Free access" to resources.
Current CAPTCHA is definitely broken and getting harder and harder for REAL PEOPLE to pass. I've just created a beta CAPTCHA service that I'd like to have tested by those who enjoy breaking security software. Any takers? Documentation: http://bothole.appspot.com/doc To play with it directly from a browser, use http://bothole.appspot.com
I'd like to avoid the Predicted SPAM Tsunami, so any ideas to make this service better and easier to use would be welcome.