The following is an excerpt from Investor's Business Daily:
Microsoft (MSFT), the software giant, increased its market share in U.S. Web searches to 8.23% in June from 7.81% in May, thanks to its new Bing search site, according to tracking firm StatCounter. Web search king Google (GOOG) lost share slightly, dipping to 78.48% from 78.72%.
Figures like these really annoy me. Why? Because they are using statistics inaccurately. Look at Google's "loss" of search share - a drop of 0.24%. How could they possibly measure that?
In statistics, every estimate drawn from a sample carries a margin of error, which defines the confidence interval. If you were to survey a group of users and 75% of them reported the same answer, you cannot extrapolate that straight to the rest of the population. If you sampled ~1000 people, then you can say that 75% of the population, +/- 3%, would give the same answer. At a 95% confidence level, you would say that you are 95% confident that between 72% and 78% of the population would give that answer.
Surveys work by doing random sampling. Yet, in order to draw conclusions like the ones above, we have to make sure that the margin of error is less than the difference being reported. For example, suppose you asked 1000 people what kind of widget they liked best and 67% of them said Widget A. Next month, you ask 1000 people the same question and 65% of them say Widget A. Does that mean there was a drop of 2%? No, because the 2% drop is within the 3% margin of error from the previous month. You cannot be certain of anything.
In order for Google to have experienced actual market share loss, the original number had to be 78.72% +/- 0.11%, while the second number has to be 78.48% +/- 0.11%. Why? Because we have to have non-overlapping margins of error:
78.72 - 0.11 = 78.61%
78.48 + 0.11 = 78.59%
Those two do not overlap and thus we can be confident that real market share has been lost by Google. So, how many people would the survey have to interview in order to get that confidence interval? About 735,000. I somehow doubt that this surveying company actually asked that many people what their favorite search engine is (or however they did their sampling). In order for Microsoft to have gained their market share, they would need to have sampled 213,000 people. Sounds unlikely to me unless they have some automated way of culling out all of this data.
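For anyone who wants to try the arithmetic, here is a minimal sketch using the textbook normal-approximation formula for a sampled proportion. The function names, the 95% z-value of 1.96 and the choice of proportion are my assumptions, not necessarily how the figures above were derived, so the outputs will not match them exactly.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for a sampled proportion p.

    z=1.96 corresponds to a 95% confidence level (an assumption here).
    """
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(p, margin, z=1.96):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Margin of error for 75% agreement among 1000 respondents.
survey_margin = margin_of_error(0.75, 1000)

# Sample sizes for margins of 0.11 and 0.21 percentage points,
# using p = 0.5, the most conservative (largest-sample) choice.
n_for_small_margin = required_sample_size(0.5, 0.0011)
n_for_larger_margin = required_sample_size(0.5, 0.0021)

print(f"margin: {survey_margin:.4f}")
print(n_for_small_margin, n_for_larger_margin)
```

Under these assumptions, a margin of 0.11 points calls for a sample in the high hundreds of thousands and a margin of 0.21 points calls for a couple of hundred thousand, the same order of magnitude as the figures above.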
People need to know how to use statistics properly.
I'm going to attempt to summarize a blocklist without going to the article on Wikipedia. I'll be doing this straight off the top of my head.
Motivation
A blocklist is essentially a shortcut to spam filtering. Assume that you have a content filter that is doing all of the work of filtering, faithfully executing and flagging messages as spam. Everything is great except that the spam filter is doing a lot of work and occasionally, the odd spam message or two slips through. You can live with this if all you are filtering is 10,000 messages per day.
But imagine you are filtering 10 million messages per day. Suddenly bandwidth becomes an issue because most of your bandwidth is being taken up by useless data (spam). In addition, if your filter is "only" 99% effective, 100,000 spams are still leaking through to end users. If your organization has 10,000 users (a good size company), then that's about 10 spams per day to the end user.
You need a way to make this work better.
Methods
You sit down one day and start poring over the spam samples that your end users are submitting to you. "What's this?" you say out loud to no one in particular. You observe that while the spams have no particular pattern in their content, they seem to be coming from a narrow set of IPs. Let's say that out of 100 messages, you see the following pattern (I'm using hypothetical IPs):
| IP | Spam Count |
| 292.144.16.11 | 16 |
| 292.144.16.17 | 15 |
| 292.144.16.19 | 22 |
| 292.144.16.22 | 18 |
| 292.144.16.27 | 29 |
"That's odd," you say again. "There seems to be a lot of IPs in that range." You do a quick WHOIS lookup of that IP and you find that the IP space is owned by the organization Canadian Pharmaspammers. "Well," you exclaim, "if these guys own those IPs, I should flat out block them all! It is very unlikely that they will ever send out anything legitimate." How do you know this? Spammers never change their spots. If a spammer sends out this much spam from these IPs, at that level of volume (100 messages randomly sampled) then you can safely conclude that they will never send out anything else.
You decide to add all five of those IPs to your own blocklist. Anything that hits your network that comes from those IPs you will reject (how this works we'll get to in a future post). You've now saved your end-users from getting spam from these IPs.
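The tallying step above can be sketched in a few lines. This is only an illustration: the function name and the `min_count` threshold are hypothetical, and the sample data simply replays the hypothetical table.

```python
from collections import Counter

def top_spam_sources(connecting_ips, min_count=10):
    """Count spam samples per connecting IP and keep the heavy hitters.

    min_count is an arbitrary illustrative threshold.
    """
    counts = Counter(connecting_ips)
    return [(ip, n) for ip, n in counts.most_common() if n >= min_count]

# Rebuild the 100 hypothetical samples from the table above.
table = {
    "292.144.16.11": 16,
    "292.144.16.17": 15,
    "292.144.16.19": 22,
    "292.144.16.22": 18,
    "292.144.16.27": 29,
}
samples = [ip for ip, n in table.items() for _ in range(n)]

candidates = top_spam_sources(samples)
for ip, n in candidates:
    print(ip, n)
```

In practice the list of connecting IPs would come from the Received headers of the submitted samples, which is a parsing job of its own.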
Refinements
You wipe your hands and assume the problem is solved. But it's not; users are still getting Canadian Pharmaspam! Once again, you start to grab the spam samples and look at the connecting IPs. The content is all different -- again -- but the IPs look familiar:
| IP | Spam Count |
| 292.144.16.12 | 19 |
| 292.144.16.14 | 17 |
| 292.144.16.18 | 18 |
| 292.144.16.21 | 20 |
| 292.144.16.26 | 27 |
Those IPs look similar to the IPs you previously blocklisted. You have no new spam from the original IPs, but lots of spam from their sister IPs. Once again, you decide to do a WHOIS lookup on one of the IPs and notice something you didn't see before. It's listed to Canadian Pharmaspammers, but they also own the netblock 292.144.16.0/27 -- a netblock of 32 IPs. You decide to get pre-emptive; you go into your personal blocklist, remove the previous five IPs and insert 292.144.16.0/27 instead. You have now listed the entire range of IPs. You only have evidence from 10 different IPs but strongly suspect that spam is coming out of all of them, and therefore you engage in a pre-emptive strike. You list the IP range, cross your fingers and hope for the best.
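Here is a rough sketch of what listing a /27 means in practice, using Python's standard `ipaddress` module. The hypothetical 292.x addresses above are deliberately not valid IPs, so I am substituting the documentation range 192.0.2.0/27; the `is_blocked` helper is made up for illustration.

```python
import ipaddress

# Stand-in for the hypothetical netblock; 192.0.2.0/24 is reserved for documentation.
BLOCKLIST = [ipaddress.ip_network("192.0.2.0/27")]

def is_blocked(ip, blocklist=BLOCKLIST):
    """Return True if the connecting IP falls inside any listed netblock."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocklist)

# A /27 leaves 5 host bits, so it covers 2**5 = 32 addresses.
print(BLOCKLIST[0].num_addresses)

print(is_blocked("192.0.2.12"))   # inside the /27
print(is_blocked("192.0.2.32"))   # first address past the /27
```

A single netblock entry replaces the individual IP entries and also covers the addresses in the range that have not sent you anything yet.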
The next day you check your spam stats and notice something; rather than processing 10 million messages per day at the content filter, your upstream IP filter has cut that down to 1 million per day! Gah! That's a reduction of 90%! Your content filter is flying! Furthermore, the number of spam complaints has gone down from 100 per day to 20 per day, a reduction of 80%. By adding these IPs to the blocklist, you have accomplished two things:
- Users are seeing less spam in their inboxes because while your filters are good, there may be gaps. This blocklist fills in those gaps.
- You have saved a good chunk of bandwidth and are spending precious resources on less and less junk.
Those are the two basic uses of blocklists. A third would be spam filter automation and leveraging the work of others, but we'll get to that in a future post. But by and large, these impacts are immediately noticeable by everyone using the service and therefore, the use of blocklists eventually becomes indispensable if you want to run a filtering service.
A couple of weeks ago, the Financial Times ran an article entitled "Secret War on Web Crooks Revealed." Here's an excerpt:
The people who run the world's internet systems are a rather secretive bunch. Three times a year, senior technical officers from companies such as Google, Yahoo, AT&T, Comcast and Verizon meet to discuss ways of stopping the internet from being swamped by rising levels of spam, viruses and hacking attacks by organised criminals. They do not generally like discussing these meetings. "Some people might get nervous if they knew all the things we talked about," said Michael O'Reirdan, chairman of the Messaging Anti-Abuse Working Group (MAAWG). "It’s our job to make the internet safe, but we don't want to put people off using the web." They are also worried about being targeted by the cyber-criminals they are trying to thwart.
Indeed, it is a secretive group. It's kind of like the Stonecutters. Things are discussed there and the idea is to come to a consensus and make recommendations about how to make the Internet safer and less a haven for (un)common criminals.
Now, not having been to these latest meetings, I don't know for certain what goes on. But I have been to other, non-MAAWG meetings and I certainly know what goes on there. I have also been to a lot of cross-group meetings at Microsoft and I'm fairly certain that the types of meetings at Microsoft probably are not too much different than MAAWG. So allow me to speculate a bit.
MAAWG is attended by hundreds of well-intentioned and well-meaning people. They want to get rid of the dark evil that are spammers, malvertisers, virus writers, and all of their ilk. Yet, coming to a consensus on all these things is very difficult. People from industry have competing interests from people in research groups, or people in government, or people in the IETF or ARIN. And when people with competing interests try to come to a resolution about how best to proceed, sometimes it can take a while to make any progress. Of course, MAAWG has made very great strides in mitigating email abuse.
And that brings me to another point. This past weekend I was watching The Fellowship of the Ring. I got to the scene in Rivendell after Frodo has brought the ring there, and Elrond calls a meeting with representatives from Gondor, the Elves and the Dwarves. The Ring is presented to everyone in attendance and there is a general agreement that the Ring must be destroyed because it is so evil. I view this like MAAWG - everyone in attendance there agrees that spammers are evil and must be stopped (maybe not destroyed).
But at the Council of Elrond, everyone disagrees about the best way to dispose of the ring. Dwarves don't want Elves to carry the Ring, Elves don't trust Dwarves and the race of Men want to use it as a weapon against the forces of Mordor. I kind of see this as anti-spam fighters engaging in dubious tactics to shut down spammers (such as breaking into their servers and stealing data or deliberately inflicting sabotage). Arguments ensue and nobody gets anywhere. This is kind of like competing solutions and standards fighting it out in the real world, and in the meantime spammers are still sending their payload.
Eventually, Frodo speaks up and announces he will take the ring, though he does not know the way. Everyone looks at him and though in disbelief, they agree that the ring should go with the Hobbit. An agreement has been reached. This is like MAAWG, or CAUCE, or whoever finally agreeing to some standard way of doing things (like DKIM or SPF, or ARF format for reporting abusive mail, and so forth). Progress is being made and the enemy's progress has been impeded.
Maybe it's not the best analogy, but it's the one that floated into my mind when I watched that scene.

BTW, I'm no Frodo. I think I identify more with Boromir.
One of the stories that is circulating around the Internet this week is the announced imminent closure of the SORBS blocklist. Al Iverson of SpamResource has a good summary of it. SORBS has had its share of criticism in the past, however. From Wikipedia:
Spam database removal procedure
In order for IP addresses that have spammed in the past to be removed from the spam database, SORBS requires what it calls a "fine" in the form of a US$50 donation to a registered charity. This donation is only required for deletions from the spam database that have not expired automatically, and it is waived both for IP addresses that have been reallocated elsewhere or if the ISP implements outbound content-based spam countermeasures. However, because of these requirements, SORBS's removal procedure has been compared to extortion, but SORBS says it is not.
In the antispam community, this particular blocklist has had its detractors who say that dealing with the list has been a nightmare. On the opposite end, others say that the list has been nothing but professional with them.
I won't comment or give my particular opinion on SORBS. Rather, the announced closure of the list has prompted me to finally start a small mini-series on a topic that has been floating about in my head for several months now: what does it take to set up and run an RBL? And, more importantly, what does it take to maintain an RBL?
The goal of this series is to examine what goes on behind the scenes of compiling and maintaining an RBL. We've run a private one for three or four years now and maintaining it has been no picnic. Things break down, disks run out of space and the people who wrote the original scripts (in three days with tons of bugs) move on. Thus, I suppose one could call this the Complete Guide to Running a Blocklist in the Real World.
Remember, I deal with reality. Because we run a service, we know who our blocklists affect and that they impede real mail flow. We also deal with actual complaints and our policies are affected accordingly. It should be a good series.
With the explosion in popularity of Twitter (of which I am not a twitterer or even a subscriber), I've wondered to myself whether there is such a thing as twitter spam.
Now, spam in the email sense is when spammers flood your inbox with unwanted email. But with Twitter, if you're subscribing to someone's feed, then how can you be spammed? You could just unsubscribe from them if they were getting really annoying, but you're opting in and you know who's sending you "mail." It's kind of like getting RSS spam... which is counterintuitive.
I did a quick Bing search and found out that there is such a thing as Twitter spam known as "Follow spam". From Twitter's blog:
What is "Follow Spam?"
Follow spam is the act of following mass numbers of people, not because you're actually interested in their tweets, but simply to gain attention, get views of your profile (and possibly clicks on URLs therein), or (ideally) to get followed back. Many people who are seeking to get attention in this way have even created programs to do the following on their behalf, which enable them to follow thousands of people at the blink of any eye.
As you can imagine, this is a problem. In extreme cases, these automated accounts have followed so many people they've threatened the performance of the entire system. In less-extreme cases, they simply annoy thousands of legitimate users who get an email about this new follower only to find out their interest may not be entirely...sincere. On rare occasions we may see a person who is mass following and actually cares about every tweet—there is an opportunity for us to learn more about this use case and work to provide a better experience.
I don't fully understand why someone would choose to engage in Twitter spam but the idea seems to be that if you follow a lot of people's Tweets, the followees will click on your profile. If you were a spammer, you could put a link to your product in your profile in hopes of getting the Tweeter to follow it and get to your site. It's a way of avoiding a spam filter since the spammer is already in the network and presumably, there is a level of trust. After all, if you're a Tweeter, it's kind of flattering to have a lot of people follow your tweets.
But for the Tweeter, having a lot of spammers follow you becomes really annoying. You want real people to follow you, not spammers hyping up your statistics. You can't go through your followers' profiles because all you're doing is sifting through a lot of chaff. Twitter also cannot build accurate statistics on user profiles in order to one day monetize their size.
Ultimately, the problem of Twitter abuse will come back to the same problem faced by the webmail providers - spammers are breaking CAPTCHAs and using the resulting accounts to send out spam. The spammers are doing the same thing here and irritating everyone with their abusive behavior. I suspect that there will be a similar convergence in anti-CAPTCHA-breakage techniques to the one there was for spam, including IP reputation and behavioral analysis (content filtering).
In a story by PC World, spam kingpin Alan Ralsky has pleaded guilty in a stock fraud case where he pumped up Chinese penny stocks:
Ralsky and four other individuals pleaded guilty on Monday, joining three others who had pleaded guilty earlier, the U.S. Department of Justice announced Monday. Cases are still pending against three other people, they said. The defendants were indicted in the Eastern District of Michigan in 2007.
In 2004 and 2005, the group engaged in a set of related conspiracies to manipulate stocks using false and misleading spam messages. After the spam boosted the trading volume and prices of the thinly traded stocks, the conspirators profited by trading in their shares. Many of the shares were low-priced "pink sheet" stocks for U.S. companies owned by individuals in Hong Kong and China, the DOJ said.
In addition to using false and misleading information in the spam messages, the conspirators created and sent the e-mail using software that made it hard to track the messages back to them, the DOJ said. They also used illegal methods to get around spam blockers and trick recipients into opening and acting on the messages. They falsified the headers, used proxy computers to relay the spam and falsely registered domain names, the DOJ said.
Antispam advocates have long considered Ralsky one of the world's most prolific junk e-mailers, though he has claimed he is a legitimate business operator. He reportedly once admitted sending more than 70 million messages a day.
I can remember this type of spam blitz. In 2004 and 2005, when I did a lot of spam processing, I can specifically recall doing WHOIS lookups on domains and seeing Alan Ralsky's name being associated with them. This guy sent a ton of spam and I can't even recall how many domains and IPs of his I was responsible for blocking.
It should come as no surprise that his activities eventually caught up with him. There have been a lot of spammers getting convicted in the past couple of years or getting nailed with huge lawsuits. It's nice to see Ralsky go down; he was particularly notorious. 70 million is a huge number of annoying messages per day.
The other day, I got the following spam message in my inbox (junk mail folder, actually):
miaou skoal.
ripe fanny hash tome?
hypo kirk.
griff trow canoe kirk.
fix die dance.
fix coach born hazy?
silky brier mutt wrest.
samp cad wrest adopt?
ahoy pest arc.
arc targe peter puce.
<http://domain_munged.com/?said=g19c>
The payload is obvious: a link to a spammy website. But why the nonsensical text at the top of the message?
This is a very old spam technique that a former spam analyst I used to work with dubbed "hash-busting." The idea is that spam filters will create hashes of spam messages and store them in a database. When an inbound message arrives, the message is hashed and the hash is looked up in the database. If the hash is found there and is associated with a known spam message, then the message is spam.
Each hash is unique, and if you change the content of the message then you change the hash. In theory, if a spam filter were using this hash technique then all a spammer would have to do is make small changes to each message, perhaps changing one word per spam, and the spam filter would never be able to catch it. Different content per message yields a different hash key and therefore a filter could never catch this spam. It would be forever looking up keys that didn't previously exist, and all of its existing keys would never be seen again. A spammer could then use the same domain indefinitely but only change the random text in the message.
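To make the idea concrete, here is a minimal sketch of exact-hash matching and how random filler defeats it. SHA-256, the function names and the example.com link are all my choices for illustration; a real filter would not necessarily hash the whole body this way.

```python
import hashlib

def fingerprint(body):
    """Hash the full message body; any change to the text changes the hash."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

known_spam_hashes = set()

def is_known_spam(body):
    """Look up the message's hash in the database of known spam hashes."""
    return fingerprint(body) in known_spam_hashes

# The filter has already seen and recorded one copy of the spam.
first_copy = "miaou skoal.\nripe fanny hash tome?\nhttp://example.com/?id=1"
known_spam_hashes.add(fingerprint(first_copy))

# Same payload link, different random filler at the top.
second_copy = "ahoy pest arc.\narc targe peter puce.\nhttp://example.com/?id=1"

print(is_known_spam(first_copy))   # exact repeat is caught
print(is_known_spam(second_copy))  # hash-busted variant slips past
```

The payload link is identical in both copies, which hints at why filters that look at the link or the sending IP instead of the whole body are harder to evade this way.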
The other mechanism that this type of spam aims to defeat is Bayesian filtering. By putting a bunch of garbled text in the message that changes each time, the Bayesian filter never detects anything in the message that it can classify as spam. The words are neutral and therefore the Bayesian probability engine judges the message as neutral, not as spam.
This type of technique, to my knowledge, doesn't work that well. Most spam filters use a variety of filters and methodologies to capture spam, and the two types of techniques that it is trying to defeat are not that reliable anyhow. Antispam vendors have better, more robust ways of catching spam and so the spammer, while attempting to be clever, will have better luck next time.
Today, for the first time ever, I got a phishing spam from a spammer targeting a bank that I actually use. A couple of months ago, Washington Mutual held a "contest" where if you opened an account and put in at least $100, they'd also contribute $100. Wanting to double my money for almost nothing, I took advantage of it. My only goal for doing this was to collect my 100% rate of return.
Well, a few weeks later, in one of my many email accounts, I noticed that I got an email notice from Chase, with the subject line Chase Bank Security Service Notification (IMPORTANT). Here's the message:
When I saw the message in my inbox and I glanced over the subject line, my first thought was "How did they get my email address? I never gave it to them." Yes, that was the first thing I thought, it was completely instinctual. It only lasted for a brief moment because I immediately realized that I was being phished.
Tsk, tsk. If only the spammers knew who they were dealing with... not that they care. But the point is that it goes to show that things like this operate on an emotional level. People see that a message comes from their bank and they are interested in seeing what is going on. The threat of fraudulent activity on their account scares people into acting on it. This is nothing new; it is an example of social engineering, a spam technique that has been around as long as I have been fighting spam. But as I said, it's the first time I have ever been phished from a bank I use.
Incidentally, both Firefox and Internet Explorer blocked the site and reported it as unsafe. It's a good thing both browsers did that because the site is very well polished and looks real.

I have a bit more on my previous post about Click Fraud.
To me, Click Fraud is much riskier than spamming. Consider the differences: with spam, the spammer is spreading out abusive content all across the internet and each user only gets a small piece of the annoyance. For individual users to stand together and go after a spammer is difficult because organizing people in large groups is not an easy task. Certainly, ISPs and filter providers would have an interest in seeing spam stopped, not to mention those who pay for the backbone infrastructure of the internet, but the point is that spammers are sending to an entity that is essentially decentralized.
Click Fraud is different. If you want to manipulate pay-per-click ads on the Internet, you pretty much have to abuse the big search providers – Microsoft, Yahoo and Google. You need to push up your search rank and get click-throughs or forge them yourself. The problem is that a “spammer” is not abusing a decentralized group of users, they are abusing 1 of 3 different companies. And, these companies have a vested interest in keeping their services clean – it costs their customers money and if their customers are not making money with the services they provide, they’ll go to a more secure competitor. If a customer has to pay for all these clicks and nothing comes of it, then the return on the investment is negative and not worthwhile.
I don’t do any anti-abuse in Microsoft’s Search department, but if I were to hazard a guess, then off the top of my head here’s how I would detect abusive behavior. I’d keep track of which were the most popular click-throughs and where they were coming from. But, I would also keep track of changes in clicks and popularity searches. In general, changes in behavior are more interesting than snapshots. By observing which types of ads were moving to the top quickly, I would be able to detect behavior that deviated from the norm.
But more than that, I would break clicks into categories. Maybe Games, Products, Pharmacy, Stocks, Gambling… wait, this is starting to sound familiar… I’d keep an eye at a high-level which were the most popular general categories. Within each category I’d have subcategories and keep track of those. Perhaps Pharmacy, I’d have ads for Ritalin, Vicodin, Viagra, Levitra… hold on a minute…
Anyhow, I’d have subcategories again. Using these I’d be able to keep track of who was moving up quickly. I’d develop algorithms that would be able to alert me to things that are changing position and also build a reputation table for patterns that have been abusive in the past. I’d also keep track of all of the IPs that were bumping up the search rating and in that regard, I’d be able to tell who was clicking on what advertisements.
I’d then start cross-referencing the IPs to see if there were any commonalities between search ads that moved up in rank across category groups. I’d also start building a database of abusive IP ranges and eliminate them as being able to contribute to search ranking. I may even attempt to push them through to false-click scenarios where it looks like they are getting a positive response, but in reality all they are doing is wasting bandwidth.
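Purely as a guess at what this might look like in code, here is a toy sketch of the two ideas above: flag ads whose clicks jump sharply from one day to the next, then look for IPs that click on flagged ads in more than one category. Every name, threshold and data value is hypothetical, and a real system would need far more than this.

```python
from collections import defaultdict

def fast_risers(clicks_by_day, min_ratio=3.0, min_clicks=50):
    """Flag ads whose latest daily click count jumped versus the day before.

    clicks_by_day maps ad -> list of daily click counts (oldest first).
    min_ratio and min_clicks are arbitrary illustrative thresholds.
    """
    flagged = []
    for ad, counts in clicks_by_day.items():
        if len(counts) < 2:
            continue
        prev, curr = counts[-2], counts[-1]
        if curr >= min_clicks and curr >= min_ratio * max(prev, 1):
            flagged.append(ad)
    return flagged

def shared_ips(click_log, flagged_ads, min_categories=2):
    """Find IPs that clicked flagged ads in several different categories.

    click_log is an iterable of (ip, ad, category) tuples.
    """
    categories_by_ip = defaultdict(set)
    for ip, ad, category in click_log:
        if ad in flagged_ads:
            categories_by_ip[ip].add(category)
    return sorted(ip for ip, cats in categories_by_ip.items()
                  if len(cats) >= min_categories)

clicks_by_day = {
    "pharmacy-ad-1": [40, 400],
    "games-ad-1": [30, 200],
    "products-ad-1": [50, 55],
}
click_log = [
    ("198.51.100.7", "pharmacy-ad-1", "Pharmacy"),
    ("198.51.100.7", "games-ad-1", "Games"),
    ("203.0.113.9", "products-ad-1", "Products"),
    ("203.0.113.20", "games-ad-1", "Games"),
]

risers = fast_risers(clicks_by_day)
suspects = shared_ips(click_log, set(risers))
print(risers, suspects)
```

The suspect IPs are what would feed the reputation table and the database of abusive ranges.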
Man, this stuff is actually pretty easy. :P
The New York Times ran an article yesterday saying that Microsoft is suing three people in a click-fraud scheme. The investigation took more than a year and the company is seeking $750k in damages.
Click Fraud is when people manipulate clicks on an advertisement on the web. The more clicks, the more money you can make, and if you automate the process, you can end up making a lot of money. Of course, the vendor who foots the bill pays for all of those clicks and if someone is clicking on it without any intention of purchasing, that’s fraudulent and is not the way web advertising is supposed to work. According to ClickForensics, about one in seven clicks on an advertisement is estimated to be fraudulent. That sounds like a staggering amount… but I guess compared to the 97% of email traffic that is spam, it’s not so big.
Here is how Microsoft detected the anomalous patterns:
Microsoft said it found a pattern of click fraud on its search pages, where lists ranked by relevance and popularity appear alongside a handful of paid results. Advertisers bid on what they will pay to appear in the paid-search results for certain keywords. The more an advertiser pays, the higher they are in that list, and advertisers usually pay for each click on their ad.
In March 2008, several auto insurance advertisers began complaining to Microsoft that traffic to their ads was spiking suspiciously. Microsoft looked at the searches being conducted, and noticed that searches for keywords like “auto insurance quote” had sharply increased. And clicks to the advertisers appearing at the top of the paid-search results listings for those terms were high.
Microsoft investigators noticed there was an oddly similar pattern in a seemingly unrelated area, advertisements for the game World of Warcraft. Though investigators weren’t sure how the two were connected, they began to see some similarities. Although traffic appeared to come from different computers, it was actually coming from two proxy servers, which mask the original address of a click.
Microsoft began trying to stop the suspect traffic, but a little game evolved. Microsoft would block a server, or block a certain level of traffic for those advertisements, but whoever was on the other side of the clicks kept finding new ways around the company’s fixes. [tzink – These are almost the same tactics used by spammers and botnets: once a bot’s IP gets listed, the spammer drops the bot and moves on to another IP. A cat-and-mouse pattern like this is generally indicative of abusive behavior.]
Microsoft didn’t know why someone would be interested in both World of Warcraft and auto insurance ads, though, until a third party told investigators that an advertiser for World of Warcraft keywords was also taking a fee for directing traffic to auto insurance sites. Investigators figured out that seven different accounts, registered under different individual and company names, were linked to the three defendants.
Microsoft’s theory is that Mr. Lam was running or working for low-ranking sites that took potential client information for auto insurers. The complaint said that he directed traffic to competitors’ Web sites so they would pay for those clicks and exhaust their advertising budgets quickly, which let the lower-ranking sites that he sponsored move up in the paid-search results.
When people clicked through to his site, it asked them to supply contact information, which he then resold to auto insurance companies, according to Microsoft’s complaint, which estimated his profit at $250,000. In the complaint, it also said it had to credit back $1.5 million to advertisers because of the Lams’ alleged fake clicks. Microsoft is seeking $750,000 in damages from the defendants.
Click-Fraud is much newer than spamming, but it’s still irritating all the same. It remains to be seen whether or not it will become as popular as spamming. The problem for “spammers” of click-fraud is that they are not targeting individual users, they are targeting large corporations like Yahoo, Microsoft and Google. These companies have a vested interest in protecting their own financial interests. They have a lot of resources to expend when someone is abusing their services which they have to pay for. It’s one thing to abuse something that is free, and quite another to abuse a paid-for service.
I would think that the latter has more inherent risk.
We all remember back in November 2008, when McColo, the ISP hosting botnet command-and-control centers, was taken offline. Overnight, spam levels dropped by 40%. It was one of the most significant antispam operations in the history of fighting spam. Spam volumes eventually started climbing back and by May 2009 they were pretty much back to where they were before the takedown (at least on our servers; I don’t speak for others who say they recovered earlier than that).
Last week, the FTC shut down another notorious ISP, owned by Pricewert LLC, whom the FTC is taking to court. According to the claim, Pricewert does business under multiple names including 3FN and APS Telecom, and actively recruits and colludes with criminals seeking to spread abuse on the internet. This ISP is another command-and-control center, so shutting it down would affect its botnets’ ability to download instructions and spew out more spam.
I resisted commenting on this when I first heard about it last Thursday (June 4). I decided to take a wait-and-see attitude to make sure that spam was actually decreasing because of this shutdown. It is not unusual for us to see spam levels drop on a week-over-week basis, let alone day-over-day. This happened in June 2008 and April 2008, and to my knowledge there were no other spam bot takedowns during that period.
Today, Symantec is reporting that spam levels are down 15% from last week’s levels. Again, I decided to proceed cautiously. To verify our numbers, I took the daily average of last Friday (June 5), and this past Monday to Wednesday (June 8-10); I deliberately excluded weekends because they can skew data, especially for small time frames. I then compared it to the previous three weeks’ worth of weekly data where the amount of spam we were receiving had stabilized.
Our spam volumes, and drops therein, agree with Symantec’s. The total amount of spam that we are catching has dropped by 15% as well. Is this random noise? To find out, I determined the average and standard deviation of daily volume for the 30 days ending June 4. For each day after the shutdown, I subtracted the average from that day’s total mail and divided by the standard deviation (in other words, I obtained the z-score) and then converted that to a percentile.
The results:
Friday – 82%
Monday – 97%
Tuesday – 93%
Wednesday – 91%
To interpret the results: Friday falls at the 82nd percentile, which means that normal daily noise produces a swing larger than Friday’s less than 18% of the time. For Monday, it occurs less than 3% of the time (i.e., less than you would expect by chance). For Tuesday and Wednesday, a swing that large occurs less than 7% and 9% of the time respectively, which is unusual but not so rare that it couldn’t have occurred by chance.
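For the curious, the z-score-to-percentile step can be sketched as follows. This assumes daily volumes are roughly normally distributed and treats the test as one-sided on the size of the drop; both are my assumptions, and the baseline numbers are invented rather than our real volumes.

```python
import math
from statistics import mean, stdev

def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def drop_percentile(history, observed):
    """Return (z, percentile) for how far `observed` sits below the baseline.

    history is the list of daily volumes in the baseline window.
    z is positive when the observed volume is below the baseline average.
    """
    mu = mean(history)
    sigma = stdev(history)
    z = (mu - observed) / sigma
    return z, normal_cdf(z)

# Made-up baseline of daily spam volumes (in millions), for illustration only.
baseline = [100, 110, 90, 105, 95]

z, pct = drop_percentile(baseline, 85)
print(f"z = {z:.2f}, percentile = {pct:.1%}")
```

A percentile of 97% here would mean that ordinary day-to-day noise produces a drop that large only about 3% of the time.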
It looks like this botnet takedown is affecting spam levels, at least at this early stage, and at least on Monday.
Microsoft has just released a new search engine entitled Bing (but it's not Google). The philosophy behind it is something that Bill Gates has talked about for a long time. Today, Search returns a lot of results (regardless of whether it's Google, Yahoo or Live). However, it's too much information. There are too many results to go through and it can be frustrating to find what you want. Sometimes the first returned result is what you want, but many times it isn't and you have to hunt through pages of irrelevant links. Think back to how many times you've searched for something and the first result is a dead link.
The idea behind Bing is that it sifts through the irrelevancy to bring you stuff you actually want to see. It is designed to improve the user experience such that when you search for something, the results you get back are actually the results that have meaning to you. Less search with Search. The overall philosophy is sound.
Spam processing is similar. There is a lot of noise when looking for the signal. It is difficult to sift through random spam and legitimate messages to find the stuff you are actually looking for.
How do we Bing spam data?
A few months ago, I put up an ad on Craigslist advertising something of mine (I can’t remember what it was). It was easy: all I had to do was click on a few links, put in the description, fill in my email address and hit Post. No muss or fuss. As I recall, I did not have to sign up for anything.
Fast forward a few months, and Craigslist has revised their security model – I would assume that they read my blog and have finally realized that if you give something away for free, people will abuse it. I posted something else and this time around, I couldn’t just post it without doing some complicated things.
- First, I had to sign up for an account. “Odd,” I said. “I don’t remember having to do this before.” So, I signed up for an account. But first, I had to pass the CAPTCHA. It was a bit of a tricky one. There were two words in the box and they were separated out by a large space, like the following:
strong mode
I said to myself, “How do I fill this out? Do I type out ‘strongmode’ or ‘strong mode’? In other words, should I include the space or shouldn’t I?” I didn’t, and the CAPTCHA passed. Step one complete.
- I thought that was it. I was wrong. In order to post something, I had to pass another CAPTCHA! This time, it was a phone verification. In order to verify my account, I had to enter in a code that I would receive by telephone. I entered in my phone number and they sent it to me via text message. I took the number, filled it in (manually typing it in as I don’t have copy-and-paste facilities from my phone to my laptop) and enabled my account.
- I thought that was it. I was still wrong! In order to post my ad, I had to fill in a third CAPTCHA! Again, it was two words with a space in between them. Having learned from last time, I did not include the space and the test passed. I got my ad posted online.
Craigslist has clearly implemented a bunch of new security measures that were not there in the past. My guess is that spammers will not take the time to do all of that stuff, particularly the phone test. The phone test was interesting; it’s something that I had heard talked about but didn’t think that anyone would actually do in real life. Craigslist does.
So good for them. Hopefully this will cut down on the abuse that the site sees. And hey, how can I complain? The site is still free.
General News at Investors Business Daily
BY DONNA HOWELL
INVESTOR'S BUSINESS DAILY
Posted 5/29/2009
Our defenses are lacking and first-strike capability too, on a front once mainly depicted in science fiction.
"Cyberspace is real. And so are the risks that come with it . . . we're not as prepared as we should be," President Obama said Friday, creating a new White House position to coordinate protection of the nation's critical online systems.
Online threats are real and growing, experts agree. But while they applaud Friday's plans, they question how much good they will do.
Obama said many other steps will be needed. And the Pentagon plans to create a command for cyberspace defense and offensive capabilities, the New York Times reported.
Concrete Results Lacking
Neither move comes out of the blue. Obama's announcement follows a 60-day review and, as with the military, years of government efforts to shore up cybersecurity.
They've fallen short, Obama and others say, amid all kinds of computer attacks that threaten national security.
"While government can secure its own networks, our security is still at risk unless financial institutions and other institutions are also safeguarded," said Jim Walden, former head of the computer crimes unit in the U.S. Attorney's Office for the Eastern District of New York. "The private sector is lagging government and government has lagged the rest of the world."
The U.S. has sophisticated systems for protecting parts of its infrastructure but also plenty of vulnerabilities and a wide field of attackers, says Walden, now a law partner at Gibson Dunn.
"China and Russia have shown in a series of attacks against the U.S. and other countries their ability to wage different aspects of cyberwarfare," he said. "We need to show we're capable of defense but also acting proactively, on offense."
China was believed to be behind the "Titan Rain" attacks on defense agencies and firms a few years ago, he says. This year Beijing faced pressure over an online spying network, allegedly tied to it, said to have infiltrated systems in more than 100 countries.
But it's tough to tell if cyberattacks stem from governments or individuals. Russian cyberpunks launched massive attacks vs. Estonia and Georgia, but it's not clear what role the Kremlin played.
Obama said terrorists could unleash a "weapon of mass disruption" with a few keystrokes, and spoke on how attacks have darkened electric service in foreign cities. By some estimates, he said, Americans have lost $8 billion to cybercrime in the last two years.
The White House plans to work with industry on tech solutions. But Obama "was very clear" that he wouldn't "dictate to businesses how to fix their systems," said PGP Corp. CEO Phil Dunkelberger, head of industry group Tech-America's cybersecurity council.
"The consensus in the audience was that he's looking for a nonpolitical appointee well able to address economic and cyber issues," said Dunkelberger, who attended Friday's event.
Paul Kocher, president and chief scientist at Cryptography Research, has "sort of a secret hope we might actually get someone who's written code," but doesn't expect it.
He has "fairly low" expectations of what government can do.
"The main one is trying to coordinate the government's own security defenses, which are in a disastrous state," Kocher said.
PearlHarbor.com
Mandating safe codewriting and instituting severe liabilities for lax businesses are too unpalatable to be politically feasible, he says.
In that regard, there's "no possibility of something significant happening without a catalyzing event — a digital Pearl Harbor people describe sometimes," he said. "Eventually there will be something that happens sufficiently extreme" to spur interest in a cybersecurity regulator along the lines of the Federal Aviation Administration or FDA.
Turf wars between Homeland Security, the National Security Agency and others have emerged over who should take the lead in nonmilitary cybersecurity.
Walden hopes a White House coordinator will provide "a unified chain of command to help agencies put down their competitive rivalries and move toward a common goal."
Email was created by Man.
Spammers Evolved.
And Rebelled.
They clog and pollute the Internet.
Some have convinced themselves they are legitimate.
There are many spammers.
And they have a plan.
I've been working my way over the past few months through the re-imagined Battlestar Galactica TV series. Right now, the theme to the TV show is stuck in my head so I thought I'd do a post on a BSG-related theme.
A CAPTCHA is a Completely Automated Public Turing test to tell Computers and Humans Apart. It's what services like Microsoft, Google and Yahoo put on their free online signup pages when you sign up for a Gmail account, Live Spaces account, or Yahoo Groups account. The most common CAPTCHAs consist of squiggly text with lines stroked through them and then the person has to type the text into the box. The idea is that an automated bot cannot do the visual recognition necessary to type the text into the box, but a human can do it easily.
That's the general idea. It thwarts spammers but lets humans use the service legitimately. However, spammers have broken CAPTCHAs using two methods. In the first, they farm out the checks to cheap labor and get around them that way. In the second, they actually have software that can do pattern recognition and type the correct text into the box. It doesn't work every time, maybe 1 out of 20, but given enough attempts and enough automation it is essentially the same as breaking the CAPTCHA. This is a problem for email providers to this very day.
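To see why a 1-in-20 success rate is good enough for a bot, here is a back-of-the-envelope sketch. It assumes each attempt is independent and succeeds with the same probability, which is a simplification, and the function name is made up for illustration.

```python
def prob_at_least_one_success(p, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts

# 1-in-20 per-attempt success rate, as in the rough figure above.
for n in (1, 20, 100):
    print(n, round(prob_at_least_one_success(0.05, n), 3))
```

Under those assumptions, a bot that keeps retrying gets through almost certainly within a hundred or so attempts, and attempts cost it next to nothing.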
Now consider Battlestar Galactica. The main villains in the series, the Cylons, have the ability to take human form. They are intent on destroying the human race. Early in the series, the humans figure out that Cylons can take on human form but unfortunately, they have no way of determining who is human and who is Cylon. They then commission Dr. Gaius Baltar, who was responsible for nearly getting the entire human race annihilated, to develop their own CAPTCHA - a Completely Automated Public Turing test to tell Cylons and Humans Apart.
Baltar creates a test but it turns out to be unreliable, or at least gives false signals (and to go through the entire human population testing for Cylons would take 18 years - too long to be useful). Ultimately, his test is unable to determine in a quick, automated fashion who is a Cylon and who is a human. Thus, in effect, the Cylons have broken the CAPTCHA.
Now stay with me here. Right now, the problem of spammers is that they pollute the Internet and can even cause problems with national security if enough of them got together and targeted a country's infrastructure. But if -- somehow -- machines ever did gain consciousness and the security industry never figures out how to build a reliable CAPTCHA... and machines do rebel and attack us... we could be in serious trouble. Heck, we could all die via a nuclear sneak attack on our planet! After all, machines would surely study history and see that since spammers used the technique to great success to make money, then machines could use the technique to great success to eliminate humanity! Spammers are giving evil machines ideas.
It is no exaggeration when I say that spammers could be responsible for the downfall of humanity.

PS - file this one under humor.