Well, just in time for a wave of referral spam hitting my blog (mostly from http://www.ownsthis.com), I spent part of today writing a class that consumes the Movable Type Blacklist. The class downloads the blacklist file from the server periodically, no more than once a day. I wrote it so that anyone can integrate it into their .Net blogging package, or any other .Net program, and I just checked it into the dasBlog 1.7 tree. The nice thing about this is that the Blacklist is maintained in real time, so instead of relying only on content filtering (the stuff that Scott did), you get a pretty long and decent blacklist of bad sites. So far, in the past few hours, it has caught 100% of the referral spam with no false positives...
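The once-a-day refresh can be sketched as a small cache. This is a hypothetical Python illustration, not the MovableTypeBlacklist class itself: the `BlacklistCache` and `parse_blacklist` names, the injected `fetch` callable, and the file format (one entry per line, `#` starting a comment) are all assumptions.

```python
import time
from typing import Callable, List, Optional

ONE_DAY_SECONDS = 24 * 60 * 60


def parse_blacklist(text: str) -> List[str]:
    """Assumed format: one entry per line; '#' starts a comment."""
    entries = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            entries.append(entry)
    return entries


class BlacklistCache:
    """Hypothetical cache that re-downloads the blacklist at most once a day."""

    def __init__(self, fetch: Callable[[], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch      # assumed: returns the raw blacklist text
        self._clock = clock      # injectable so the refresh logic is testable
        self._last_download: Optional[float] = None
        self._entries: List[str] = []

    def entries(self) -> List[str]:
        now = self._clock()
        stale = (self._last_download is None
                 or now - self._last_download >= ONE_DAY_SECONDS)
        if stale:
            # Only hit the server when the cached copy is a day old (or missing).
            self._entries = parse_blacklist(self._fetch())
            self._last_download = now
        return self._entries
```

Injecting the clock and the fetch function keeps the "no more than once a day" rule testable without touching the network; in real use `fetch` would wrap an HTTP download of the blacklist file.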
We are a few days away from releasing the final version of dasBlog 1.7. A very small number of folks have been running the bits over the weekend and as a result we've fixed a few bugs. A couple more days and we'll post the bits to SourceForge.
When that happens, I'll post the MovableTypeBlacklist class. I've also considered writing an HttpModule to send these guys 404s, but didn't think that was appropriate. The list is loaded into one long string, with entries delimited by "|", and passed into a Regex to match a URL. Interestingly enough, when I tried to compile the Regex, my little console app ballooned to 150 MB and never finished running. Using a static Regex built from the long static string, I was able to execute matches in 0 to 10 milliseconds.
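The joined-alternation approach can be sketched roughly as follows. This is a Python illustration, not the author's .Net code; `build_blacklist_regex` and `is_blacklisted` are hypothetical names, and escaping each entry assumes the list holds plain domain names rather than regex fragments.

```python
import re
from typing import Iterable


def build_blacklist_regex(entries: Iterable[str]) -> re.Pattern:
    # Join every entry into one long alternation, "a|b|c", as described above.
    # Assumption: entries are plain domain names, so each is escaped to keep
    # "." literal. If the list holds regex fragments, drop re.escape.
    parts = [re.escape(e) for e in entries if e]
    if not parts:
        return re.compile(r"(?!)")  # empty list: a pattern that never matches
    return re.compile("|".join(parts), re.IGNORECASE)


def is_blacklisted(pattern: re.Pattern, url: str) -> bool:
    # search() rather than match(): the domain can appear anywhere in the URL.
    return pattern.search(url) is not None
```

Because this is a substring search, an entry like ownsthis.com would also match a host such as notownsthis.com; anchoring entries on a dot or host boundary would tighten that at the cost of a more complex pattern.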
Here is the output from a quick test run of the class:
6p.org.uk : True (executed in 20 milliseconds)
microsoft.com : False (executed in 0 milliseconds)
shahine.com : False (executed in 0 milliseconds)
flatbedshipping.com : True (executed in 0 milliseconds)
apply-to-green-card.org : True (executed in 0 milliseconds)
ownsthis.com : True (executed in 10 milliseconds)
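A harness that produces output in this shape might look like the sketch below. It is a hypothetical Python illustration: `time_matches` is an invented name, and it takes any matcher function so it stays independent of how the blacklist is built. The 0/10/20 ms steps above are consistent with a coarse timer; a high-resolution counter such as Python's `time.perf_counter` (or .NET's `Stopwatch`) gives finer readings.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_matches(matcher: Callable[[str], bool],
                 domains: Iterable[str]) -> List[Tuple[str, bool, float]]:
    """Run matcher over each domain and record elapsed wall-clock milliseconds."""
    results = []
    for domain in domains:
        start = time.perf_counter()
        hit = matcher(domain)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append((domain, hit, elapsed_ms))
        print(f"{domain} : {hit} (executed in {elapsed_ms:.0f} milliseconds)")
    return results
```

Returning the raw tuples as well as printing them makes it easy to check the True/False results separately from the timings, which will vary from run to run.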