Three related things today. First, the scripts I've been putting together for forum spam blocking have kind of coalesced into a "modbot". This program attempts to automate the tasks performed by human moderators and could, in principle, be dropped into any Web spam moderation setting. It is currently running happily in an Eboard 4.0 installation and blocking roughly 93% of spam, while still allowing anonymous posting to that forum. I'll be packaging it up for distribution in the public domain. Watch this space for further details -- one of the more fascinating notions I've had is to enable it to receive moderation emails from Blogger and thus automate the comment moderation process there.
Second, one of the rules/tools used by the modbot is to count Google hits for the numeric IP of an untrusted poster. Turns out that HTTP proxies have a real proclivity for getting indexed. A lot. Legitimate IPs, not so much. I wrote a little online tool to call Google to get these counts; the tool is here and the write-up of the code is here. This rule on its own is currently blocking about 40% of spam (I don't have good statistical analysis in place yet, so that's very approximate).
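To make the hit-count rule concrete, here is a minimal Python sketch of the idea, not the modbot's actual code. The names `looks_like_proxy`, `count_hits`, and `HIT_THRESHOLD` are made up for illustration, and the threshold value is a guess, since no specific cutoff is given here. The search lookup is passed in as a function so the sketch doesn't pretend to know how the real tool talks to Google.

```python
import ipaddress
from typing import Callable

# Hypothetical cutoff: the real rule's threshold is not stated, so this is a guess.
HIT_THRESHOLD = 50


def looks_like_proxy(
    ip: str,
    count_hits: Callable[[str], int],
    threshold: int = HIT_THRESHOLD,
) -> bool:
    """Return True when an IP is indexed often enough to suggest an open proxy.

    count_hits is any function that returns the number of search hits
    for the quoted IP string; how it queries the search engine is left
    to the caller.
    """
    # Reject malformed input before it reaches the search backend.
    address = ipaddress.ip_address(ip)  # raises ValueError if not an IP
    return count_hits(str(address)) >= threshold


def fake_count_hits(ip: str) -> int:
    """Stand-in lookup with made-up counts, for demonstration only."""
    sample = {"203.0.113.7": 1200, "198.51.100.23": 2}
    return sample.get(ip, 0)


if __name__ == "__main__":
    for candidate in ("203.0.113.7", "198.51.100.23"):
        verdict = "block" if looks_like_proxy(candidate, fake_count_hits) else "allow"
        print(candidate, verdict)
```

Keeping the lookup separate from the decision means the threshold can be tuned against the spam archive without touching whatever code actually fetches the counts.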
Finally, as a spinoff of this project, I've started a spam archive. There's nothing to present yet, but I hope to start doing some interesting analysis, and most specifically to build a searchable database of the spam itself, along with a searchable database of spamvertised sites. Those sites ought to overlap with the ones spamvertised by email spam as well, and that's going to be an interesting thing to look at. We'll see.
Anyway, it's been nice talking to you. Back to work!