In the predictability department, one of my forum spam traps just pulled in an interesting post: yeah, it was posted (presumably) by XRumer and certainly fits the profile -- but it's advertising a crack of XRumer.
"Greate new XRumer4.0 platinum edition and crack DOWNLAUD".
I wondered how long that cash cow would last -- looks like about, what, November to April? Actually, it took longer than I expected.
In case you're wondering whether this is a good idea: well, if you already think spamming is a valid business technique, then sure, go ahead. Download a crack from Russians and give them control of your machine.
In related news, I have doubled the number of forum sites I am despamming. (If you're paying attention, that means, yes, I now have one that isn't my own site.) And I decided to try a notion that's really paid off in spades.
See, XRumer uses a vast database of known HTTP relays to post spam. This makes it much more difficult for human admins to block by IP -- when a single spammer may have hundreds of IPs available, how can you block them all?
Well -- unintended consequence time! Thanks to the explosion in use of these proxies, we now have a reliable way to find them out without human intervention at all. Count the number of times Google indexes an IP, and you have an incredibly effective way to determine whether it is on the list of known proxies used by spammers. Granted, you have the lag between the time it becomes a proxy and when Google starts indexing the references to it on forum posts around the world. But this one test for spam blocks about 60% or more of forum spam, sight unseen.
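As a sketch of the idea (the author's script is Perl; this is Python, and the threshold and the hit-count lookup are my assumptions, not measured values from the post): flag an IP as a likely spam proxy when Google has indexed many references to it. The lookup function is injected, so you can plug in whatever search API you have.

```python
def is_probable_spam_proxy(ip, hit_count_fn, threshold=50):
    """Flag an IP as a likely spammer proxy if it appears in many
    indexed forum posts around the world. `hit_count_fn` returns the
    number of search results for the quoted IP; `threshold` is a
    guess for illustration, not a value from the post."""
    return hit_count_fn('"%s"' % ip) >= threshold

# Usage with a stand-in lookup (a real one would query a search API):
fake_counts = {'"203.0.113.7"': 412, '"198.51.100.2"': 0}
lookup = lambda phrase: fake_counts.get(phrase, 0)
print(is_probable_spam_proxy("203.0.113.7", lookup))   # True -- heavily indexed
print(is_probable_spam_proxy("198.51.100.2", lookup))  # False -- clean
```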
It won't last. But then again, neither will XRumer, not in its present form.
Just to help you out, I've provided a simple Google hit counter: go here and type in any phrase, not just an IP address, to see how many references to the phrase Google has indexed. When I've got a little more history behind it, I'll even put in auto-repeating queries of the good ones, with gnuplot graphs to show googlecount over time.
And of course, I'll be putting the code up; it's about ten lines of Perl -- the only reason it's that long is that it caches results in a database so repeated queries don't pound Google. Not that Google can't stand the pounding, but I don't really want a bunch of Perl script threads hanging around waiting on Net latency.
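That caching scheme can be sketched in a few lines. The original is Perl; this is a minimal Python equivalent using sqlite3, with a table name and schema of my own invention: look the phrase up in the cache first, and only go out to the network on a miss.

```python
import sqlite3

def cached_hit_count(db, phrase, fetch_fn):
    """Return the hit count for `phrase`, consulting the cache first
    so repeated queries don't pound Google."""
    db.execute("CREATE TABLE IF NOT EXISTS hits (phrase TEXT PRIMARY KEY, n INTEGER)")
    row = db.execute("SELECT n FROM hits WHERE phrase = ?", (phrase,)).fetchone()
    if row is not None:
        return row[0]                  # cache hit: no network traffic
    n = fetch_fn(phrase)               # cache miss: do the real query
    db.execute("INSERT INTO hits VALUES (?, ?)", (phrase, n))
    db.commit()
    return n

# Usage with a stand-in fetcher that records how often it's called:
db = sqlite3.connect(":memory:")
calls = []
fetch = lambda p: (calls.append(p), 1234)[1]
print(cached_hit_count(db, "203.0.113.7", fetch))  # 1234, fetched
print(cached_hit_count(db, "203.0.113.7", fetch))  # 1234, from cache
print(len(calls))                                  # 1 -- fetched only once
```

The point of the cache is exactly what the post says: it's not that Google can't take the load, it's that you don't want script processes sitting around waiting on network latency for answers you already have.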
So, a common refrain lately: more later.