Need overview about XRumer software?
There. Now let's wait a week for Google to index this, and see what the log drags in. Thank you very much, and I now return you to your regularly scheduled programming.
Once, many years ago, I did a foolish thing. I wrote a quick little spam filtration forwarder and opened it up to the public. As far as I can tell, I was the first person to have done so.
The year was 1999, and the Boom was in full swing. My plan was simple: (1) write a free online service, (2) ???, and (3) profit!!! As you no doubt can surmise, #2 never really happened, let alone #3. But from 1999 to sometime in 2005, I kept that thing running, through three servers, four household moves, and a growth in userbase from two to a few thousand (if I recall correctly).
Then: the server died. I mean, it died suddenly and irretrievably, and I (never much of a stickler for formalities) had not backed it up. Ever. I'd simply moved it from place to place while intending fully to back it up, and all the old machines were broken into their constituent materials by that time. As I was going through some serious financial woes, and as my wife and I had found that our son has a kidney disorder, my priorities were clear. Despammed.com was superfluous.
But it nagged at me. Despammed.com lived on in my heart, even though its DNS entries pointed to an IP now occupied by some calendar service thing. (Which was weird.) And then I found a not-too-dusty copy of the HTML. And last month I found relatively good copies of the filtration software. I still don't have the user database or the administration/registration code or the statistics or the filter databases or, really, anything. But what the heck. I put it back online anyway. I'm nearly positive I'm going to live to regret it.
But I still feel a warm holiday glow. I'm giving back to the community again. Merry Christmas to all of you! And if Despammed.com breaks into your house wanting to eat your brains, remember: a headshot is mandatory.
Over at Toonbots, I have a forum, based on ancient but reliable Perl code. For many years, that forum has been a quiet backwater of the Net where I chat on various topics with those of my friends who enjoy that facet of my personality responsible for the engendering of Toonbots.
But lately, something extremely irritating has happened. The forum has become the target of forum spammers. Their spam rarely even formats correctly, since the forum code is so old and weird. But that doesn't stop three or four of them from posting every day, and I have to delete it all by hand, or relinquish the forum to utter uselessness.
Oddly, the wftk forum is utterly unaffected by all this. Since the trouble started when I started properly indexing the forum archives, I suspect the archives are acting as a Google magnet for various topics. But I'm not sure yet.
The modus operandi of forum spammers is different from real posters, according to the logs; typically the forum spammer hits the site for the first time in the forum archives, then posts within a few seconds. Real posters actually read the site first. So I could filter based on that behavior. But I'm going to study the issue for a while, see if I can detect any other useful patterns. It's a serious problem, and a growing one; email spamming is experiencing diminishing returns now, since fewer people read email thanks to spam. So forum spamming is a logical progression.
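As a rough sketch of what filtering on that behavior could look like (this is not the code running on the forum): assume the access log has already been parsed into chronological (ip, timestamp, path, method) tuples. The archive prefix and the 30-second threshold are made-up values for illustration.

```python
from datetime import datetime, timedelta

def flag_suspicious_posts(hits, archive_prefix="/forum/archive/",
                          min_read_time=timedelta(seconds=30)):
    """Return (ip, timestamp) for POSTs that look like drive-by spam.

    hits: iterable of (ip, timestamp, path, method), oldest first.
    A POST is flagged when the poster's first-ever hit landed in the
    archives and the POST followed it by less than min_read_time.
    """
    first_seen = {}   # ip -> (timestamp, path) of that IP's first hit
    flagged = []
    for ip, ts, path, method in hits:
        if ip not in first_seen:
            first_seen[ip] = (ts, path)
        if method == "POST":
            first_ts, first_path = first_seen[ip]
            if first_path.startswith(archive_prefix) and ts - first_ts < min_read_time:
                flagged.append((ip, ts))
    return flagged

if __name__ == "__main__":
    t0 = datetime(2007, 1, 1, 12, 0, 0)
    log = [
        ("192.0.2.10", t0, "/forum/archive/123.html", "GET"),
        ("192.0.2.10", t0 + timedelta(seconds=4), "/forum/post.cgi", "POST"),
    ]
    print(flag_suspicious_posts(log))
```

A real poster who browses for a few minutes before posting passes untouched, which is the whole point: nothing here looks at the content of the post.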
I just wrote a rather effective spam eliminator for my WebBBS forum at Toonbots, and sort of "live blogged" the process as I went. The result is a rather attractive little document. I feel virtuous again tonight.
When I initially posted the "XRUMER and you" post, I thought that XRUMER probably used the text I posted (which I had found on a forum I frequent) to identify spammable fora -- those for which moderation is not performed.
Later, I came across the theory that this post was in fact some pretty clever viral marketing. By pretending to ask the forum's members about XRUMER, the XRUMER marketer could induce at least some people to search on it and link it, causing Google to rate it highly without actually themselves spamming. Neat.
But for whatever reason, my post caused Google to rate me third on searches on the term XRUMER -- and instead of XRUMER, I'm seeing a lot of traffic from people obviously interested in stopping it.
As am I.
But I don't have access to a forum affected by XRUMER (or at least, I can't tell for sure that I do). My own Toonbots forum is an extremely low-traffic venue running on antiquated WebBBS code. I get spam there, and this week managed to block it all (so far), but my problem is decidedly minor.
I can only assume that if you're reading this, you have a major forum spam problem. If this is the case, I need your help. I'd like to try out some ideas about forum despamming -- building on the working concepts in my own low-traffic venue. But to try these ideas out, I'd need access to a forum. Your forum, if you're interested. And that essentially means access to the underlying storage (whether filesystem or database), a way to run Perl on your box, and access to the Web access logs in real time.
Depending on your own traffic patterns, the access logs can provide a great deal of information about whether a post is legitimate or not. Of course, you can also make a lot of valid judgments based on the post content, but I hesitate to block on things like "too many links," as satisfying as that heavy-handed approach may be. Legitimate users can often have legitimate reasons to post lots of links. Granted, they're generally not about Cialis or mortgages or hot xxxxxxx Asian lesbian pr0n, but still -- any interference with your actual users is something you want to avoid at all costs. I regard information about post content as one factor in a good, well-rounded spam elimination strategy.
Traffic analysis correlated with forum activity can be a powerful tool, and in my own case it's working 100%, with no examination of content at all, but my traffic is so low that I can't judge how complete a strategy it might be. If you add your forum to the mix, I can improve the techniques.
So anyway, all you desperate forum admins with XRUMER problems -- if you want me to give it a shot, drop me a line. I'm working for free and during an initial phase my scripting can simply recommend post deletion instead of making any automated changes itself. Interested? Tell me.
So hey, kids, I'm still alive, and now posting from the lovely Caribbean island of Puerto Rico for the foreseeable future.
After the move, and after some confusion on the part of the cable company involving losing my order, I have blessed, blessed broadband again, without having to cadge the neighbors' WiFi from the rooftop terrace, which would be a great place to work were it not for the tropical proximity of a horrible huge ball of blazing nuclear explosion hanging over my head, plus the necessity of placing the laptop in a precarious position on the railing, four floors above concrete, to get good signal.
But now things are good again, and I have 9000 emails to go through (yes, as a guy with a spam filter, I should probably be filtering my spam, but, well, it's a long story and look, shiny thing!). And lo! within those 9000 mails were two from hapless forum operators who are getting fed up with manual despamming.
So sure, I'll be seeing what I can do in that regard, but it piqued my interest in forum spam again. And so I checked my logs for instances of XRumer, and wow -- somebody actually linked my XRumer blog keyword in response to ... a new instance of the XRumer forum bomb. Dated April 5, as it so happens. This one contains the novel text "Also, do you know when XRumer 4.0 Platinum Edition will be released?" and it's posted by AlexMrly. Google either the phrase or the name, and you'll see a whole lot of forum spam. Hey, XRumer guys -- thanks! What we all want is more forum spam!
Now I have that off my chest. I'm going to reiterate my offer to anybody listening -- I'm going to see what I can do to combat forum spam around the world, and I'm not charging anything for it. So far, I'm just in it for the interest, just like email spam in 1999. Get in touch. I'll be here. Well -- I might actually be at the beach. But I'll be back soon.
Sorry that this post isn't really all that programming-oriented. I hope to be making that right, in the next couple of days. Blocking XRumer is fun, and so easy even a child could do it! No, seriously: if you want to help me stop XRumer, all I need is your data.
Three related things today. First, the scripts I've been putting together for forum spam blocking have kind of coalesced into a "modbot". This program attempts to automate the tasks performed by human moderators and could technically be placed into any Web spam moderation situation. It is currently running happily in an Eboard 4.0 installation and blocking roughly 93% of spam, while still allowing anonymous posting to that forum. I'll be packaging it up for distribution in the public domain. Watch this space for further details -- one of the more fascinating notions I've had is to enable it to receive moderation emails from Blogger and thus automate the comment moderation process there.
One of the rules/tools used by the modbot is to count Google hits for the numeric IP of an untrusted poster. Turns out that HTTP proxies have a real proclivity for getting indexed. A lot. Legitimate IPs, not so much. I wrote a little online tool to call Google to get these counts; the tool is here and the write-up of the code is here. It's currently blocking about 40% of spam (I don't have good statistics analysis in place yet, so that's very approximate.)
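The rule itself is tiny once you have a hit count in hand. Here is a minimal sketch that deliberately leaves the search call out: count_hits is a hypothetical callable you would supply (it takes an IP string and returns a number), and the threshold of 10 is an arbitrary placeholder, not a tuned value.

```python
import ipaddress

def looks_like_open_proxy(ip, count_hits, threshold=10):
    """Return True if the IP shows up in search results suspiciously often.

    count_hits is supplied by the caller (hypothetical here); it maps an
    IP string to the number of indexed pages mentioning that IP.
    """
    addr = ipaddress.ip_address(ip)   # raises ValueError on a malformed IP
    return count_hits(str(addr)) >= threshold

if __name__ == "__main__":
    # Canned counts standing in for a real lookup.
    canned = {"203.0.113.7": 250, "198.51.100.20": 0}
    for candidate in canned:
        print(candidate, looks_like_open_proxy(candidate, canned.get))
```

Keeping the lookup as a parameter means the rule can be tested against canned numbers, and the search backend can be swapped without touching the rule.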
Finally, as a spinoff of this project, I've started a spam archive. There's nothing to present yet, but I hope to start doing some interesting analysis, and most specifically a searchable database -- along with a searchable database of spamvertised sites. That ought to overlap with the sites spamvertised by email spam as well, and that's going to be an interesting thing to look at. We'll see.
Anyway, it's been nice talking to you. Back to work!
Now that I've been collecting spam from actual fora for a little while, I have some initial statistics and musings.
I've collected spam from one eBoard 4.0 forum since May 5; it is now May 13. The spam filters I'm using are blocking about 93% of the postings, making the moderation burden manageable for that forum. In those 8 days I have collected 1,235 spam samples. That's roughly 150 spams a day, from a fairly obscure forum; in retrospect, even though the actual log activity seems low, this is a lot of spam.
Those 1,235 spam samples contain a total of 10,795 links. I haven't yet built analysis machinery to get much further than that; I've mostly been just looking at the links, retrieving the pages, and musing about how all that might be automated in an interesting and useful way.
Some of the spam links point to actual sites being advertised. I don't yet have a feel for how many links point to sites other than those actually advertised, but there are some interesting commonalities. For instance, there are a lot of pages placed onto vulnerable fora and other venues which simply link to other pages. In some cases, it's easy to tell why: it's Google spamming, and also simply a way to counter attempts to block posts which link to particular URLs.
I have a separate notion to find and track those vulnerable sites, and to attempt to mine them for further information on these spam networks.
One spam has a huge number of links to different domains, all of which resolve to the same IP. That's an interesting feature. I'm not sure how to track it yet. What I really want to build is some kind of generic analysis framework, but I don't have a good picture of what that framework would look like, or indeed precisely what it is that I expect it to do.
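One way to at least spot the many-domains-one-IP feature is to resolve every linked host and group the hosts by address. This is only a sketch: the function name is mine, and the resolve parameter defaults to socket.gethostbyname but can be swapped for a canned lookup when testing.

```python
import socket
from collections import defaultdict
from urllib.parse import urlparse

def group_hosts_by_ip(urls, resolve=socket.gethostbyname):
    """Map each IP to the sorted hostnames that resolve to it.

    Only IPs shared by more than one hostname are returned.
    """
    host_ip = {}                      # hostname -> IP, so each host resolves once
    by_ip = defaultdict(set)
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue                  # not an absolute URL
        if host not in host_ip:
            try:
                host_ip[host] = resolve(host)
            except OSError:           # socket.gaierror is a subclass of OSError
                host_ip[host] = None
        ip = host_ip[host]
        if ip is not None:
            by_ip[ip].add(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}
```

Feeding it the links from a single spam would show whether its domains all collapse onto one address; feeding it links from many spams would show which spams share hosting.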
It seems that what I want to do is to build a kind of task list for an incoming event. That task list would consist of a certain (small) number of analysis steps which themselves generate new analysis events. Each step is a test. The results of the tests are cached, so that all possible duplicated effort is avoided, but also so that relationships such as "these spam efforts share an IP" can be found.
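To make that a little more concrete, here is a toy version of the task-list idea. Everything in it (the Analyzer class, the event kinds, the method names) is invented for illustration. Events are (kind, value) pairs, each step turns one event into zero or more new events, results are cached per step and value, and every event remembers which incoming items led to it.

```python
from collections import deque

class Analyzer:
    def __init__(self):
        self.steps = {}     # event kind -> list of step functions
        self.cache = {}     # (step name, value) -> list of resulting events
        self.seen_by = {}   # (kind, value) -> set of origin ids that reached it

    def register(self, kind, step):
        """Run step(value) for every event of this kind; it returns new events."""
        self.steps.setdefault(kind, []).append(step)

    def run(self, origin, kind, value):
        """Expand one incoming event into its full tree of analysis events."""
        queue = deque([(kind, value)])
        visited = set()
        while queue:
            event = queue.popleft()
            if event in visited:
                continue
            visited.add(event)
            self.seen_by.setdefault(event, set()).add(origin)
            event_kind, event_value = event
            for step in self.steps.get(event_kind, []):
                key = (step.__name__, event_value)
                if key not in self.cache:      # do each test only once, ever
                    self.cache[key] = list(step(event_value))
                queue.extend(self.cache[key])

    def shared(self, kind):
        """Values of this kind that more than one origin led to."""
        return {value: origins
                for (k, value), origins in self.seen_by.items()
                if k == kind and len(origins) > 1}
```

With steps registered for, say, spam text to URLs, URLs to hosts, and hosts to IPs, calling shared("ip") after a few runs answers the "these spam efforts share an IP" question directly, and the cache keeps a host that shows up in a thousand spams from being resolved a thousand times.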
There's a certain exponential explosion involved, it seems at times. But there are also patterns which could cut down on the amount of work done. Of those 10,795 links I have so far (oops, in the time it's taken to write this much, two more spams have arrived, so I now have 10,886 links to analyze) -- of those 10,886 links I have, many of them are hosted at
Well, anyway, this is just a little talking out loud while I muse about how to automate all this analysis. Eventually I'll get down to posting graphs of some sort. That will be fun. The other thing, of course, is some way to ask about a URL, "Is this URL a spam indicator?" I hope it will also cross-fertilize with Despammed.com. Wish me luck.
I've always had a soft spot for good explanations of Internet sleuthing for fun and profit, and here's a dandy example.
I got a spam today saying the Beijing Olympics had been cancelled, so I was all "O hai, Botnet, I can has spamtrail?" (Because I hear the Russians are using fake news headlines to induce people to open the mail now. And part of this trail goes through Russia, as we'll see.)
The whole story (well, as much as I've followed and written down so far) is over here because it is really detailed. But it's fun so far, because not only is the main injection page obfuscated, it appears to be encrypted and the decryption code is itself obfuscated and located on a different server. In Russia.
So far, it's been instructive, as always when one unravels these threads. More later.
Turns out, a lot. Like, a lot. So I'm going to have plenty of grist for this mill -- and the very fascinating thing is that it sure looks like there is a change in tactics each day. So I'm going to try to go back through older instances and hope that people haven't fixed their servers yet for some, and I'm going to put up some early warnings to tell me about new ones -- but this is truly, truly fun.
Each of these mails has a faux news headline: "Michael Vick escapes from Federal jail", or "Beijing Olympics canceled", the one that first drew my attention. Then the body of the mail has a different headline, and a link.
Turns out that different headline is drawn from the same list. So I can check the Despammed.com spam archive (1.2 million spam emails on file at the moment) for other emails with that subject. And so on. This should allow me to build a database of subjects really, really easily. And then I can simply scan for those subjects to find new instances. If they select their headlines randomly (and I have no reason to believe they don't) this should allow me to find all their headlines and keep up with new ones at the same time. Fun!
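The subject-harvesting loop might look something like the sketch below. It rests on two assumptions of mine: the archive can be read as (subject, body) pairs, and the body's headline is its first non-empty line. The real mails may need a smarter extractor.

```python
def first_line(body):
    """Return the first non-empty line of a mail body, stripped, or ''."""
    for line in body.splitlines():
        if line.strip():
            return line.strip()
    return ""

def harvest_subjects(archive, seeds):
    """Grow a set of known botnet subjects from a few seed subjects.

    archive: list of (subject, body) pairs.
    Any mail whose subject is already known donates its body headline,
    and the scan repeats until a full pass adds nothing new.
    """
    known = set(seeds)
    changed = True
    while changed:
        changed = False
        for subject, body in archive:
            if subject not in known:
                continue
            headline = first_line(body)
            if headline and headline not in known:
                known.add(headline)
                changed = True
    return known
```

Starting from a single seed such as "Beijing Olympics canceled", each pass picks up headlines that only became recognizable in the previous pass, so the set keeps growing until the archive has nothing more to give.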
Once I've got that coded, I'll post a database page in real time. [Updated to include link.] That will be even more fun. And then I can resume the de-obfuscation effort. Actually, I've dusted off some old project idea notes and started work on the monkeywrench to help me organize this stuff.
Note to anybody interested: the design philosophy of the monkeywrench is essentially a Hofstadter parallel terraced scan. But operated by a human (for now) in a workflow paradigm. I can sloooowly start to feel the various bits of my life coming together.
Over the past couple of days as I datamined the Despammed spam archives for Storm botnet spam, I've grown to really enjoy their madcap subjects (latest here). But today?
Guys! "Obama bribing countrymen" or "McCain picks Osama bin Laden as VP" are hilarious! But "Video News"? "Top stories"? Come on! If you're going to hijack a million people's machines to spam us all, the least you can do is to continue to be entertaining about it. This? This is beneath you.