This kind of system starts small, but as the relationships between data sources and other entities in the system become better understood, I typically start writing various Perl scripts to take care of common questions. Then the Perl scripts sprout some reporting features which write small text files, and often I toss in some HTML writers as well to provide overviews of system activity. If things get sufficiently complex, I write some database code to do things. The whole thing usually runs by means of simple commands I run from the command line or install as cron jobs.
Ideally, the wftk would easily support and interface with this style of system without requiring me to reform my ugly coding habits. Frankly, my crufty code comes in small packages and represents observations of regularities in system data -- there's very little motivation for me to modify the way I work with it. I think this is a style of programming well-known to any system administrator.
So how would the wftk help?
But that shouldn't be taken as an assertion that the other points are inconsequential. Overview documents are typically ugly, poorly laid out, and lack Website navigation features (or are constantly outdated) simply because generating context-sensitive and pretty HTML in Perl scripts is sufficiently tedious that it's easy to procrastinate. So having some nice Perl-accessible functions like "take this object and format it according to this page definition" would be very useful.
To get a really good idea of spaghetti scripting, let's follow the development of spamtrap processing for Despammed.com, step by step.
After a few months, I got curious as to whether those accounts were still getting mail, and how often, so I included a simple logging feature in the filter for mail arriving at closed accounts. Sure enough, the accounts were getting more spam than ever, and it was all spamhaus spam (you know -- credit card offers, Gevalia coffee makers, exciting sweepstakes opportunities, that kind of thing). Reasoning that the sources of all this junk were spamhausen (i.e. companies which exist only to send spam in large quantities), I concluded that I should identify the origin IP addresses and block them entirely. That would avoid the inevitable filtration overhead when they sent to non-closed accounts: even when the spam is blocked, the filter program works hard to block it.
So I created a new "spamtrap" category for accounts, and wrote code to log mail received at spamtrap addresses to files identified by the origin IP of the mail, e.g. trap/209.23.33.49.log. On the seventh day, I rested.
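As a rough illustration of that per-IP logging, here is a minimal sketch. It is not the original filter code: the helper name log_trapped_mail, the one-line-per-mail log format, and the example recipient address are all assumptions made for the example. Only the trap/<ip>.log naming comes from the description above.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: append one line per trapped mail to <dir>/<ip>.log.
# The "<timestamp> <recipient>" line format is an assumption.
sub log_trapped_mail {
    my ($dir, $ip, $recipient, $when) = @_;

    # Accept only dotted-quad IPv4, so the IP is safe to use as a file name.
    die "bad IP: $ip\n" unless $ip =~ /^\d{1,3}(?:\.\d{1,3}){3}\z/;

    make_path($dir) unless -d $dir;
    my $file = File::Spec->catfile($dir, "$ip.log");

    # Open in append mode so repeated hits from one IP accumulate.
    open(my $out, '>>', $file) or die "Cannot append to $file: $!";
    print {$out} "$when $recipient\n";
    close($out) or die "Cannot close $file: $!";
    return $file;
}
```

A call such as log_trapped_mail('trap', '209.23.33.49', 'trap@example.com', '2003-04-17') would then append one line to trap/209.23.33.49.log, creating the directory and file if needed.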
    ? 2003-04-17
    - 66.172.136.206 -- host: 1home206.letsroll-usa.net
    - 66.172.136.202 -- host: 1home202.letsroll-usa.net
    - 66.172.136.221 -- host: 1home221.letsroll-usa.net
    - 66.172.136.203 -- host: 1home203.letsroll-usa.net
    - 66.172.136.211 -- host: 1home211.letsroll-usa.net
    - 66.172.136.197 -- host: 1home197.letsroll-usa.net
    - 66.172.136.204 -- host: 1home204.letsroll-usa.net
    - 66.172.136.216 -- host: 1home216.letsroll-usa.net
    - 66.172.136.214 -- host: 1home214.letsroll-usa.net
    - 66.172.136.207 -- host: 1home207.letsroll-usa.net
    - 66.172.136.210 -- host: 1home210.letsroll-usa.net
    - 66.172.136.218 -- host: 1home218.letsroll-usa.net
    - 66.172.136.209 -- host: 1home209.letsroll-usa.net
    ? 2003-04-18
    - 66.172.136.199 -- host: 1home199.letsroll-usa.net
    - 66.172.136.198 -- host: 1home198.letsroll-usa.net
    # Oy.
    ! whois letsroll-usa.net
    ? 2003-04-18
    | Domain Name: LETSROLL-USA.NET
    | Registrar: INTERCOSMOS MEDIA GROUP, INC. D/B/A DIRECTNIC.COM
    | Whois Server: whois.directnic.com
    | Referral URL: http://www.directnic.com
    | Name Server: NS2.REMOVALPROCESS.COM
    | Name Server: NS1.REMOVALPROCESS.COM
    | Status: ACTIVE
    | Updated Date: 10-apr-2003
    | Creation Date: 10-apr-2003
    | Expiration Date: 10-apr-2004
    |
    |
    |>>> Last update of whois database: Fri, 18 Apr 2003 05:54:17 EDT <<<

Note that this file is somewhat idealized. The upshot, though: this file is organized line by line. Each line begins with a code character marking what it's for:
? | Date marker of IPs following, or answer following |
- | IP to block, followed by rDNS lookup of its canonical name, if any |
# | Comment line (I always build in comment lines, because I'm wordy and I like it that way.) |
! | Action taken and possibly to be taken on a regular basis. |
| | Result of such an action |
    open(my $in, '<', 'infile.txt') or die "Cannot open infile.txt: $!";
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^-/) {
            # IP line: do something
        } elsif ($line =~ /^\?/) {
            # date marker: do something else
        }
        # ... and so on for the other code characters
    }
    close $in;

As an example, to make things easier on myself, I whipped out a little "check" script which scanned such a file, found '-' lines without rDNS info, did the DNS lookups, and filled them in on the appropriate lines. Then I built a "merge" script which scanned all block files for IPs, merged them into a single sendmail-style access file, copied said file to where sendmail could use it, and signalled a restart of sendmail to use the new blocklist.
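The core of those two scripts might look something like the sketch below. This is a reconstruction under stated assumptions, not the original code: the function names rdns, fill_rdns and access_entries are hypothetical, the REJECT action is an assumed choice of sendmail access entry, and copying the file into place and restarting sendmail are left out.

```perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Default resolver: reverse-DNS lookup using Perl's built-in gethostbyaddr.
# Yields undef in scalar context when the IP has no canonical name.
sub rdns {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return;
    return scalar gethostbyaddr($packed, AF_INET);
}

# "check" step: fill in rDNS info on '-' lines that lack it.
# $resolve is an optional code ref, so lookups can be stubbed out.
sub fill_rdns {
    my ($lines, $resolve) = @_;
    $resolve ||= \&rdns;
    my @out;
    for my $line (@$lines) {
        my $new = $line;    # copy, so the caller's array is left alone
        if ($line =~ /^-\s*(\d{1,3}(?:\.\d{1,3}){3})\s*$/) {
            my $ip   = $1;
            my $name = $resolve->($ip);
            $new = "- $ip -- host: $name" if defined $name;
        }
        push @out, $new;
    }
    return @out;
}

# "merge" step: collect the unique IPs from '-' lines and emit one
# tab-separated access entry per IP (REJECT is an assumption).
sub access_entries {
    my (@lines) = @_;
    my (%seen, @entries);
    for my $line (@lines) {
        next unless $line =~ /^-\s*(\d{1,3}(?:\.\d{1,3}){3})\b/;
        push @entries, "$1\tREJECT" unless $seen{$1}++;
    }
    return @entries;
}
```

Passing the resolver in as a code ref keeps the "check" logic testable without touching the network. A real merge script would read every block file, pass all the lines to access_entries, and write the result to the access file before rebuilding the map.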
The difference was astounding. I quickly identified several spamhausen which had been hammering the server for weeks, sending mail to all 250
spamtraps within a minute -- and I shut them all down. My initial stab was a success, although still far too manual to allow me to
use it often.