Today's fun task was writing a little prototype to format the tag cloud for the drop handler project. I did it in the context of this blog, so first I had to get my keywords functional. I already had a database column for them, but it turned out my updater wasn't writing them to the database, which was an easy fix. Once I had keywords attached to my blog posts, I turned my attention to formatting them into keyword directories (the primary motivation was to make Technorati tagging possible, on which more later). And once that was done, I had all my keywords in a hash, so it occurred to me that I was most of the way towards implementing a tag cloud formatter anyway. Here's the Perl I wrote just to do the formatting. It's actually amazingly simple (of course), and you can peruse the up-to-the-minute result of its invocation in my blog scanner on the keywords page for this blog. Perl:
    sub keyword_tagger {
        my $ct = shift @_;
        my $weight;
        my $font;
        my $sm = 70;
        my $lg = 200;
        my $del = $lg - $sm;
        my $ret = '';
        foreach my $k (sort keys %kw_count) {
            $weight = $kw_count{$k} / $max_count;
            $font = sprintf ("%d", $sm + $del * $weight);
            $ret .= "<a href=\"/blog/kw/$k/\" style=\"font-size: $font%;\">$k</a>\n";
        }
        return $ret;
    }

This is generally not the way to structure a function call, because it works with global hashes, but y'know, I don't follow rules too well (and curse myself often, yes). The assumptions:
For our file cloud builder, we'll want to do this very same thing, but in Python (since that's our target language). But porting is cake, now that we know what we'll be porting. Thus concludes the sermon for today.
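As a rough idea of what that port might look like, here is a minimal Python sketch of the same formatter. It is not the project's actual code. A few things in it are assumptions: the counts are passed in as a dict instead of living in a global hash, `max_count` is computed from that dict (in the Perl above it is a global set elsewhere), all counts are taken to be positive, and the `quote` and `escape` calls are an extra precaution for keywords containing odd characters.

```python
from html import escape
from urllib.parse import quote


def keyword_tagger(kw_count, smallest=70, largest=200):
    """Return tag-cloud HTML: one link per keyword, sized by its count.

    kw_count maps keyword -> positive count. Font sizes are percentages
    running from `smallest` up to `largest` for the most-used keyword.
    """
    if not kw_count:
        return ""
    # Assumption: the maximum is derived here rather than read from a global.
    max_count = max(kw_count.values())
    spread = largest - smallest
    links = []
    for keyword in sorted(kw_count):
        weight = kw_count[keyword] / max_count
        # int() truncates, matching Perl's sprintf("%d", ...) for positive values.
        font = int(smallest + spread * weight)
        links.append(
            f'<a href="/blog/kw/{quote(keyword)}/" '
            f'style="font-size: {font}%;">{escape(keyword)}</a>'
        )
    # One link per line, each followed by a newline, as in the Perl version.
    return "".join(link + "\n" for link in links)
```

Calling `keyword_tagger({"perl": 4, "python": 2})` gives the `perl` link a font size of 200% and the `python` link 135%, since `python` has half the maximum count.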
This is something I've wanted to do for a couple of weeks now -- I have a handy set of scripts to filter the chaff out of my hit logs and to grep the rest into convenient category files (like "all interesting non-bot traffic to the blog"). So I've written a script that takes all that blog traffic and determines which tags each hit should be attributed to. A hit to an individual page boosts the traffic count of every tag on that page. The resulting tag cloud is on the keyword tag cloud page, next to the cloud weighted by posts. This is a really meaningful way to analyze blog traffic and get a feel for what people are actually finding interesting. A possible refinement would be to time-weight the hits so that more recent hits count for more (that would be pretty easy to do, actually -- even something as cheesy as counting hits and multiplying all the counts by 90% after every ten hits.)
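To make the time-weighting idea concrete, here is a small Python sketch of the "cheesy" version: walk the hits oldest first, credit every tag on the page that was hit, and scale all counts down after every tenth hit. It is only an illustration, not the script that builds the cloud. The function name `tag_traffic`, the input shapes (a list of page identifiers and a dict from page to tags), and the decision to ignore hits on unknown pages are all assumptions.

```python
from collections import defaultdict


def tag_traffic(hits, post_tags, decay=0.9, every=10):
    """Attribute hits to tags, letting older hits fade as new ones arrive.

    hits      -- iterable of page identifiers, oldest hit first (assumed shape)
    post_tags -- dict mapping page identifier -> list of tags (assumed shape)
    decay     -- factor applied to every count after each `every` hits
    """
    counts = defaultdict(float)
    for n, page in enumerate(hits, start=1):
        # A hit on a page boosts every tag attached to it;
        # pages with no known tags contribute nothing.
        for tag in post_tags.get(page, ()):
            counts[tag] += 1.0
        # After every `every` hits, shrink all counts so that
        # recent traffic ends up weighing more than old traffic.
        if n % every == 0:
            for tag in counts:
                counts[tag] *= decay
    return dict(counts)
```

With `decay=1.0` this reduces to plain hit counting per tag, so the same function can produce both the unweighted and the time-weighted numbers.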
The Perl code to read the logs and build the cloud file is below the fold.