Tag/file clouds

Today's fun task was the creation of a little prototype code to format the tag cloud for the drop handler project. I did it in the context of this blog, and so first I had to get my keywords functional. I already had a database column for them, but it turned out my updater wasn't writing them to the database. So that was easy.

Once I had keywords attached to my blog posts, I turned my attention to formatting them into keyword directories (the primary motivation for this was to make it possible to enable Technorati tagging, on which more later.) And then once that was done, I had all my keywords in a hash, so it occurred to me that I was most of the way towards implementing a tag cloud formatter anyway.

Here's the Perl I wrote just to do the formatting. It's actually amazingly simple (of course) and you can peruse the up-to-the-minute result of its invocation in my blog scanner on the keywords page for this blog. Perl:

sub keyword_tagger {
   my $ct = shift @_;
 
   my $weight;
   my $font;
   my $sm = 70;
   my $lg = 200;
   my $del = $lg - $sm;
   my $ret = '';
   foreach my $k (sort keys %kw_count) {
      $weight = $kw_count{$k} / $max_count;
      $font = sprintf ("%d", $sm + $del * $weight);
      $ret .= "<a href=\"/blog/kw/$k/\" style=\"font-size: $font%;\">$k</a>\n";
   }
 
   return $ret;
}

This is generally not the way to structure a function call, because it works with global hashes, but y'know, I don't follow rules too well (and curse myself often, yes). The assumptions:

  • The only argument passed is the maximum post count for all tags, determined by an earlier scan of the tags while writing their index pages.
  • $sm and $lg are effectively configuration; they determine the smallest and largest font sizes of the tag links (in percent).
  • The loop runs through the tags in alphabetical order; they are all assumed to be in the %kw_count global hash, which stores the number of posts associated with each tag (we build that while scanning the posts).
  • For every tag, we look at its post count in the %kw_count hash and split the difference in percentages between $sm and $lg -- then format the link with that font size. Obviously, this is a rather overly hardwired approach (the link should obviously be a configurable template) but as a prototype and for my own blogging management script, this works well.

For our file cloud builder, we'll want to do this very same thing, but in Python (since that's our target language). But porting is cake, now that we know what we'll be porting.

Thus concludes the sermon for today.






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.