Keyword traffic


This is something I've wanted to do for a couple of weeks now -- I have a handy set of scripts to filter out chaff from my hit logs, and to grep them out to convenient category files (like "all interesting non-bot traffic to the blog"). So I've written a script to take all that blog traffic and determine which tag it should be attributed to. Hits to individual pages boost the traffic to all their tags.
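The attribution step can be sketched roughly as follows. This is an illustrative Python version, not the Perl script itself: the `POST_TAGS` table, the `/blog/tag/` URL layout, and the function names are all assumptions made up for the example. A hit on a tag page counts toward that tag, and a hit on an individual post counts toward every tag on the post.

```python
from collections import Counter

# Hypothetical post-to-tags table; a real site would build this from
# its own post metadata.
POST_TAGS = {
    "/blog/2008/07/traffic.html": ["traffic", "perl"],
    "/blog/2008/06/wiki.html": ["wiki"],
}

# Assumed URL prefix for tag index pages.
TAG_PREFIX = "/blog/tag/"


def tags_for_path(path, post_tags=POST_TAGS, tag_prefix=TAG_PREFIX):
    """Return the tags that one hit on `path` should be credited to."""
    if path.startswith(tag_prefix):
        tag = path[len(tag_prefix):].strip("/")
        return [tag] if tag else []
    # An individual post boosts all of its tags; unknown paths boost none.
    return post_tags.get(path, [])


def count_tag_hits(paths):
    """Tally hits per tag over an iterable of requested paths."""
    counts = Counter()
    for path in paths:
        counts.update(tags_for_path(path))
    return counts
```

Feeding the resulting counts to whatever draws the cloud gives a cloud weighted by traffic rather than by number of posts.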

The resulting tag cloud is on the keyword tag cloud page, next to the cloud weighted by posts. Weighting by traffic is a useful way to analyze blog traffic and get a feel for what people are actually finding interesting. A possible refinement would be to time-weight the hits so that more recent hits count for more. That would be pretty easy to do, actually, even with something as cheesy as counting hits and multiplying all the counts by 90% after every ten hits.
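The cheesy time-weighting idea might look something like this in Python. It is only a sketch of the scheme described above: the function name and the `decay` and `every` parameters are made up for the example, and it assumes the hits arrive oldest first.

```python
from collections import defaultdict


def decayed_tag_counts(hit_tags, decay=0.9, every=10):
    """Count tag hits, shrinking all counts by `decay` after every `every` hits.

    `hit_tags` is an iterable of tag lists, one list per hit, oldest first.
    """
    counts = defaultdict(float)
    for n, tags in enumerate(hit_tags, start=1):
        for tag in tags:
            counts[tag] += 1.0
        if n % every == 0:
            # Older hits have been through more of these multiplications,
            # so they end up counting for less than recent ones.
            for tag in counts:
                counts[tag] *= decay
    return dict(counts)
```

With ten hits on one tag followed by ten hits on another, the older tag ends up around 8.1 and the newer one around 9, so recent interest wins even though the raw counts are equal.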

The Perl code to read the logs and build the cloud file is below the fold.

2008-07-20 traffic

When I Wikiized the site, and started indexing the Wiki changes, I naturally also wanted to start looking at incoming traffic and referrers, as you can see on the "recent" page on the main menu. And of course I then started refining it to suit my tastes.

I already had a "preproc.pl" script to preprocess the logs and give me the hits I want to see. It screens out spiders, everything I myself do from home, and (lately) any IP that posts spam to the forum or Wiki. The remainder is proving pretty interesting.
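The three screening rules boil down to something like the following Python sketch. It is not preproc.pl: the agent substrings in `SPIDER_MARKERS` are a guessed list, and the function name and IP sets are placeholders for whatever the real script keeps.

```python
# Assumed substrings that mark a spider's user agent; a real list
# would be longer and tuned to the site's own logs.
SPIDER_MARKERS = ("bot", "spider", "crawl", "slurp")


def is_interesting(ip, agent, home_ips=frozenset(), spam_ips=frozenset()):
    """Return True if a hit survives the spider, home, and spammer filters."""
    agent = agent.lower()
    if any(marker in agent for marker in SPIDER_MARKERS):
        return False  # search engine spider, judged by its agent string
    if ip in home_ips:
        return False  # my own traffic from home
    if ip in spam_ips:
        return False  # an IP that has posted spam to the forum or Wiki
    return True
```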

Normally, one can filter out search engine spiders based on their agent. But Microsoft, as always, follows its own rules (a little research on "search.live.com" and "QBHP" will show you plenty of griping.) They use a normal IE agent string, but mark their search queries with a "form" parameter in the referrer URL ("form=QBHP").

And you know, normally I wouldn't care. But their search queries are weird. They consist of a single word, usually (but not always) one found on the page, and if you're actually paying attention to search queries to determine what it is about your site people find interesting, these won't help.

So now my preproc script blocks everything from the 65.55.*.* block with "form=QBHP" in the referrer. You just have to wonder what Microsoft is thinking, sometimes.
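In Python, that rule could be sketched like this. The function name is made up, and matching the parameter case-insensitively is an assumption on my part; the rule as stated only mentions "form=QBHP".

```python
from urllib.parse import parse_qs, urlsplit


def is_live_search_probe(ip, referrer):
    """Return True for hits from 65.55.*.* whose referrer carries form=QBHP."""
    if not ip.startswith("65.55."):
        return False
    params = parse_qs(urlsplit(referrer).query)
    for key, values in params.items():
        # Assumption: accept "FORM=QBHP" as well as "form=QBHP".
        if key.lower() == "form" and any(v.upper() == "QBHP" for v in values):
            return True
    return False
```

Checking the parsed query string rather than grepping for the raw text avoids false matches on a search term that happens to contain "form=QBHP".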






This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.