Keyword programming

One of the neat little things I did over the past few days was a simple Word macro -- at least, it should have been simple, but the problem is one I've had for a long time.

In this case, what I wanted to do was to fix up a few documents I had from a translation customer. This particular end user, for reasons known only to them, captions their figures using fields. The fields are in text boxes for easy positioning, and the field results (the text you see on the screen) are the captions.

Only one problem: the fields are always variable results for variables which don't exist in the document. All I can figure is that the document preparer makes these things in little snippets with some other tool which spits out Word texts, then they paste those into the text boxes.

So you're asking now (unless you're a professional translator): who cares? You just type your English over the German in the captions, and you're home free, right? Well, no. Everybody who's anybody in the wonderful world of translation nowadays uses translation tools, in this case TRADOS.

TRADOS does two things for you: it stores each and every sentence you translate in a translation memory (a TM), so you (sort of) never need to translate anything twice, and it also makes it much easier to step through a document translating. The use of TRADOS makes translation much easier, and it also helps you stay consistent in your use of words and phrases.

Herein lies the problem: those fields were untouchable by TRADOS. There are two modes in TRADOS: one steps through the document using Word macros but doesn't deal well with text boxes (and yes, you'll note they're in text boxes). So that approach was out. The other (the TagEditor) converts the entire document to an XML format, then edits that in a very convenient way. The TagEditor makes short work of text boxes, but those field results were invisible to it.

Stuck! And so for a series of three jobs from that customer, I just didn't use TRADOS on the figure attachments, and hated it. Last week, though, I took screwdriver in hand (metaphorically speaking) and decided it was showdown time.

OK, that's the teaser -- follow the link to get the ... rest of the story.

2006-11-30 programming art

I got wind of this wonderful, wonderful artist because I was obsessing on my hit logs due to starting this little blog (ahem, not like you haven't done that, be honest), and Xerexes at Comixpedia linked to this beauty of a site saying "Shades of the Toonbot" (aww, I've become a Concept. How cool.)

Anyway, my little scrivenings (hot dogs that they are) are nothing compared to the absolute jawdropping sheer beauty at Gallery of Computation | generative artifacts. You gotta see it.

The site is the turf of one Jared Tarbell, whose modus operandi is to write programs which express graphics. Pretty graphics. Really, really pretty graphics.

Well -- enough bubbling. Suffice it to say that I'd like to include a scripting engine into some version of the Toon-o-Matic which allows this kind of generative graphics. I doubt I'll ever get it that pretty, but still -- a man can dream.

Incidentally, note that this post's title contains parentheses. I'm probably revealing myself to be a complete fool, but let's just say that my blog weaving code choked on it because I was doing something really stupid with regular expressions. That may very well be the subject of a post soon. Or maybe I should keep my more egregious bugs under my hat. No, wait, those metaphors mix rather uncomfortably...
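The post doesn't say what the actual bug was, so what follows is only a guess at the general class of mistake: a minimal Python sketch, assuming a title gets used unescaped as a regular expression. The function names and the sample title are invented for illustration; the real blog code is Perl and may have failed in some entirely different way.

```python
import re

def find_title_naive(title, page):
    # Buggy: the title is used directly as a pattern, so "(" and ")"
    # open and close a capture group instead of matching literal characters.
    return re.search(title, page) is not None

def find_title_safe(title, page):
    # re.escape() quotes every regex metacharacter in the title first.
    return re.search(re.escape(title), page) is not None

title = "Generative art (and a dream)"          # hypothetical post title
page = "<h2>Generative art (and a dream)</h2>"  # hypothetical page text

naive_hit = find_title_naive(title, page)   # the literal parentheses are never matched
safe_hit = find_title_safe(title, page)
print(naive_hit, safe_hit)
```

The naive version silently fails to find a title that is plainly there, which is about the most annoying way for a bug to announce itself.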

Did I mention that I'm not only going to be covering technical topics on this blog? Today's word, kids, is "Aquaponics".

Aquaculture is growing fish for food. Hydroponics is growing food (or other) plants in water or other non-soil rooting medium. Aquaponics is using the fish water as the hydroponic nutrient solution, which does two things for you: the plants filter the nutrients (ammonia is fish urine but plant ambrosia) out of the water, so they don't choke the fish but instead are converted into, say, lettuce; and the fish provide a completely organic and relatively balanced set of nutrients for the plants. So the combination is superior to either one alone, which makes perfect sense if you consider that two smaller ecologies put together into one bigger one are necessarily more balanced and stable.

Anyway, that's our family's project this week. We had already wanted to grow lettuce indoors for the winter, and so instead of simply growing lettuce, we are growing lettuce floating in a styrofoam block on an aquarium. The aquarium will have goldfish, so we won't be eating that end of the system -- but it could just as well have tilapia in it. In fact, tilapia are great aquaculture fish because they'll essentially eat anything. If they don't eat it, it just feeds algae, and they eat the algae instead.

So after we level up with 25 gallons of goldfish tank, I am very seriously considering building a much larger tank in the backyard under a geodesic dome (per Organic Gardening of 1972) and growing me some serious tilapia. Did you know that in a round pool 12 feet in diameter and 3 feet deep (a small section of our backyard) you can harvest 500 half-pound tilapia every couple of months? No, neither did I until today.

Anyway, I see all this as related to programming. Both are simply the design of systems to meet needs. And in fact, I find the way I think about an aquaponics system is very similar to the way I think about a general data processing system. Where an aquaponics system outputs lettuce, a data processing system outputs some information I want. To make lettuce, I need to consider the nutrients and water and light; to make valuable information I need to consider the available raw data.

In either case, I find that a small, modular approach works well. In the case of aquaculture, it's a matter of considering what nutrients are where and what organisms can convert one thing to another; whereas in software, it's a matter of seeing simple data structures and designing lightweight tools that can convert one to another -- and then you organize all your little modules/organisms into an ecology.

Lately, there have been two (software) projects I've worked on in which this systems approach has worked well. The new Toon-o-Matic is composed of a number of small, relatively simple Perl scripts which are all organized by a Makefile. Each script reads one or two or three input data structures, and emits one or two. The overall network could be drawn as a graph (and indeed, that would be edifying and entertaining, and I should do that.)
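As a rough idea of what drawing that graph would involve, here is a minimal Python sketch. The step and file names are hypothetical stand-ins, not the real Toon-o-Matic scripts, and the real ordering is done by make; this only assumes that each step declares what it reads and what it emits.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps: name -> (input files, output files)
STEPS = {
    "parse_script": (["strip.txt"], ["panels.xml"]),
    "layout":       (["panels.xml"], ["layout.xml"]),
    "load_art":     (["art.txt"], ["art.xml"]),
    "render":       (["layout.xml", "art.xml"], ["strip.png"]),
}

def dependency_graph(steps):
    """Map each step to the set of steps that produce its inputs."""
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    return {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in steps.items()
    }

graph = dependency_graph(STEPS)
# static_order() yields every step after all the steps it depends on.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Files that no step produces (strip.txt and art.txt here) are treated as raw inputs, which is the same assumption make applies to source files.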

The other such system is this blog. I've deliberately kept the approach simple and completely sui generis. I'm reinventing the wheel to a certain extent, but that's the attraction -- I like new wheels, and the occasional flaw doesn't bother me, as I always learn. Evolution doesn't mind reinventing the wheel -- did you know that the eye has evolved many completely separate times? The eyes of insects, vertebrates, and molluscs are three completely independent instances of the evolution of a visual sensor. And the eyes of molluscs (like octopi) are demonstrably superior to ours: our retinal nerves are in front of our retinae, thus each eye has a blind spot where the optic nerve penetrates the retina to leave the eyeball. Molluscs sensibly have their retinal nerves behind the retina: no blind spot. Another reason to believe in Intelligent Design -- just, you know, not of us. God loves the octopus, which is why global warming is going to provide the octopus with lots of shallow, warm seas with recently vacated cities in them.

Anyway, back on something resembling a track: my ultimate goal in the case of aquaculture is to close the ecological loop. I want to take my kitchen and garden waste, recycle it with vermiculture and composting, feed the worms and plants to tilapia, use the fish water for lettuce and seedlings and the worm castings for root vegetables, and ultimately I believe it may well be possible to feed my family fresh fish and veggies with not much more input than cardboard, grass clippings, and leaves, and whatever's on sale at Kroger.

My goal in the case of most data processing systems is less lofty: I simply want to model some useful process in small, easily maintained and easily modified steps, so that the system remains flexible and reliable. But in either case, the thought processes are similar: to attain a large goal, break it down into small, reusable task utilities.

I'll keep you posted on both.

There are two general ways to approach software design; each has its uses.

Top-down design looks at the entire project and breaks it into high-level components; those components are then subprojects and can be further handled in the same way.

Bottom-up design looks at the resources available and sees likely things that can be done with them; the idea is to provide generalized components to be used in any project.

A healthy software design ecology has a lot of bottom-up components at varying stages of maturity; those components then inform the top-down requirements of the current project, giving those designs something to work with. In the absence of complex components, we're forced to write everything from scratch, and it all turns into ad-hockery of the worst kind.

Anyway, that item of philosophy out of the way, I wanted to talk about the design of this week's project, the drop tagger. There are three main components of the drop tagger, as follows:

  • The drop handler
    The drop handler is the component which interacts with the shell and provides something you can drop files onto, or some other way to tag them. It calls the file manager. However, the notion of a general drop handler is a much more interesting one than a special-purpose drop handler just for this project, and one which can be a valuable addition to many different file-oriented projects.
  • The file manager
    The file manager shows us what files have been dropped, allows us to add and delete them and modify their tags, and for fresh drops it will actively ask for tags. It also calls the tag cloud formatter and provides a convenient place to display the cloud.
  • The cloud formatter
    This is likely to be the least general and thus the least interesting of these components, but it formats the tag cloud upon request based on information compiled about the tags in the system.
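To make the division of labor concrete, here is a minimal Python sketch of the three components and who calls whom. Every class, method, and file name is hypothetical, the shell integration is reduced to a plain method call, and "asking for tags" is a callback; it illustrates the structure described above, not the finished drop tagger.

```python
from collections import Counter

def format_cloud(tag_counts):
    """Cloud formatter: list tags most-used first, with their counts."""
    ranked = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(f"{tag}({n})" for tag, n in ranked)

class FileManager:
    """File manager: tracks dropped files and their tags."""
    def __init__(self, ask_for_tags):
        self.tags = {}                    # path -> set of tags
        self.ask_for_tags = ask_for_tags  # callback standing in for a prompt

    def add(self, path):
        if path not in self.tags:         # fresh drop: actively ask for tags
            self.tags[path] = set(self.ask_for_tags(path))

    def delete(self, path):
        self.tags.pop(path, None)

    def cloud(self):
        counts = Counter(t for ts in self.tags.values() for t in ts)
        return format_cloud(counts)

class DropHandler:
    """Drop handler: stand-in for the shell side; it only forwards drops."""
    def __init__(self, manager):
        self.manager = manager

    def drop(self, paths):
        for path in paths:
            self.manager.add(path)

def fake_prompt(path):
    # Hypothetical answers a user might type when asked for tags.
    return ["perl"] if path.endswith(".pl") else ["docs", "perl"]

manager = FileManager(ask_for_tags=fake_prompt)
DropHandler(manager).drop(["weave.pl", "notes.txt"])
cloud = manager.cloud()
print(cloud)  # perl(2) docs(1)
```

Note that the drop handler knows nothing about tags and the formatter knows nothing about files, which is what would let either one be lifted into another project.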

Each of these components can be designed and used in isolation, and reused in other projects. Alternatively, once we've defined the components we need to meet our goal, we may well be able to find ready-made components already available (or at least something we can adapt instead of starting from scratch). There is then a maturity effect over the course of multiple projects, as our codebase lets us respond faster and faster to the need for each new project.

I'd like to formalize this design process over the course of several mini-projects. Stay tuned for further progress.

So my first actual weekly application is finished now. Aren't you proud? Suffice it to say that even a minor app takes a few hours to put together when you're reworking all your programming tools at the same time. A character flaw, I suppose. I never use an already-invented wheel if I have a perfectly good knife and wheel material. And I never use an already-invented knife if I have a perfectly good grinder and stock metal. And I never use an already-invented grinder if I have a lathe, motors, and a grindstone. And I never use an already-invented lathe... (sigh).

At any rate, it took me a few hours more than I wanted, but I'm reasonably pleased with the result. You can see the whole thing here (it's far too long to publish on the blog directly, of course). Go on. Look!

For many moons, I've had this crazy idea of a generic file parser floating around in my head. (The idea, not the parser.) This would function a lot like a hex editor, except that it would operate on a semantic level: if an extent in the file was known, meaning that its purpose was known or at least guessed at to the point where it could be named, then that information would be marked in a file description.

An example of this would be a malware analyzer. In case you haven't seen the term before, "malware" is software that is out to do you harm. Viruses, worms, and stuff like that. A popular source of malware is executables attached to email in such a way that Outlook will execute them without asking you. Yes, this still happens. For lots of details of this kind of exploit, see the Internet Storm Center's blog. Hours of fun reading there. No, seriously! Malware is fun!

But as any reader of this humble blog knows (both of you), my time for fun is strictly limited, and my patience wears thin very quickly. So I never actually analyze any malware, because to do so I'd probably have to find a piece of paper and note stuff down. Hence the need for software to do it for me: if I had something that defined the sections of an EXE file under Windows, for instance, then I'd run that against the malware, and I'd at least break everything down into conveniently readable chunks -- I'd eliminate the EXE header, split out the resources, that kind of thing.

This, then, is a generic file parser. It allows me to interactively define a file structure for a given file (or class of files) and read useful data out.

A more proximate reason to do this, lately, has been that I have a need to use a glossary file which has resisted import into MultiTerm, my glossary software of choice. I could open a hex editor and see the terms, but I couldn't do anything useful with them.

Well, yesterday I spent the whole day on it, but I have a prototype of said file parser. Using it, I can define hexblocks, sequences, lists, records, switchable sections based on flags, variable-length blocks based on length specifications in the file -- all that works like a charm, and I get a nice, readable dump file for my trouble. As I refine the file description, I get more readable dumps. And then I can write a Perl script to scan the dump and pull out whatever I like.
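The prototype itself isn't posted yet, so the following is only a minimal Python sketch of the general idea: a file description drives the parse, and a length or count read earlier in the file decides how much to read later. The description format, the field names, and the sample bytes are all invented for illustration and are not the prototype's actual syntax.

```python
import struct

# Hypothetical file description: (field name, kind, argument)
DESCRIPTION = [
    ("magic", "hex",  4),        # fixed-size hexblock of 4 bytes
    ("count", "u16",  None),     # little-endian unsigned 16-bit integer
    ("terms", "list", "count"),  # list whose item count comes from "count"
]

def parse(data, description):
    """Walk the description over the bytes; return (record, bytes consumed)."""
    pos, record = 0, {}
    for name, kind, arg in description:
        if kind == "hex":
            record[name] = data[pos:pos + arg].hex()
            pos += arg
        elif kind == "u16":
            (record[name],) = struct.unpack_from("<H", data, pos)
            pos += 2
        elif kind == "list":
            items = []
            for _ in range(record[arg]):
                length = data[pos]  # one length byte, then that many bytes
                items.append(data[pos + 1:pos + 1 + length].decode("latin-1"))
                pos += 1 + length
            record[name] = items
        else:
            raise ValueError(f"unknown kind: {kind}")
    return record, pos

# Made-up sample "glossary": magic bytes, a count of 2, two prefixed strings.
sample = bytes.fromhex("474c4f53") + struct.pack("<H", 2) + b"\x04Haus" + b"\x05house"
record, consumed = parse(sample, DESCRIPTION)
for name, value in record.items():
    print(f"{name:6} {value}")
```

Refining the description (say, replacing a hexblock with named fields) changes only the table at the top, which matches the refine-and-redump loop described above.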

It is so extremely useful. Unfortunately, I only slept about three hours last night, since I stayed up until 3AM coding and didn't actually do the paying work I should have been doing instead... So posting this project will have to wait for another day -- but once it is posted, wouldn't it be groovy to have an interactive online file parsing tool for, say, malware snagged off the wild Net? That would be fun!

So: more later.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.