Keyword translation


One of the neat little things I did over the past few days was a simple Word macro -- at least, it should have been simple, but the problem is one I've had for a long time.

In this case, what I wanted to do was to fix up a few documents I had from a translation customer. This particular end user, for reasons known only to them, captions their figures using fields. The fields are in text boxes for easy positioning, and the field results (the text you see on the screen) are the captions.

Only one problem: the fields always display the results of variables that don't exist in the document. All I can figure is that the document preparer creates these captions as little snippets in some other tool that spits out Word text, then pastes those into the text boxes.

So now (unless you're a professional translator) you're asking: who cares? You just type your English over the German in the captions, and you're home free, right? Well: no. Everybody who's anybody in the wonderful world of translation nowadays uses translation tools, in this case TRADOS.

TRADOS does two things for you. First, it stores each and every sentence you translate in a translation memory (a TM), so you (sort of) never need to translate anything twice. Second, it makes it much easier to step through a document as you translate. Using TRADOS speeds up the work considerably, and it also helps you stay consistent in your choice of words and phrases.
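For the non-translators: at its simplest, a translation memory is a lookup table from source sentences to their stored translations. Here's a minimal Python sketch of that idea. It assumes exact matching after whitespace cleanup, and the class and method names are made up for illustration. This is not how TRADOS works internally; real TMs also do fuzzy matching against similar sentences.

```python
from typing import Dict, Optional


def normalize(segment: str) -> str:
    """Collapse runs of whitespace so trivially different spacing still matches."""
    return " ".join(segment.split())


class TranslationMemory:
    """Toy exact-match translation memory (hypothetical; not TRADOS's design)."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def store(self, source: str, target: str) -> None:
        """Remember a translated sentence; a later translation overwrites an earlier one."""
        self._entries[normalize(source)] = target

    def lookup(self, source: str) -> Optional[str]:
        """Return the stored translation, or None if the sentence is new."""
        return self._entries.get(normalize(source))


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.store("Die Pumpe ist eingeschaltet.", "The pump is switched on.")
    print(tm.lookup("Die  Pumpe ist eingeschaltet."))  # prints: The pump is switched on.
    print(tm.lookup("Die Pumpe ist ausgeschaltet."))   # prints: None
```

The second sentence is new, so the translator would have to do it by hand once. After that, it is in the memory too.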

Herein lies the problem: TRADOS couldn't touch those fields. TRADOS has two modes. One steps through the document using Word macros but doesn't deal well with text boxes (and yes, you'll note the captions are in text boxes), so that approach was out. The other, the TagEditor, converts the entire document to an XML format and lets you edit that in a very convenient way. The TagEditor makes short work of text boxes, but the field results were invisible to it.

Stuck! And so for a series of three jobs from that customer, I just didn't use TRADOS on the figure attachments, and hated it. Last week, though, I took screwdriver in hand (metaphorically speaking) and decided it was showdown time.

OK, that's the teaser -- follow the link to get the ... rest of the story.


So again, a lick and a promise for this blog as I madly try to finish some translation work. This translation job is an interesting one, though, as I mentioned. It also turns out to be amenable to editing in a specialized tool I wrote just today. Of course, having written the tool today means I have to use it this evening in a mad dash to finish. That in turn means I won't have time to document the code until tomorrow at the very earliest.

Suffice it to say that the exercise was surprisingly easy. The task was simple: I needed a tool to edit text files in which (for reasons we'll go into later) I have a number of phrases, one per line. The phrases mostly contain all the right words, but not in the right order. So I need a way to quickly select one or more words and drag them into the right place in the phrase. Sure, you say, Word does that. Yeah, except that Word doesn't put the spaces in the right place. God and Bill Gates alone know why, but when I drag words around on a line, Word puts the fricking spaces in the wrong places. So I took matters into my own hands and rolled my own solution. And by God it works! It still has a few little oddities, but for this particular application it's quicker than Word.
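The spacing problem goes away once you treat the line as a list of words instead of a string of characters. You cut the selected words out, splice them back in at the drop position, and rejoin everything with single spaces, so the spacing can't go wrong. This is a hypothetical Python sketch of that idea, not the editor described here. The name `move_words` and its convention are assumptions for illustration: the drop position is a gap index between words, counted in the original line.

```python
from typing import List


def move_words(line: str, start: int, count: int, dest: int) -> str:
    """Move `count` words beginning at word index `start` so they land in gap `dest`.

    Gap `dest` (0..number of words) is the position before original word `dest`.
    The result is always rejoined with single spaces, so spacing is normalized.
    """
    words: List[str] = line.split()
    if count < 1 or start < 0 or start + count > len(words):
        raise ValueError("selection out of range")
    if not 0 <= dest <= len(words):
        raise ValueError("drop position out of range")
    if start <= dest <= start + count:
        return " ".join(words)  # dropped onto itself: nothing moves
    moved = words[start:start + count]
    rest = words[:start] + words[start + count:]
    if dest > start:
        dest -= count  # gaps after the selection shift left once it is removed
    return " ".join(rest[:dest] + moved + rest[dest:])
```

For example, `move_words("the pump switched on is", 4, 1, 2)` drags the German-style final verb forward and gives "the pump is switched on".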

Another nice thing it can do: when I drag the first word of a phrase into the middle, it can decapitalize that word and capitalize the new first word. That saves me a fraction of a second per phrase, and multiplied by 2000 phrases, that adds up to a lot of time.
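The recapitalization rule might look something like this hypothetical Python sketch. It assumes the caller knows where the former first word ended up. It leaves alone any word with more than one capital letter, on the guess that such words are acronyms like PLC. Proper nouns (and the pronoun "I") would still get lowercased, so treat it as a naive rule.

```python
from typing import List


def looks_like_acronym(word: str) -> bool:
    """Guess that words with more than one uppercase letter (e.g. 'PLC') are acronyms."""
    return sum(ch.isupper() for ch in word) > 1


def recapitalize(words: List[str], old_first_pos: int) -> List[str]:
    """Fix capitalization after the line's former first word was dragged.

    words         -- the line's words after the move
    old_first_pos -- index where the former first word now sits
    """
    result = list(words)
    if not result or old_first_pos == 0:
        return result  # the first word did not change; nothing to do
    moved = result[old_first_pos]
    if not looks_like_acronym(moved):
        result[old_first_pos] = moved[:1].lower() + moved[1:]
    first = result[0]
    result[0] = first[:1].upper() + first[1:]
    return result
```

Dragging "Now" from the front of "Now the pump is on" to the end gives `["the", "pump", "is", "on", "Now"]`. `recapitalize` turns that into "The pump is on now".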

And another little thing I just now added: I can hit a key and toggle the case of the word the cursor is on. Again, this may or may not be of general use, but for this particular application it's very convenient. And that's really the idea of special-purpose text editors. An example from the programming world is Emacs: you can write Lisp code to make Emacs do literally anything at all from within your text editor (including psychoanalysis). The only problem is that it's too damn hard to get started. Python's easier, at least for me. So a text editor in which you can embed your own Python snippets might be a generally useful tool indeed!
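Here's a Python sketch of both ideas, under stated assumptions. It has a hypothetical command table (`COMMANDS`) that an editor could look up by name and bind to keys, and a toggle-case command registered in it. The post doesn't say exactly what "toggle the case" means, so this version assumes it flips the word's initial capital.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical command table: user-written Python snippets registered by name.
# Each command takes (line, cursor offset) and returns (new line, new cursor).
COMMANDS: Dict[str, Callable[[str, int], Tuple[str, int]]] = {}


def command(name: str):
    """Decorator that registers a function in COMMANDS under `name`."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


@command("toggle-case")
def toggle_word_case(line: str, cursor: int) -> Tuple[str, int]:
    """Toggle the initial capital of the word touching the cursor offset."""
    for match in re.finditer(r"\S+", line):
        if match.start() <= cursor <= match.end():
            word = match.group()
            first = word[0].lower() if word[0].isupper() else word[0].upper()
            new_line = line[:match.start()] + first + word[1:] + line[match.end():]
            return new_line, cursor
    return line, cursor  # cursor sits in whitespace: leave the line alone
```

The registry is the interesting part. Adding another editing trick means writing one more decorated function rather than touching the editor itself.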

So. Tomorrow or Sunday, documentation and maybe some more movement on the drop tagger. And in the meantime, go get some sleep! (I know I won't any time soon.)



I just wanted to note at this juncture that my notion of running some very simple machine-translation code (yes, lovingly hand-coded in Perl) on a certain class of text, followed by human intervention using just the right kind of editor, seems to be bearing fruit.

Granted, I would be in much less deadline trouble right now if I'd just done the Right Thing, shut up, and translated the text. But the text in question is not nicely flowing text. It is PLC message output written by engineers for machine operators, and it is dense. Very, very dense. So if I'd just translated it by hand I would have screwed up over and over.

Instead, I first scanned the entire text, broke it into words (with varying success), and looked up each and every word I didn't know. For those of you who aren't translators, this doesn't just mean not knowing a word at all; it includes not knowing what those particular engineers and machine operators intend to say with a particular technical term. This can be challenging, but in this case I had a lot of previously translated text, so I could look most words up in that.
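The word-hunting step could be approximated like this Python sketch (not the original Perl). It counts every word in the text and lists those missing from a glossary, most frequent first. It can also pull up previously translated sentence pairs that contain a given word. The glossary, its lowercase keys, and the function names are all assumptions for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# \w matches Unicode letters by default in Python 3, so ä, ö, ü and ß count as letters.
WORD = re.compile(r"\w+")


def unknown_words(lines: Iterable[str], glossary: Dict[str, str]) -> List[Tuple[str, int]]:
    """Lowercased words not yet in the glossary, most frequent first (ties alphabetical)."""
    counts = Counter(w.lower() for line in lines for w in WORD.findall(line))
    missing = [(w, n) for w, n in counts.items() if w not in glossary]
    return sorted(missing, key=lambda item: (-item[1], item[0]))


def concordance(word: str, pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Previously translated (source, target) pairs whose source contains `word`."""
    word = word.lower()
    return [(src, tgt) for src, tgt in pairs
            if word in (w.lower() for w in WORD.findall(src))]
```

Sorting by frequency means the words that matter most get looked up first. The concordance shows how the earlier translations handled a term in context, which matters for technical terms that mean something specific to those engineers.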

Once all the words were "known" (ha), I ran the whole thing through a phrase scanner. Frequently occurring phrases were presented with word-by-word translations, along with some crude rewrite rules to make a better guess. This is all very, very naive, as any translator knows. It's not even as good as SYSTRAN, and SYSTRAN sucks.
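A naive phrase scanner of this kind might look like the following Python sketch. It counts every run of two to four words within a line and keeps the ones that recur. It then glosses each word from a glossary and applies crude rewrite rules. The single rule shown (turning "runs not", from German "läuft nicht", into "does not run") is a made-up example, and all names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical crude rewrite rules applied to the word-by-word gloss: (regex, replacement).
REWRITE_RULES: List[Tuple[str, str]] = [
    (r"\bruns not\b", "does not run"),  # German "läuft nicht"
]


def frequent_phrases(lines: Iterable[str], n_min: int = 2, n_max: int = 4,
                     min_count: int = 2) -> List[Tuple[str, int]]:
    """Count every n-word sequence within each line; keep those seen min_count+ times."""
    counts: Counter = Counter()
    for line in lines:
        words = line.split()
        for n in range(n_min, n_max + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


def gloss(phrase: str, glossary: Dict[str, str]) -> str:
    """Word-by-word guess; unknown words stay in [brackets]. Glossary keys are lowercase."""
    guess = " ".join(glossary.get(w.lower(), f"[{w}]") for w in phrase.split())
    for pattern, replacement in REWRITE_RULES:
        guess = re.sub(pattern, replacement, guess)
    return guess
```

Phrases never cross a line boundary here, which only works because every line in this text is its own sentence.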

But as I translated more of the frequent phrases, the system was able to string together better guesses for the longer phrases. At some point, then, I decided to switch over to direct translation of the actual segment list. This text was "nice" (in this one aspect alone) because segmentation was easy -- every line is a separate sentence, so there's no need to figure out where sentences might break. That's convenient.
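One way to string known phrases together is greedy longest-match covering. At each position in a line, you use the longest phrase that already has a translation, and otherwise flag the word. This is a hypothetical Python sketch of that technique, not necessarily what the original Perl code does. It also shows the one-line-one-segment split that made this text convenient. The phrase table, its lowercase keys, and `max_len` are assumptions.

```python
from typing import Dict, List


def assemble_guess(line: str, phrase_tm: Dict[str, str], max_len: int = 6) -> str:
    """Greedy longest match: at each position use the longest already-translated
    phrase (keys lowercase); otherwise emit the word itself in [brackets]."""
    words = line.split()
    out: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = " ".join(words[i:i + n]).lower()
            if key in phrase_tm:
                out.append(phrase_tm[key])
                i += n
                break
        else:  # no phrase of any length matched at position i
            out.append(f"[{words[i]}]")
            i += 1
    return " ".join(out)


def segments(text: str) -> List[str]:
    """In this kind of text every non-empty line is one segment."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

As the phrase table grows, more of each line gets covered by longer, better translations. The word order within each piece is then already right, and only the seams between pieces need dragging around.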

At any rate, I am now using my specialized text editor to approve and/or modify each resulting phrase. Remember: all the words are already there, sort of, just not usually in an understandable order. Now that I can very quickly select and drag them around, though, my new translating technique is unstoppable. Ha! They said it couldn't be done! Those fools! MWAahahahaha!

(coff) OK, I'm better now. Documentation soon. I just felt enthusiastic.







This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.