Malware analysis and generic file reading

2007-02-13 malware dataparser security programming

For many moons, I've had this crazy idea of a generic file parser floating around in my head. (The idea, not the parser.) This would function a lot like a hex editor, except that it would operate on a semantic level: if the purpose of an extent in the file was known, or at least guessed at well enough that it could be named, then that information would be recorded in a file description.

An example of this would be a malware analyzer. In case you haven't seen the term before, "malware" is software that is out to do you harm: viruses, worms, and stuff like that. A popular source of malware is an executable attached to an email in such a way that Outlook will execute it without asking you. Yes, this still happens. For lots of detail on this kind of exploit, see the Internet Storm Center's blog. Hours of fun reading there. No, seriously! Malware is fun!

But as any reader of this humble blog knows (both of you), my time for fun is strictly limited, and my patience wears thin very quickly. So I never actually analyze any malware, because to do so I'd probably have to find a piece of paper and note stuff down. Hence the need for software to do it for me: if I had something that defined the sections of an EXE file under Windows, for instance, then I'd run that against the malware, and I'd at least break everything down into conveniently readable chunks -- I'd eliminate the EXE header, split out the resources, that kind of thing.

This, then, is a generic file parser. It allows me to interactively define a file structure for a given file (or class of files) and read useful data out.
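
To make the EXE example a little more concrete, here is a minimal Python sketch of the "break it into chunks" step for a Windows executable. It is not the prototype described in this post. The function name list_pe_sections is made up for illustration, and it relies only on the published PE layout: the "MZ" magic, the offset of the PE signature stored at 0x3C, the 20-byte COFF header, and the 40-byte section table entries.

```python
import struct

def list_pe_sections(data):
    """Return (name, file_offset, raw_size) for each section of a PE image.

    Hypothetical helper for illustration; `data` is the whole file as bytes.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")

    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    # The 20-byte COFF header follows the signature. Only the section
    # count and the size of the optional header are needed here.
    (_machine, n_sections, _timestamp, _symtab, _n_symbols,
     opt_size, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    # The section table starts right after the optional header.
    table = pe_offset + 4 + 20 + opt_size
    sections = []
    for i in range(n_sections):
        # Each entry is 40 bytes; the first 24 hold the name, virtual
        # size and address, then the raw size and raw file offset.
        name, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + i * 40)
        label = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append((label, raw_ptr, raw_size))
    return sections
```

Given the bytes of a file read in binary mode, this returns one tuple per section, so a resource section such as .rsrc can be sliced out as data[offset:offset + size]. A real analyzer would also have to cope with deliberately malformed headers, which this sketch does not attempt.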

A more proximate reason to do this, lately, is that I need to use a glossary file that has resisted import into MultiTerm, my glossary software of choice. I could open a hex editor and see the terms, but I couldn't do anything useful with them.

Well, yesterday I spent the whole day on it, but I have a prototype of said file parser. Using it, I can define hexblocks, sequences, lists, records, switchable sections based on flags, variable-length blocks based on length specifications in the file -- all that works like a charm, and I get a nice, readable dump file for my trouble. As I refine the file description, I get more readable dumps. And then I can write a Perl script to scan the dump and pull out whatever I like.
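
As a rough illustration of how a description-driven parser of this kind can work, here is a small Python sketch. Everything in it is an assumption made for the example: the tuple-based description syntax, the field kinds ("uint", "hexblock", "varblock", "list", "switch"), and the toy glossary layout are invented here and are not the prototype's actual format.

```python
def parse(desc, data, pos=0):
    """Walk a file description over `data`; return (record, next_offset)."""
    record = {}

    def take(size):
        # Consume `size` bytes, refusing to read past the end of the file.
        nonlocal pos
        if pos + size > len(data):
            raise ValueError(f"file ends inside a {size}-byte field at offset {pos}")
        chunk = data[pos:pos + size]
        pos += size
        return chunk

    for field in desc:
        kind, name = field[0], field[1]
        if kind == "uint":        # ("uint", name, size): little-endian integer
            record[name] = int.from_bytes(take(field[2]), "little")
        elif kind == "hexblock":  # ("hexblock", name, size): opaque bytes as hex
            record[name] = take(field[2]).hex()
        elif kind == "varblock":  # ("varblock", name, length_field)
            record[name] = take(record[field[2]])
        elif kind == "list":      # ("list", name, count_field, item_desc)
            items = []
            for _ in range(record[field[2]]):
                item, pos = parse(field[3], data, pos)
                items.append(item)
            record[name] = items
        elif kind == "switch":    # ("switch", name, flag_field, {value: desc})
            sub_desc = field[3].get(record[field[2]], [])
            record[name], pos = parse(sub_desc, data, pos)
        else:
            raise ValueError(f"unknown field kind: {kind}")
    return record, pos


# A made-up glossary layout: a magic number, an entry count, then entries
# whose optional note is present only when the flag byte is 1.
ENTRY = [
    ("uint", "flag", 1),
    ("uint", "term_len", 1),
    ("varblock", "term", "term_len"),
    ("switch", "extra", "flag",
     {1: [("uint", "note_len", 1), ("varblock", "note", "note_len")]}),
]
GLOSSARY = [
    ("hexblock", "magic", 4),
    ("uint", "count", 2),
    ("list", "entries", "count", ENTRY),
]

SAMPLE = b"GLOS" + b"\x02\x00" + b"\x00\x03cat" + b"\x01\x03dog\x04bark"
RECORD, END = parse(GLOSSARY, SAMPLE)
```

The point of the design is that the description is plain data: refining a guess about the file means editing GLOSSARY or ENTRY rather than the parsing code, and the resulting nested record can be dumped as text for a later script to scan.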

It is so extremely useful. Unfortunately, I only slept about three hours last night, since I stayed up until 3AM coding and didn't actually do the paying work I should have been doing instead... So posting this project will have to wait for another day -- but once it is posted, wouldn't it be groovy to have an interactive online file parsing tool for, say, malware snagged off the wild Net? That would be fun!

So: more later.