gdbm
gdbm is a GNU implementation of the standard Unix dbm library, which first appeared in Version 7 Unix at Bell Labs and was later reworked at Berkeley as ndbm. (Well, what wasn't?) You may have heard of dbm if you've worked at all with sendmail; sendmail uses dbm for many of its lookup tables.

gdbm's features may be summarized as follows:

  • Fast
    gdbm implements a filesystem-based hash table. It's extremely fast in comparison to relational databases, because most of the overhead (and the features) of an RDBMS simply aren't there.
  • Simple
    gdbm is very simple to use from your C programs.
  • Open-source
    gdbm is probably perfect. But in the event it's not, you can fix it yourself.

Thus gdbm is an excellent lightweight alternative to a full relational database. If all you need to do is to store arbitrary data to be looked up with keys, and do it fast, then gdbm is your solution. However, there are a few caveats.

  • One writer, many readers.
    Only one process may have a gdbm file open for writing at a given time, and this is an exclusive lock: while a file is open for writing, no readers may open it. When the file is not open for writing, any number of readers may open it. So gdbm is really optimal only when a file is written seldom and mostly used for lookups, or when only one process needs to use the file anyway (and thus may open it for reading and writing simultaneously). The latter case makes more sense than you might think (why wouldn't you just use an in-memory hash table?), because gdbm gives the hash persistent storage.
  • Multiple standards.
    Well, kind of. The problem is that the dbm world has the original dbm, the "new" ndbm, and GNU gdbm. They all use different file formats, so to use an original dbm file with gdbm you have to convert it (there is a utility to do so). gdbm provides a dbm compatibility mode -- but in that mode you can only open one file at a time, because the functions don't take a file handle!
  • Is data null-terminated or not?
    dbm files don't necessarily need to have their data terminated by a 0 byte -- but since C is normally used to access them, they very often do. Since "key" (three bytes) is a different key from "key\0" (four bytes), if you're working with a dbm file used by a different program, you need to be aware of the convention that program uses. After reading the man pages for gdbm, you may be wondering how the heck the data is used, since it's not really documented. Keys as well as data are manipulated using type datum, which is simply a structure:
    typedef struct {
       char *dptr;
       int  dsize;
    } datum;
    
    So if you're using null-terminated strings, you can simply toss that dptr into any normal string function. Otherwise, if you need to handle the data as strings, your best bet is to malloc a buffer of dsize + 1 bytes, use strncpy to copy the data into it, and then explicitly write the null terminator at the end. (There's a little more detail on my gdbm example code page.)
  • Memory management
    When gdbm fetches a datum from a file, it mallocs the buffer for you and returns the datum -- but that means, of course, that you have to free the buffer yourself! Failure to do so, as always, will work fine until you go into production and everybody's watching, at which point your process will crash and burn from the memory leaks.
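The null-termination advice can be wrapped in two small helpers: one that builds a datum from a C string under either convention ("key" or "key\0"), and one that turns any datum into a freshly malloc'd, terminated string. This sketch copies with memcpy rather than strncpy, since memcpy copies exactly dsize bytes without inspecting them. So that it compiles without the library, it repeats the datum typedef quoted above; in a real program, drop that typedef and #include <gdbm.h> instead. datum_from_cstr and datum_to_cstr are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Repeats gdbm's datum layout so this compiles standalone.
   In real code, delete this typedef and #include <gdbm.h>. */
typedef struct {
    char *dptr;
    int  dsize;
} datum;

/* Wrap a C string as a datum. include_nul selects the convention:
   nonzero stores "key\0" (dsize = strlen + 1), zero stores "key". */
datum datum_from_cstr(const char *s, int include_nul)
{
    datum d;
    d.dptr = (char *) s;
    d.dsize = (int) strlen(s) + (include_nul ? 1 : 0);
    return d;
}

/* Copy a datum into a new buffer of dsize + 1 bytes and terminate it,
   whether or not the stored bytes already end in '\0'. Returns NULL if
   d.dptr is NULL (e.g. a failed fetch) or malloc fails. Caller frees. */
char *datum_to_cstr(datum d)
{
    if (d.dptr == NULL || d.dsize < 0)
        return NULL;
    char *buf = malloc((size_t) d.dsize + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, d.dptr, (size_t) d.dsize);
    buf[d.dsize] = '\0';
    return buf;
}
```

Keys must match byte for byte, so a lookup must use the same include_nul setting as the program that wrote the file.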

  • GNU source directory
    Go get your own copy! Current version as of this writing is gdbm-1.8.0, but I haven't looked at it yet (1.7.3 is good enough for me, gol durn it).
  • My gdbm API documentation
    I spent some time back in the prehistoric era HTML-izing the API documentation for gdbm.
  • Example code
    This is a very short example of gdbm code, with a little annotation, just to get you started.






This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.