xml_read: Using expat to parse XML files into memory

Previous: In-memory XML data structures and functionality ] [ Top: wftk core index ] [ Next: The command stack and how to load it ]

The basic structure of the parser is the same as in any expat application. We create the parser and pass in, as the user data, the address of the XML pointer we're building up. We register handlers for elements and for plain text; we're not interested in anything else.

Then we simply throw pieces of the input stream at the parser until we're through with it. The handlers do all the work of creating XML pieces and inserting them into the growing structure. If we encounter an error, we free everything built so far and return NULL; otherwise we return the finished structure once the parse is complete.
See Handling elements: startElement
See Handling elements: endElement
See Handling non-element data: charData

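XML * xml_read (FILE * file)
{
   XML_Parser parser;
   char buf[BUFSIZ];
   int done;
   XML * ret;

   ret = NULL;
   parser = XML_ParserCreate(NULL);
   if (parser == NULL) return NULL;

   /* The handlers receive &ret and keep it pointing at the currently open element. */
   XML_SetUserData (parser, (void *) &ret);

   XML_SetElementHandler(parser, startElement, endElement);
   XML_SetCharacterDataHandler(parser, charData);

   done = 0;

   do {
      size_t len = fread(buf, 1, sizeof(buf), file);
      done = len < sizeof(buf);   /* a short read means end of input */
      if (!XML_Parse(parser, buf, (int) len, done)) {
         output ('E', "XML error: %s at line %d",
                 XML_ErrorString(XML_GetErrorCode(parser)),
                 (int) XML_GetCurrentLineNumber(parser));
         /* ret may point deep inside the tree; climb to the root before freeing. */
         if (ret != NULL) {
            while (ret->parent != NULL) ret = ret->parent;
            xml_free (ret);
         }
         XML_ParserFree (parser);
         return NULL;
      }
   } while (!done);

   XML_ParserFree (parser);
   return (ret);
}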
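The read loop can be exercised without expat by substituting a callback for `XML_Parse`. This is a minimal sketch: `chunk_fn`, `feed_file`, and `collector` are hypothetical names, not part of wftk or expat. It shows how `done = len < bufsize` flags the last chunk, including the edge case where the input length is an exact multiple of the buffer size and the final call carries zero bytes (expat accepts a zero-length final call).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for XML_Parse: receives each chunk plus an
   "is final" flag and returns nonzero on success. */
typedef int (*chunk_fn)(void *ctx, const char *buf, size_t len, int is_final);

/* Reads `file` in chunks of at most `bufsize` bytes (at most 64 here) and
   hands each chunk to `feed`, mirroring the do/while loop in xml_read.
   Returns 1 on success, 0 as soon as `feed` reports an error. */
static int feed_file(FILE *file, size_t bufsize, chunk_fn feed, void *ctx)
{
    char buf[64];
    int done;

    assert(bufsize > 0 && bufsize <= sizeof(buf));
    do {
        size_t len = fread(buf, 1, bufsize, file);
        /* A short read (including zero bytes) means end of input. */
        done = len < bufsize;
        if (!feed(ctx, buf, len, done))
            return 0;
    } while (!done);
    return 1;
}

/* Example consumer: accumulates the bytes and counts calls and final flags. */
struct collector {
    char data[256];
    size_t used;
    int calls;
    int finals;
};

static int collect(void *ctx, const char *buf, size_t len, int is_final)
{
    struct collector *c = ctx;

    if (c->used + len > sizeof(c->data))
        return 0;
    memcpy(c->data + c->used, buf, len);
    c->used += len;
    c->calls++;
    c->finals += is_final;
    return 1;
}
```

With a 4-byte buffer, the 11-byte input `<a><b/></a>` arrives as chunks of 4, 4 and 3 bytes, with the final flag set only on the last one; a 4-byte input arrives as one full chunk followed by an empty final chunk.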

Handling elements: startElement
The startElement handler does most of the work of building the XML data structures. The userData parameter points to a variable (ret in xml_read) that holds the node currently open, which is the immediate parent of the element being encountered. When a new element opens, we create its data structure, copy its attributes, and append it to its parent; then we make that variable point to the new node. When the element closes, endElement moves the variable back up the chain to the parent.

For an empty element, expat calls the start handler and then immediately the end handler, so an explicitly empty element such as <br/> ends up exactly the same as <br></br>.

It's astounding how much simpler this startElement is than the corresponding handler in xmltools!
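void startElement(void *userData, const char *name, const char **atts)
{
   XML ** parent;
   XML * element;

   element = xml_create (name);

   /* atts is a NULL-terminated list of alternating names and values.
      Index it instead of writing *atts++ twice in one call, which is
      undefined behavior in C. */
   while (atts[0] != NULL) {
      xml_set (element, atts[0], atts[1]);
      atts += 2;
   }

   parent = (XML **) userData;
   if (*parent != NULL) xml_append (*parent, element);
   *parent = element;   /* the new element is now the current node */
}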
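Expat passes attributes as a flat, NULL-terminated array of alternating names and values. Here is a minimal sketch of walking that array safely. `for_each_attr`, `attr_fn`, and `lookup` are hypothetical names introduced here; the callback merely stands in for `xml_set`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback invoked once per attribute (stands in for xml_set). */
typedef void (*attr_fn)(void *ctx, const char *name, const char *value);

/* Walks an expat-style attribute list: name0, value0, name1, value1, ..., NULL.
   Both strings are read before the pointer advances, so there is no
   unsequenced double increment. Returns the number of attributes seen. */
static size_t for_each_attr(const char **atts, attr_fn fn, void *ctx)
{
    size_t n = 0;

    while (atts[0] != NULL) {
        fn(ctx, atts[0], atts[1]);
        atts += 2;
        n++;
    }
    return n;
}

/* Example consumer: remembers the value of one named attribute, if present. */
struct lookup {
    const char *want;
    const char *found;
};

static void find_attr(void *ctx, const char *name, const char *value)
{
    struct lookup *l = ctx;

    if (strcmp(name, l->want) == 0)
        l->found = value;
}
```

For `{ "id", "x1", "class", "note", NULL }` the walk visits two attributes, and looking up `class` yields `note`; an array holding only `NULL` visits none.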

Handling elements: endElement
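At the close of an element, we just move up the tree to its parent. If there is no parent (we are closing the root), we stay put, so when the parse finishes, ret points at the root element. This also means that if the input somehow contained two root elements, the structure wouldn't reflect the input, but the first root element wouldn't get stranded either. (In practice expat rejects a second root element with a "junk after document element" error.)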
void endElement(void *userData, const char *name)
{
   XML ** element;

   element = (XML **) userData;
   /* Move up to the parent unless this is the root. */
   if ((*element)->parent != NULL) *element = (*element)->parent;
}
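The "cursor" technique shared by startElement and endElement can be simulated without expat by calling the two handlers by hand. This is a minimal sketch. The `Node` type, `node_append`, `on_start`, `on_end`, and `tree_free` are hypothetical stand-ins for the wftk `XML` structure and its functions. An empty element such as `<b/>` is modeled as a start event immediately followed by an end event. `tree_free` also shows why an error path should climb to the root before freeing: mid-parse, the cursor may be sitting deep inside the tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal stand-in for the wftk XML structure. */
typedef struct Node {
    char name[16];
    struct Node *parent;
    struct Node *first_child;
    struct Node *last_child;
    struct Node *next;
} Node;

static Node *node_create(const char *name)
{
    Node *n = calloc(1, sizeof *n);   /* zeroed, so name stays terminated */
    if (n != NULL)
        strncpy(n->name, name, sizeof n->name - 1);
    return n;
}

static void node_append(Node *parent, Node *child)
{
    child->parent = parent;
    if (parent->last_child != NULL)
        parent->last_child->next = child;
    else
        parent->first_child = child;
    parent->last_child = child;
}

/* Same shape as startElement: create, attach to the current node, descend. */
static void on_start(void *userData, const char *name)
{
    Node **cur = userData;
    Node *el = node_create(name);

    assert(el != NULL);
    if (*cur != NULL)
        node_append(*cur, el);
    *cur = el;
}

/* Same shape as endElement: climb to the parent unless already at the root. */
static void on_end(void *userData)
{
    Node **cur = userData;

    if ((*cur)->parent != NULL)
        *cur = (*cur)->parent;
}

/* Frees n and all of its descendants; returns how many nodes were freed. */
static int subtree_free(Node *n)
{
    int count = 1;
    Node *child = n->first_child;

    while (child != NULL) {
        Node *next = child->next;
        count += subtree_free(child);
        child = next;
    }
    free(n);
    return count;
}

/* Climbs from any node to the root, then frees the whole tree. */
static int tree_free(Node *cursor)
{
    if (cursor == NULL)
        return 0;
    while (cursor->parent != NULL)
        cursor = cursor->parent;
    return subtree_free(cursor);
}
```

Feeding the events for `<a><b/><c></c></a>` leaves the cursor on `a` with children `b` and `c`. If a document is abandoned after three nested start events, `tree_free` still reclaims all three nodes, even though the cursor points at the innermost one.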

Handling non-element data: charData
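Character data is even easier: we create a new text structure and append it to the current element. Note that expat may deliver a single run of text in several calls (for example at buffer boundaries or around entity references), so one run of text can end up as several adjacent text nodes.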
void charData (void *userData, const XML_Char *s, int len) {
   XML ** parent;

   parent = (XML **) userData;
   /* s is not NUL-terminated; len gives its length. */
   xml_append (*parent, xml_createtextlen ((char *) s, len));
}
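Expat does not promise to deliver a contiguous run of text in a single call, so a handler shaped like charData produces one text node per call. If adjacent text nodes are unwanted, one option is to extend the previous child when it is already text. This is a hypothetical variant, not what wftk does. The sketch below assumes a simplified `Element`/`Child` layout of its own, and `append_text`, `append_element_child`, and `element_clear` are illustrative names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical child node: either text (owned, NUL-terminated) or an element. */
typedef struct Child {
    int is_text;
    char *text;          /* NULL for element children */
    size_t len;
    struct Child *next;
} Child;

typedef struct {
    Child *first;
    Child *last;
    int text_nodes;      /* number of text children created */
} Element;

static void link_child(Element *parent, Child *c)
{
    if (parent->last != NULL)
        parent->last->next = c;
    else
        parent->first = c;
    parent->last = c;
}

/* Appends s[0..len) as text. If the previous child is already text, the
   bytes are concatenated onto it instead of creating a new node.
   Returns 0 on allocation failure. */
static int append_text(Element *parent, const char *s, int len)
{
    Child *last = parent->last;

    if (last != NULL && last->is_text) {
        char *grown = realloc(last->text, last->len + (size_t) len + 1);
        if (grown == NULL)
            return 0;
        memcpy(grown + last->len, s, (size_t) len);
        last->len += (size_t) len;
        grown[last->len] = '\0';
        last->text = grown;
        return 1;
    }

    Child *c = calloc(1, sizeof *c);
    if (c == NULL)
        return 0;
    c->text = malloc((size_t) len + 1);
    if (c->text == NULL) {
        free(c);
        return 0;
    }
    memcpy(c->text, s, (size_t) len);
    c->text[len] = '\0';
    c->len = (size_t) len;
    c->is_text = 1;
    link_child(parent, c);
    parent->text_nodes++;
    return 1;
}

/* Adds a non-text child, which ends the current run of text. */
static int append_element_child(Element *parent)
{
    Child *c = calloc(1, sizeof *c);
    if (c == NULL)
        return 0;
    link_child(parent, c);
    return 1;
}

/* Frees every child and resets the element to empty. */
static void element_clear(Element *e)
{
    Child *c = e->first;

    while (c != NULL) {
        Child *next = c->next;
        free(c->text);   /* free(NULL) is a no-op for element children */
        free(c);
        c = next;
    }
    e->first = e->last = NULL;
    e->text_nodes = 0;
}
```

Two calls carrying `"Hello, "` and `"world"` yield a single text child `"Hello, world"`; an intervening element child starts a fresh text node. The trade-off is a `realloc` per extra chunk in exchange for a tree that no longer depends on how expat happened to split the input.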

This code and documentation are released under the terms of the GNU license. They are additionally copyright (c) 2000, Vivtek. All rights reserved except those explicitly granted under the terms of the GNU license.