thingy.item(1) and thingy.item[second] would each
refer to the second item. If the items hadn't been empty, we could build a locator such as
thingy.item(1).stuff, which would refer to the first "stuff" element in the
second item. (When no position is given, the default is (0).)
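To make the positional locator syntax concrete, here is a minimal sketch of a parser for dot-separated steps of the form name or name(n), where n is zero-based and defaults to 0. It deliberately does not handle the bracketed form such as item[second], whose exact rules aren't spelled out here. LOCSTEP and parse_locator are hypothetical names, not the tools' actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one step of a locator such as "item(1)".
   Only the name(n) form is handled; the [..] form is not. */
typedef struct {
    char name[64];
    long index;              /* zero-based; 0 when no (n) is given */
} LOCSTEP;

/* Parse "thingy.item(1).stuff" into at most max steps.
   Returns the number of steps, or -1 on a syntax error. */
int parse_locator(const char *loc, LOCSTEP *steps, int max)
{
    int n = 0;
    const char *p = loc;

    while (*p) {
        if (n == max) return -1;
        LOCSTEP *s = &steps[n];

        /* The element name runs up to the next '.' or '('. */
        size_t len = strcspn(p, ".(");
        if (len == 0 || len >= sizeof s->name) return -1;
        memcpy(s->name, p, len);
        s->name[len] = '\0';
        s->index = 0;
        p += len;

        /* Optional "(n)" position suffix. */
        if (*p == '(') {
            char *end;
            s->index = strtol(p + 1, &end, 10);
            if (end == p + 1 || *end != ')' || s->index < 0) return -1;
            p = end + 1;
        }
        n++;

        /* Steps are separated by '.'; a trailing dot is an error. */
        if (*p == '.') {
            p++;
            if (*p == '\0') return -1;
        } else if (*p != '\0') {
            return -1;
        }
    }
    return n;
}
```

For "thingy.item(1).stuff" this yields three steps: thingy at 0, item at 1, and stuff at the default 0.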
So the following program embodies four actions: snip, replace, set, and insert. "Snip" simply
returns the contents of the named location. "Replace" replaces the location with whatever is
on stdin (no guarantee of well-formedness!). "Set" sets an attribute on the named
elements, and "insert" inserts the content of stdin either before or after the named element,
or before or after the named element's content.
A word of apology: although the four actions end up in four different executables, all
four are generated from the same source code, with #defines determining which
of the four actions is taken. This makes for somewhat confusing code, but I hope that
the literate programming presentation will make up for it.
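As a rough illustration of the one-source, four-executables approach, here is a hypothetical sketch that selects the action at compile time (for example cc -DXMLSET tools.c -o xmlset). The function names tool_action and is_tool, and the fallback to XMLSNIP when nothing is defined, are assumptions made so the sketch compiles on its own.

```c
#include <string.h>

/* Hypothetical: exactly one of these is normally given on the compiler
   command line. Defaulting to XMLSNIP is this sketch's choice only. */
#if !defined(XMLSNIP) && !defined(XMLREPLACE) && !defined(XMLSET) && !defined(XMLINSERT)
#define XMLSNIP
#endif

/* Which action this particular executable was built to perform. */
const char *tool_action(void)
{
#if defined(XMLSNIP)
    return "snip";
#elif defined(XMLREPLACE)
    return "replace";
#elif defined(XMLSET)
    return "set";
#else
    return "insert";
#endif
}

/* Convenience check, e.g. is_tool("replace"). */
int is_tool(const char *name)
{
    return strcmp(tool_action(), name) == 0;
}
```

Because the choice is made by the preprocessor, each binary contains only its own branch; the cost is that the shared source is sprinkled with #ifdef blocks.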
The program uses James Clark's expat XML parser, which I highly
recommend. The program itself is documented in these pages in a literate style,
which I also highly recommend. The actual tool I'm using to generate the documentation pages is
my tool LPML, which I can only recommend if you're really brave.
It's working for me, but then I'm writing it myself.
The motivation for this development is the open-source workflow
toolkit, which relies heavily on XML and expat. These tools will be used in the
datasheet manipulation portion of the whole system.
Here's how we do all this stuff:
| This code and documentation are released under the terms of the GNU license. They are additionally copyright (c) 2000, Vivtek. All rights reserved except those explicitly granted under the terms of the GNU license. |
| xmlsnip | #define XMLSNIP | excerpts a named section |
| xmlreplace | #define XMLREPLACE | replaces a named section |
| xmlset | #define XMLSET | sets an attribute, otherwise not touching the file |
| xmlinsert | #define XMLINSERT | inserts stdin somewhere in the tree |
print_usage() for readability. In the literate
presentation, there's no reason to do that, so I save a whole function call. Ha. You may
scoff at that, and of course in this case one piddling function call is trivial, but the
whole concept of breaking functions into smaller functions for readability is obviated by
a literate style. OK, end of soapbox.
struct for storing information about the parse state. Effectively,
the current parse stack consists of those nodes between the root and whichever element we're
currently parsing. This stack is implemented as a doubly-linked list of FRAMEs.
Within the frame, we stash things like the name of the element at that position, its level in
the tree, and a linked list of TAG structures, which count how many child elements
of each name have been encountered. (This is what allows us to use numeric positions;
after all, item(2) refers to the third item encountered, which means that
we have to know how many items we've already seen each time an item comes along.)
So the second struct is the TAG struct, of course.
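A minimal sketch of the two structs and the counting step might look like the following. The field names (name, level, tags, parent, child, count) and the helpers push_frame and count_child are assumptions for illustration, not the original declarations.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical TAG: one counter per distinct child element name. */
typedef struct TAG {
    char *name;              /* child element name */
    int count;               /* children of this name seen so far */
    struct TAG *next;
} TAG;

/* Hypothetical FRAME: one node of the doubly-linked parse stack. */
typedef struct FRAME {
    char *name;              /* element at this level */
    int level;               /* depth in the tree; root is 0 */
    TAG *tags;               /* per-name child counters */
    struct FRAME *parent;    /* toward the root */
    struct FRAME *child;     /* toward the element being parsed */
} FRAME;

/* Push a new frame for element name under parent (NULL for the root). */
FRAME *push_frame(FRAME *parent, const char *name)
{
    FRAME *f = malloc(sizeof *f);
    if (!f) return NULL;
    f->name = malloc(strlen(name) + 1);
    if (!f->name) { free(f); return NULL; }
    strcpy(f->name, name);
    f->level = parent ? parent->level + 1 : 0;
    f->tags = NULL;
    f->parent = parent;
    f->child = NULL;
    if (parent) parent->child = f;
    return f;
}

/* Record one more child called name under f and return its zero-based
   position among same-named siblings, so the third "item" gets 2.
   Returns -1 if allocation fails. */
int count_child(FRAME *f, const char *name)
{
    TAG *t;
    for (t = f->tags; t; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t->count++;

    /* First child with this name: start a new counter. */
    t = malloc(sizeof *t);
    if (!t) return -1;
    t->name = malloc(strlen(name) + 1);
    if (!t->name) { free(t); return -1; }
    strcpy(t->name, name);
    t->count = 1;
    t->next = f->tags;
    f->tags = t;
    return 0;
}
```

A locator step such as item(2) then matches when count_child returns 2 for an item under the matching parent.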
Note that none of our four actions requires keeping any more information on hand than the
current slice of the tree. That's one of the advantages of working with streams; the disadvantage
is of course that you're restricted in what you can do, and the whole shebang is harder to
understand.
malloc() -- but when I free a frame I need
to free its tag list as well.)
Here's how we free a frame:
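As a hypothetical sketch (the original code isn't reproduced here), freeing a frame means walking its TAG list before releasing the frame itself. The minimal struct definitions below are assumptions repeated only so the block compiles on its own; free_frame is an illustrative name.

```c
#include <stdlib.h>

/* Minimal stand-ins for TAG and FRAME; field names are assumptions. */
typedef struct TAG { char *name; int count; struct TAG *next; } TAG;
typedef struct FRAME {
    char *name;
    int level;
    TAG *tags;
    struct FRAME *parent, *child;
} FRAME;

/* Free a frame, its name, and its whole tag list. The frame is unlinked
   from its parent so no dangling child pointer remains. Returns the
   parent, which becomes the new top of the stack. */
FRAME *free_frame(FRAME *f)
{
    FRAME *parent = f->parent;
    TAG *t = f->tags;

    while (t) {
        TAG *next = t->next;     /* save the link before freeing */
        free(t->name);
        free(t);
        t = next;
    }
    free(f->name);
    if (parent) parent->child = NULL;
    free(f);
    return parent;
}
```

Returning the parent lets an end-of-element handler pop the stack in one step, e.g. top = free_frame(top).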
empty flag on each tag; if the tag is still
empty, then endElement will print "/>" to close it; otherwise it will print the entire
close tag, name and all.
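Here is a minimal sketch of the empty-flag technique. To stay self-contained it drives plain functions by hand instead of registering them with expat (in the real tools they would be the handlers passed to XML_SetElementHandler); EMITTER, start_element, end_element, and close_if_empty are hypothetical names. The key point is that the start tag is written without its closing '>' so the end handler can still choose between "/>" and a full close tag.

```c
#include <string.h>

#define MAXDEPTH 32

/* Hypothetical output state; no depth-overflow checking in this sketch. */
typedef struct {
    char out[1024];                /* accumulated output */
    const char *names[MAXDEPTH];   /* open element names (not copied) */
    int empty[MAXDEPTH];           /* 1 while the element has no content */
    int depth;
} EMITTER;

static void emit(EMITTER *e, const char *s)
{
    strncat(e->out, s, sizeof e->out - strlen(e->out) - 1);
}

/* If the innermost open element is still empty, finish its start tag
   and mark it nonempty. */
static void close_if_empty(EMITTER *e)
{
    if (e->depth > 0 && e->empty[e->depth - 1]) {
        emit(e, ">");
        e->empty[e->depth - 1] = 0;
    }
}

void start_element(EMITTER *e, const char *name)
{
    close_if_empty(e);             /* the parent now has a child */
    emit(e, "<");
    emit(e, name);                 /* attributes would be written here */
    e->names[e->depth] = name;
    e->empty[e->depth] = 1;        /* '>' deliberately not written yet */
    e->depth++;
}

void end_element(EMITTER *e)
{
    e->depth--;
    if (e->empty[e->depth]) {
        emit(e, "/>");             /* never got content: self-close */
    } else {
        emit(e, "</");
        emit(e, e->names[e->depth]);
        emit(e, ">");
    }
}
```

Feeding it a, b, end, end produces <a><b/></a>.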
You can see that xmlinsert checks whether it has anything to insert twice: before anything
else is done, and again after the tag is closed. If the insertion goes into an empty tag, then we
have to close the tag first and mark it nonempty.
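The following hypothetical sketch shows where each of the four insert positions could fire for the located element, including closing an empty tag before inserting into it. The WHERE enum, TARGET struct, and start_target/end_target functions are illustrative assumptions, and the sketch handles only the located element itself, not its children.

```c
#include <string.h>

/* Hypothetical names for the four insert positions. */
typedef enum { BEFORE_ELEMENT, BEFORE_CONTENT, AFTER_CONTENT, AFTER_ELEMENT } WHERE;

typedef struct {
    char out[512];
    int empty;                    /* start tag still lacks its '>' */
} TARGET;

static void put(TARGET *t, const char *s)
{
    strncat(t->out, s, sizeof t->out - strlen(t->out) - 1);
}

/* Start handler for the located element. */
void start_target(TARGET *t, const char *name, WHERE w, const char *ins)
{
    if (w == BEFORE_ELEMENT) put(t, ins);
    put(t, "<");
    put(t, name);                 /* '>' deferred: may end up as "/>" */
    t->empty = 1;
    if (w == BEFORE_CONTENT) {    /* inserting makes it nonempty */
        put(t, ">");
        t->empty = 0;
        put(t, ins);
    }
}

/* End handler for the located element. */
void end_target(TARGET *t, const char *name, WHERE w, const char *ins)
{
    if (w == AFTER_CONTENT) {     /* checked before anything else */
        if (t->empty) { put(t, ">"); t->empty = 0; }
        put(t, ins);
    }
    if (t->empty) {
        put(t, "/>");
    } else {
        put(t, "</");
        put(t, name);
        put(t, ">");
    }
    if (w == AFTER_ELEMENT) put(t, ins);   /* after the tag is closed */
}
```

Inserting <x/> at AFTER_CONTENT into an empty item turns <item/> into <item><x/></item>.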
You'll also notice that we have three different if statements: xmlsnip only
emits the tag if it is in the snip location, xmlreplace only emits the tag if it's
not being replaced, and everybody else emits the tag if tags are being emitted. Since for
everybody else that check is handled inside the if, their outer if is just an if
(1). In retrospect that's kind of an odd way to code it; I guess it shows that I
wrote xmlsnip first and then hacked it up to make the other tools.
#ifdef stuff, but hey. It works.
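To see the three emission tests side by side, here is a hypothetical restatement as a runtime switch; the tools themselves pick one test per executable with #ifdef. TOOL, PSTATE, and should_emit_tag are assumed names, and in_location stands for "inside the snip location" for xmlsnip and "being replaced" for xmlreplace.

```c
/* Hypothetical runtime version of the per-tool emission tests. */
typedef enum { SNIP, REPLACE, OTHER } TOOL;

typedef struct {
    int emitting;       /* are tags being emitted at all? */
    int in_location;    /* inside the located section? */
} PSTATE;

int should_emit_tag(TOOL tool, const PSTATE *s)
{
    switch (tool) {
    case SNIP:
        return s->in_location;     /* only what lies inside the snip */
    case REPLACE:
        return !s->in_location;    /* everything except the replaced part */
    default:
        return s->emitting;        /* xmlset and xmlinsert */
    }
}
```

Written this way the asymmetry is visible at a glance: snip and replace are mirror images, while the other two tools simply follow the emitting flag.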
There are basically two things going on here. First, if our enclosing tag is still empty,
then we close it (as long as we're emitting tags, and we're inside our snip location for xmlsnip
or not replacing the tag for xmlreplace; same old, same old on all that stuff).
After that's done, we emit the character data, as long as we're supposed to be emitting
character data (with pretty much the same caveats for xmlsnip and xmlreplace).
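A minimal sketch of those two steps follows. Like expat's character data callback (registered with XML_SetCharacterDataHandler), it receives a buffer that is not NUL-terminated plus a length. CDSTATE, char_data, and the two flags emit_tags and emit_text are hypothetical; the flags stand in for the per-tool conditions described above.

```c
#include <string.h>

/* Hypothetical state; emit_tags/emit_text collapse the per-tool tests. */
typedef struct {
    char out[512];
    int parent_empty;   /* enclosing start tag still lacks its '>' */
    int emit_tags;
    int emit_text;
} CDSTATE;

/* Append n bytes of s (not NUL-terminated), truncating if full. */
static void put_n(CDSTATE *st, const char *s, size_t n)
{
    size_t used = strlen(st->out);
    size_t room = sizeof st->out - used - 1;
    if (n > room) n = room;
    memcpy(st->out + used, s, n);
    st->out[used + n] = '\0';
}

void char_data(CDSTATE *st, const char *s, int len)
{
    /* 1. Text means the enclosing element is no longer empty. */
    if (st->parent_empty && st->emit_tags) {
        put_n(st, ">", 1);
        st->parent_empty = 0;
    }
    /* 2. Then pass the text through if text is being emitted. */
    if (st->emit_text)
        put_n(st, s, (size_t)len);
}
```

Note that len, not a terminator, bounds the copy; passing "hello!" with len 5 emits only "hello".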