Initial scan
[ Previous: Script file structure ]
[ Top: xmlp alpha ]
[ Next: Tangle: write code output ]
The intial scan of the input file is pretty straightforward. It simply reads everything and
builds a list of items, keyed by name and containing their labels and their text. The only
weird case you have to watch out for is when a piece concatenates to an item that hasn't been
encountered yet. In that case, the piece is stashed anyway, then when the item is
defined, if it has a text piece in it then that piece will be inserted before any text already
collected.
Due to the crude nature of my current weave, I have all this in one big blob of text. This is
because I can't bring myself to break it onto separate pages. And of course the other reason
for this is that I'm still not literate-adapted; I have always tended to write code in a rather
monolithic fashion (breaking code into subroutines to increase readability has always really
irritated me. I guess that's why I'm working on a literate programming system instead.)
One assumption I'm making here: the input file is open on INPUT. It will have to be rewound
before doing the weave. Tangle won't require a further pass, because this scan step will
gather everything we need for tangling. First, let's set up some globals we'll be using.
|
@items = ();
@objects = ();
$name = '';
$piecename = '';
$parentname = '';
|
The way I'm doing this is that I'm effectively using the name of the current item, and the
name of the item receiving the current piece, as state variables. When we leave the element
in each case, I reset the value to blank, so that the scanner can tell we're not in an item
or piece respectively.
|
while (<INPUT>)
{
if (/(<object .*>)/i)
{
$tag = $1;
$tag =~ s/^<object\s+//i;
$attr = "";
%thisobject = (name => '', language => '', item => '');
foreach $piece (split /"/, $tag) {
if ($attr eq '') {
$attr = $piece;
$attr =~ s/^\s*//;
$attr =~ s/\s*=\s*$//;
} else {
$thisobject{$attr} = $piece;
$attr = '';
}
}
if ($thisobject{name} eq '') {
print STDERR "$. : Nameless object encountered.\n";
next;
}
if ($thisobject{item} eq '') {
print STDERR "$. : Object '$thisobject{item}' has no starting item.\n";
next;
}
@objects = (@objects, $thisobject{name});
$starter{$thisobject{name}} = $thisobject{item};
}
|
The object scanner is a little simpler than the item and piece scanners, so I'll explain it
first. As each line is scanned, it's checked for being an <object>
tag.
Note that this is assuming that the tag will be the only thing on the line. I don't want to
get into real tokenizing of the XML input, because that will be the province of the QDMT, which
is my next four-letter vowelless acronym. The next version of XMLP will use the QDMT to tokenize
its input.
At any rate, if the object tag is encountered, I read its attributes into the
$thisobject
hash. Then I use that hash to build an object list, mark the starting
item for each object, and so on.
The pattern for items is similar, except that while in an <item>
tag,
I have the name of the tag in the $name
global.
|
if (/(<item .*>)/i)
{
$tag = $1;
$tag =~ s/^<item\s+//i;
$attr = "";
%thisitem = (name => '', label => '', pattern => '', language => '');
foreach $piece (split /"/, $tag) {
if ($attr eq '') {
$attr = $piece;
$attr =~ s/^\s*//;
$attr =~ s/\s*=\s*$//;
} else {
$thisitem{$attr} = $piece;
$attr = '';
}
}
if ($thisitem{name} eq '') {
print STDERR "$. : Nameless item encountered.\n";
next;
}
$name = $thisitem{name};
$lastchild{$name} = $name;
$children{$name} = 0;
if ($name !~ /\./) {
$parentname = '';
$parent{$name} = '';
} else {
$parentname = $name;
$parentname =~ s/\..*?$//;
$parent{$name} = $parentname;
$lastchild{$parentname} = $name;
$children{$parentname} += 1;
}
@items = (@items, $name);
if (defined $label{$name}) {
print STDERR "$. : Duplicate item name '$name'.\n";
}
if ($thisitem{label} eq '') { $thisitem{label} = $name; }
$label{$name} = $thisitem{label};
if ($parentname eq '') {
$url{$name} = "$name.html";
} else {
$n = $name;
$n =~ s/^.*?\.//;
$url{$name} = $url{$parentname} . '#' . $n;
}
next;
}
|
And then I terminate the <item>
tag by setting the $name
global to blank. I also set the $piecename
global to blank in case the user
forgot to terminate the current piece. I know that violates the principles of XML tokenization,
but again, QDMT will do real XML tokenization and I don't want to mess with it yet.
|
if (/(<\/item\s*>)/i) {
if ($name !~ /\./) { $parentname = $name; }
$name = '';
$piecename = '';
next;
}
|
And finally, the <piece>
tag, which is pretty analogous to
<item>
.
|
if (/(<piece.*>)/i)
{
next if $name eq ''; # Pieces are silent outside of items.
$tag = $1;
$tag =~ s/^<piece\s*//i;
$attr = "";
%thispiece = (add-to => '', language => '');
foreach $piece (split /"/, $tag) {
if ($attr eq '') {
$attr = $piece;
$attr =~ s/^\s*//;
$attr =~ s/\s*=\s*$//;
} else {
$thispiece{$attr} = $piece;
$attr = '';
}
}
$piecename = $name;
$piecename = $thispiece{'add-to'} if $thispiece{'add-to'} ne '';
next;
}
if (/(<\/piece\s*>)/i) {
$piecename = '';
next;
}
if ($piecename ne '') {
$pieces{$piecename} .= $_;
}
}
|
[ Previous: Script file structure ]
[ Top: xmlp alpha ]
[ Next: Tangle: write code output ]
This code and documentation are released under the terms of the GNU license. They are
additionally copyright (c) 2000, Vivtek. All rights reserved except those explicitly
granted under the terms of the GNU license.
|