wftk -- Process definition -- iteration

Topic: wftk -- Process definition: iteration

[ wftk home ] [ process definition ] [ discussion ]

The iteration construct is based on a presentation by Charles T. Zahn in 1974, which Donald Knuth described in a paper in the same year and republished in his book Literate Programming in 1992. It's a very general loop structure which covers "plain old" loops as well, so that we get a lot of freedom in building loops without having to write much code to interpret them.

The whole thing is based on the situation. The situation is basically a named break, and after the loop you write handlers which process the situations generated. This avoids the common problem of, say, a for loop with a break -- when you exit the loop, you commonly test the loop index to see whether it's within the bounds; if it is, you know the loop exited via the break, and not by simply running through the whole range of the index. This results in code that's hard to understand, sometimes. But it gets worse when you have two or more breaks. Then you have you declare and set a flag to determine which break was actually hit.

Well, this iteration construct pretty much does that for you. Here's how it works:
<sequence repeat> ... <situation name="sit1"></situation> ... </sequence> <handle situation="sit1">...</handle> First, notice that there is no special loop tag -- we don't need one. We simply tack a "repeat" qualifier onto a sequence or parallel tag, and that makes the block repeat until a situation occurs. The situation breaks out of the sequence, and is handled afterwards. We know exactly what the situation is, by the name used. We can have multiple breaks and we know which one was invoked. We don't need a declaration anywhere, and we're not cluttering our value sheet up with values that are used only internally.

So all we need are the situation and handle tags:
<situation name="name">...</situation> The contents of a situation tag are parameters to be used by the handler.
<handle situation="name">...</handle> And the contents of the handle tag are of course the code blocks to be executed when that situation applies.

The situation tag can be placed in a sequence or a parallel block. If it's in a sequence, it breaks immediately and no further sequential actions will be taken. If in a parallel, since all the actions are started simultaneously, we obviously can't stop processing things that are already started; so all the actions will run to completion, synchronize at the end of the block, and the block will then terminate and the handler will be invoked. Note that a non-repeating sequence can be terminated with a situation perfectly naturally, but a non-repeating parallel block with a situation will run in exactly the same way as one without a situation. The only difference is that you can use the situation as a flag, so that the appropriate handler will run.

If no handler is provided immediately following the block in which a situation occurs, then the parent block is terminated with that situation, and so on until the situation does have a handler. This provides a neat solution to the problem of how to break nested loops.

Something else we'll need for loops is an index variable. The index variable simply counts the number of times we've been around the loop, and we can name it whatever we like. The syntax for this is <sequence repeat index="i">, which makes an index variable named i. This is a pretty lightweight substitute for a for loop, but it should be enough.

Finally, I'm toying with the idea of providing nonbreaking situations. These would be analogous to flags and would provide a dandy way to flag completion of particular segments of a block. Since parallelism will make it possible for multiple situations to be activated at once anyway (thus making it necessary to maintain a list of situations active) I might as well exploit that mechanism. So <situation break=no> would signify that. Handlers should then be logically invoked in reverse order of situation creation. Thus the creation of a situation pushes the situation onto a stack.

The objection will probably be raised that this design is too complicated. However, it's really easier than having separate for and while loops, for instance, because there is less to parse and to keep track of. And we really do need iteration to do nearly anything outside of toy problems. The inclusion of parameters with the situation is more complicated than absolutely necessary, so the prototype probably won't have parameters on situations. It may turn out to be more than we need anyway. We'll see.

Bibliography and extraneous note
Knuth, Donald. Literate Programming. CLSI, Stanford, 1992.
Knuth, Donald. Structured Programming with go to statements. In Literate Programming but originally published in Computing Surveys 6 (December 1974), pp. 261-301.
Zahn, Charles T. A control structure for natural top-down structured programming presented at Symposium for Programming Languages, Paris, 1974.

It's interesting that work like this, done in the early 70's, was completely ignored by subsequent language development. The problem was that Algol and Pascal had already been developed. C was based on those, leaving no particular motivation to look at new structures. And then of course in the 80's, coming a little out of left field, object-oriented programming was the leading movement, and the programming structures available to the programmer actually writing the object methods was based on C, and again new constructs were uninteresting.

Read Knuth's article for an interesting exploration of different ways of handling the goto problem, especially in regards to loops.

Addendum (2/23/00)
As I was working on the e-commerce scenario it became obvious that this iteration construct was a little too simple, so I augmented it with two additional modifiers: foreach denotes a pseudovariable which takes on all the values in values, where values is a list construct. This allows, for instance, parallel iteration across rows of a query, which was what I needed for the e-commerce scenario. A simple numeric iteration was of no use to me there, and in fact I found that none of my scenarios needed numeric iteration. Go figure.

So the values modifier pretty much implies repeat, with the consequence that when the iteration runs out of values, it stops without any condition holding. However, open-ended iteration (repeat without values) will come in handy, so the repeat modifier will stay.

A values modifier without a foreach still makes sense, but there is no way to refer to the current value. You can also include both foreach and index modifiers; the index will simply maintain a count of the values used up so far.