Topic: wftk -- Process definition: <data>

(2/23/00) This is basically a total rewrite after some thought and feedback from Thomas Fricke.

Data storage is an extremely central concept to the workflow, of course. As I get further into the design, it's becoming obvious that the value sheet is going to play a central role in defining each active process. Each variable is represented by some value in the value sheet. (And the value sheet is also the logical place to keep information about the state of the process, but that's a discussion for another time.)

A value is essentially a piece of text, because it's textually represented inside a tag in the value sheet. A type adapter knows how to impose certain semantics and perform certain operations on the textual value. Type adapters might include integer, money, time, date, string, pattern, or whatever. They might be parameterized, so that enumerations could be supported. There is quite a lot we can do with the adapter concept, without necessarily having to do it all at once up front.
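
To make the adapter idea concrete, here is a minimal Python sketch, not wftk code. The registry, the function names, and the "priority" enumeration are all hypothetical; the only thing taken from the text is that an adapter imposes semantics on a textual value and may be parameterized.

```python
from typing import Callable, Dict

# Hypothetical adapters: each one turns the stored text into a typed value.
def integer_adapter(text: str) -> int:
    return int(text.strip())

def make_enum_adapter(*allowed: str) -> Callable[[str], str]:
    """Parameterized adapter: accepts only one of the allowed strings."""
    def adapter(text: str) -> str:
        if text not in allowed:
            raise ValueError(f"{text!r} is not one of {allowed}")
        return text
    return adapter

# Hypothetical registry keyed by the name used in type="...".
ADAPTERS: Dict[str, Callable[[str], object]] = {
    "integer": integer_adapter,
    "string": lambda text: text,
    # "priority" is an invented enumeration, purely for illustration.
    "priority": make_enum_adapter("low", "normal", "high"),
}

def interpret(type_name: str, text: str) -> object:
    """Look up the adapter for a type name and apply it to the stored text."""
    return ADAPTERS[type_name](text)
```

New types can be added to the registry later without touching existing ones, which fits the point about not having to do it all up front.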

We also need to be able to get to data which is stored elsewhere. So I am introducing the idea of a storage adapter which leaves status information in the value sheet (whatever it needs to get to the actual data). Storage adapters might include PostgreSQL, Sybase SQL Server, FileNet document management, the filesystem, the default (value sheet), ODBC, and so forth. We can easily imagine adapters written to take advantage of new storage techniques.

The point of storage adapters is that storage outside the database allows interaction at the data level with other processes and systems. This interaction goes both ways, so that we can read database information, store documents, etc.

OK. In addition to simple values, I think we need at least two aggregates: the record and the collection. And in fact I suspect we can get away with a single aggregate construct that lets us define both. But first, let's look at the basic <data> tag.
  <data name="myvariable" type="integer" value="(literal value)"> </data>
That's the definition of a simple integer, stored in the value sheet. If we omit the "type" modifier, the default type is text, which is basically a string, unless the data tag has content, in which case it is automatically a structure. It occurs to me that certain type adapters would expect a structure underpinning, so specifying a type other than "structure" doesn't automatically preclude the value being a structure in fact.
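
The defaulting rule can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and it assumes that "content" means child elements, so whitespace between the tags does not count.

```python
import xml.etree.ElementTree as ET

def effective_type(elem: ET.Element) -> str:
    """Return the declared type, or apply the default rule described above."""
    declared = elem.get("type")
    if declared is not None:
        return declared
    # Assumption: "content" means child elements, not whitespace text.
    has_content = len(elem) > 0
    return "structure" if has_content else "text"
```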

First complication, then, would be structures.
  <data name="stuff" type="structure">
    <data name="field1" type="text" value="something"></data>
    <data name="list">
        <data type="integer"></data>
    </data>
  </data>

See where I'm going with that? This defines a structure consisting of a named text field, and a named list of integers. The length of the list of integers is obvious if the storage medium is the XML value sheet: it's just the number of values in the list. And if some other storage adapter is used to stash this value, then that adapter will abstract away how the length of the list is stored. I really, really like this construct (probably because I've been mulling over a data description language for roughly five years now.)
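
As a rough sketch of the XML case, the following Python reads such a list back from a value sheet. How the individual list values are actually laid out in the sheet is not specified here, so the one-child-per-value layout with a "value" attribute is an assumption, as are the sample numbers and the function name.

```python
import xml.etree.ElementTree as ET

# Assumed layout: one <data type="integer"> child per stored list value.
sheet = ET.fromstring("""
<data name="stuff" type="structure">
  <data name="field1" type="text" value="something"></data>
  <data name="list">
    <data type="integer" value="3"></data>
    <data type="integer" value="5"></data>
    <data type="integer" value="8"></data>
  </data>
</data>
""")

def list_values(structure: ET.Element, list_name: str) -> list:
    """Collect the integer values stored under the named list."""
    for child in structure:
        if child.get("name") == list_name:
            return [int(item.get("value")) for item in child
                    if item.get("type") == "integer"]
    raise KeyError(list_name)
```

With this layout the length of the list is simply `len(list_values(sheet, "list"))`; nothing else has to be stored.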

The value of record fields can be accessed with the usual dotted notation, so that the text value above would be referred to as ${stuff.field1}.
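
A minimal sketch of how such a reference might be resolved against an XML value sheet, in Python. The function names are hypothetical, and it assumes a field's value lives in the "value" attribute as in the examples above.

```python
import re
import xml.etree.ElementTree as ET

VALUE_SHEET = ET.fromstring("""
<data name="stuff" type="structure">
  <data name="field1" type="text" value="something"></data>
</data>
""")

def lookup(root: ET.Element, dotted: str) -> str:
    """Walk a dotted name such as 'stuff.field1' down the nested data tags."""
    names = dotted.split(".")
    if root.get("name") != names[0]:
        raise KeyError(dotted)
    node = root
    for name in names[1:]:
        for child in node:
            if child.tag == "data" and child.get("name") == name:
                node = child
                break
        else:
            # No child carried this name at this level.
            raise KeyError(dotted)
    return node.get("value", "")

def expand(template: str, root: ET.Element) -> str:
    """Replace every ${dotted.name} in the template with its value."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: lookup(root, m.group(1)), template)
```

For example, `expand("Got ${stuff.field1}", VALUE_SHEET)` substitutes the text field from the structure above.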

I had considered taking liberties with our <sequence> tag here, since a list is the same sort of thing for data that a sequence is for actions. But first, the sequence vs. parallel distinction doesn't seem to make sense for data, and second, there are lots of features of the sequence tag that don't even make sense for data. So I introduced the <list> tag instead.

So a storage adapter will be able to represent records which are stored elsewhere. Ideally it would be possible to nest such storage units, so that for instance a serialized version of a structure could be created using one storage adapter, and that serialized object could then be treated as raw data to be stored somewhere else using another storage adapter. This is obviously far out of scope of the original project, but it's a powerful idea. I hope I'll get back to it before the year's out.

If we have a common data structure that we know we'll be reusing, we can define it with the <datastructure> tag as follows:
  <datastructure name="myrecord">
  ...
  </datastructure>
  <data name="myrecordvalue" type="structure:myrecord"></data>

So how do we do storage? Let's say something like this:
  <data name="something" storage="postgres://database spec/query string"></data>
As you can see, I think the area of storage specifications needs some work. But it's making more sense than it did yesterday.
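
One way to read such a specification is to let the part before "://" select the storage adapter and hand the rest over untouched. The Python below is a sketch under that assumption only; the class names and the registry are hypothetical, and no real database is contacted.

```python
from typing import Optional

class ValueSheetStorage:
    """Default case: the value lives directly in the value sheet."""
    def __init__(self, location: str = ""):
        self.location = location

class PostgresStorage:
    """Placeholder adapter: keeps the spec it was given, nothing more."""
    def __init__(self, location: str):
        self.location = location  # passed through uninterpreted

# Hypothetical registry mapping the scheme prefix to an adapter class.
STORAGE_ADAPTERS = {"postgres": PostgresStorage}

def select_storage(spec: Optional[str]):
    """Pick an adapter from the storage attribute; no attribute means default."""
    if not spec:
        return ValueSheetStorage()
    scheme, sep, location = spec.partition("://")
    if not sep:
        raise ValueError(f"no adapter prefix in {spec!r}")
    if scheme not in STORAGE_ADAPTERS:
        raise ValueError(f"unknown storage adapter {scheme!r}")
    return STORAGE_ADAPTERS[scheme](location)
```

Everything after the prefix is left for the adapter to interpret, so the format of "database spec/query string" stays open.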

In the case where a record or table is used from a relational database, the data structure is understood to be implicitly defined by the database system. The storage adapter for that database is responsible for translating the database's columns into usable typed data.

We're still not quite done with the data tag. We want some additional modifiers as well. First, the format modifier will specify a format to be used when converting the type to a string. The format will be type-specific, of course, and will be passed to the adapter unmodified. This is a little subtle, because the value is actually a string in storage, but that string needn't necessarily be human-readable. The format modifier allows us to specify a human-readable format to be used with this value.
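
To illustrate the subtlety, here is a small Python sketch of a date adapter whose stored text is not human-readable. The class name and the choice of a day number as the stored form are assumptions made for the example; the format string goes to the adapter unmodified, as described above.

```python
from datetime import date

class DateAdapter:
    """Hypothetical adapter: the stored text is a proleptic Gregorian ordinal."""

    def to_storage(self, value: date) -> str:
        # The stored form is still a string, just not a readable one.
        return str(value.toordinal())

    def to_display(self, stored: str, fmt: str) -> str:
        # fmt is used exactly as given; here it is a strftime pattern.
        return date.fromordinal(int(stored)).strftime(fmt)
```

Here `to_display(stored, "%m/%d/%y")` renders the stored day number in month/day/year form, while the value sheet keeps only the number.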

Then we'll want a readonly attribute, which will be useful when referencing data from within a task. Since the data tag is used both to declare and to manipulate values, its semantics are a little fuzzy. The "readonly" attribute is for references, so that the system knows to present this data to the user in connection with a task, but not to elicit a change to it.
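
A sketch of what this could mean for whatever presents a task to the user, in Python. The enclosing tag, the attribute spelling readonly="yes", and the function name are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed spellings of a "true" readonly attribute.
TRUE_VALUES = {"yes", "true", "readonly"}

def split_task_data(task: ET.Element):
    """Return (names to show, names the user may change) for a task."""
    shown, editable = [], []
    for ref in task.findall("data"):
        name = ref.get("name")
        shown.append(name)  # every referenced value is presented
        if ref.get("readonly", "no").lower() not in TRUE_VALUES:
            editable.append(name)  # only these elicit a change
    return shown, editable
```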

So this data stuff looks like it's coming together.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.