wftk tutorial - (unresolved tag 02-a-title)


Data in the wftk is organized into lists. A list is equivalent to a key-addressed SQL table, more or less; each row must have a unique key, and different rows needn't have the same fields, but the SQL table is pretty close.

A list may simply be a representation of any arbitrary data structure, of course -- besides the obvious (SQL tables), we can also have lines in a file, files in a directory, arbitrary action results, the elements in an XML file, and so on. Part of the organizational work of setting up a wftk repository is to define the lists of data it will see, because those lists are its data environment.

The basic form of list is an arbitrary set of records stored in memory. So a good way to start getting our feet wet with the wftk is to look at the ways we can set up and work with memory lists. Since this is a toolkit, there are different approaches. The five basic approaches are the direct API against the session object, the string-based command-line interface provided by the shell (but which is also available for direct call), more specialized command-based methods for SQL (permitting quoting of parameters and the like), the regular DBI interface to allow existing code to talk to wftk repositories, and finally hash tying. In the spirit of making easy things easy, wftk tries to meet you halfway no matter how you conceptualize your data. We'll cover each of these five techniques in the subsections below -- but remember that in your own code there's absolutely no reason you can't mix and match any of these five basic access methods.

In later sections, we'll look at other types of storage for lists which will make our repositories much more useful, but the basic techniques we use for those lists will be the same, no matter where the data actually ends up being stored.

Direct list API against the session

So what can we do with a list? After simply defining one, we can add records, modify records, or delete records (add/mod/del), we can get a specific record by key, or we can iterate through the list. We can also get a text display of the list, which makes things easy to manage. Here's a demonstration of basic list functionality, using a simple memory list:

   use Workflow::wftk;

   my $wftk = Workflow::wftk->new();

   # Create a list:
   my $list = $wftk->open('mylist');   # Note: the open() method takes an optional
                                       # second parameter for list definition parameters.

   # Look at the fields in the list -- it has none;
   my @fields = $list->fields();       # Returns a reference to an empty list.

   # Add a record:
   my $rec = $list->get_new();         # This returns a Workflow::wftk::Record set up
                                       # for the list in question.

   $rec->put ('field1', 'value1');
   $rec->put ('field2', 'value2');

   my $key = $list->add ($rec);        # Easy!  The add() method returns the key of the
                                       # new record in the list.

   # Look at the fields in the list again:
   @fields = $list->fields();          # Returns ('field1', 'field2') now, because a memory
                                       # list just accretes the fields you give it.

   $rec = $list->get_new();            # Don't just recycle the existing record!
   $rec->put ('field1', 'different value');
   $rec->put ('field2', 'value2');
   my $key2 = $list->add ($rec);
   $rec = $list->get_new();
   $rec->put ('field1', 'third value');
   $rec->put ('field2', 'value2');
   my $key3 = $list->add ($rec);

   # Get a list of keys:
   my @keys = $list->keys();           # Returns a list of the keys stored in the list.
                                       # By default, they're sorted.

   # Iterate through the list:
   for (my $key = $list->first_key(); $key; $key = $list->next_key()) {
      # ...
   }

   # Get a record:
   $rec = $list->get($key2);

   # Get a value:
   print "Value is " . $rec->get('field2');

   # Modify an existing record.
   $rec->put ('field1', 'new value 1');
   $list->mod ($key2, $rec);

   # Delete a record.
   $list->del ($key3);

   # Display the contents of the list:
   print $list->as_text;

   # The print statement above will return the following:
   # +-----+-------------+--------+
   # | key | field1      | field2 |
   # +-----+-------------+--------+
   # | 1   | value1      | value2 |
   # | 2   | new value 1 | value2 |
   # +-----+-------------+--------+

That's the basics of how we work with lists through the direct API. At some later date, it might be interesting to come back to this point and build some more interesting examples. Drop me a line if you have any suggestions.

The other four methods of working with lists are basically all wrappers for this API, presenting it in various more convenient ways. Each of these sections presents the basic summary of how to do more or less the same things, but in five different ``dialects'' representing different ways of thinking about the same thing. The pervasive design philosophy of the wftk has always been to approach workflow and complex systems in as general a way as possible, and to formalize a number of different ways of thinking of the same thing. This makes it easy (in theory) to approach a problem from any point of view, and still have convenient tools for writing good code to solve it.

You'll be the final judge as to how well I've fulfilled that goal.

Command-line (shell) interface

The second means of working with lists (and everything else in the wftk) is the command-line shell. Working with the shell is often quicker and easier than the direct API, although it's not quite as easy to get data back out (of course, there's no reason not to mix and match).

You can also pass SQL commands to the command-line interface, and retrieve results through a dynamically-defined memory-stored list.

Here are the basic command-line equivalents for the actions taken in the last section using the direct API, plus a few more demonstrating some convenient features of the command-line interface.

   my $wftk = Workflow::wftk->new();

   # Create a list:
   $wftk->do('open mylist');           # Side effect: this sets the "current list" to mylist.

   my $curlist = $wftk->curlist();     # Returns the name of the current list.

   # We can still get an API handle for the new list:
   my $list = $wftk->list ();          # Returns a reference to mylist.

   # Or any other open list, by name:
   $list = $wftk->list ('mylist');     # Returns undef for a name it doesn't know.

   # (Did I mention we can get a list of the lists open in the session?)
   my %lists = $wftk->lists();
   print keys %lists;

   # We can drop the list, too.
   $wftk->do('drop mylist');
   $list = $wftk->list('mylist');      # Returns undef now.

   # Let's add it again, but this time with fields:
   $wftk->do('open mylist memory:fields=field1,field2');

   # Add a record using an SQL INSERT (this wouldn't work if we hadn't defined fields):
   $wftk->do("insert into mylist values ('value1', 'value2')");
                                       # Note: You'll probably want to use the SQL interface
                                       # for working with values directly, since it knows
                                       # how to quote values properly.
                                       # NO side effects from SQL commands except SELECT: 
                                       # the current list is unchanged.

   # Load a temporary record into the shell record cache for adding later:
   $wftk->new_record ();               # Opens a new record in the current list.  You can also name a list.
   $wftk->load_record (<<EOF);         # Loads data into the current record.
   field1: different value
   field2: value2
   EOF

   # Add that record from the shell record cache:
   $wftk->do ("add");                  # This tells the shell to add the cache record to the current
                                       # list.  You can also name a list, as shown below.

   # The 'add' command sets a current key in the shell.
   # Get the key of the current record:
   my $key2 = $wftk->curkey ();        # Returns '2' if all is well.

   # Do it again to get the same data into the list as with the direct API above.
   $wftk->new_record ();
   $wftk->load_record (<<EOF);
   field1: third value
   field2: value2
   EOF

   # Add that record from the shell record cache:
   $wftk->do ("add mylist");           # This tells the shell to add the cache record to the current
                                       # list.
   my $key3 = $wftk->curkey ();

   # Select some results into a result list:
   $wftk->do ("select distinct (field2) from mylist");

   $curlist = $wftk->curlist();        # $curlist is now 'result'.
   %lists = $wftk->lists();            # We now have two lists open.

   # Select a record:
   $wftk->do ("show mylist $key2");

   # Get a value:
   print "Value is " . $wftk->get('field2'); # This works because the "get" method against the
                                             # session works with the current record in the shell.

   # Modify the current record:
   $wftk->load_record (<<EOF);         # Note that missing fields aren't changed in the record.
   field1: new value 1
   EOF
   $wftk->do ("save");

   # Delete a record.
   $wftk->do ("del $key3");

The command-line interface can be convenient for quick coding, but its real value is in its capability to provide a shell for text-based interactive use. It's designed primarily for quick and dirty interaction with prototype repositories. The name-colon-value format for records, for instance, is designed to make it easy to edit records with vi in the filesystem; you're not expected to build those strings by script -- you're much better off using the SQL interface for most scripting.

SQL interface

For simple SQL actions and selects, there is a handy simplified SQL interface available. It can quote parameters in the DBI style, so you don't have to be careful with your values containing single quotes and the like, but it eliminates most of the proliferation of handles you need when working with the DBI. If you're comfortable with the handles, by all means just get the dbh from the wftk session and have at it -- but I, for one, always have to look everything up when working with DBI, and the SQL interface is designed to allow me to avoid that, and to express many common actions more concisely than DBI allows.

By now, the actions we're taking are going to seem a little boring, but I'll shake things up a little by introducing some of the special added features available through the SQL interface.

   use Workflow::wftk;

   my $wftk = Workflow::wftk->new();

   # Create a table:
   $wftk->execute_sql ("create table mylist (field1 varchar, field2 varchar)");

   # Insert some data.
   $wftk->execute_sql ("insert into mylist values (?, ?)", 'value1', 'value2');
   $wftk->execute_sql ("insert into mylist values (?, ?)", 'different value', 'value2');
   $wftk->execute_sql ("insert into mylist values (?, ?)", 'third value', 'value2');

   # Here's where it gets interesting.  Let's get the values in the first column
   # in order -- in one step.
   my @field1values = $wftk->sql_get_list ('select field1 from mylist order by field1');
   # That returns the list ('different value', 'third value', 'value1').  Tell me you don't like that.

   # How about a single value?
   my $count = $wftk->sql_get ('select count(*) from mylist where field2=?', 'value2'); # Returns 3, of course.
   
   # Select some other stuff.
   foreach ($wftk->sql_get_list ("select field1, field2 from mylist")) {
      my ($field1, $field2) = @$_;  # Our list entries are now themselves listrefs.
   }

   my $row = $wftk->sql_get ("select field1, field2 from mylist");
   # Returns an arrayref for the first row if multiple rows are called.

   $wftk->execute_sql ('delete from mylist where field1=?', 'third value');

DBI interface

The fourth way to talk to the wftk session is by asking for its internal DBI handle and using the usual DBI interface, which gives you the DBI toolset, including intelligent quoting of parameters, advance preparation of SQL commands, and the whole lot. I doubt it is much more efficient, but in terms of code organization it might well be, especially if you have scripts already written against the DBI interface to start with.

   use Workflow::wftk;

   my $wftk = Workflow::wftk->new();

   # Get the internal database handle for the wftk session:
   my $dbh = $wftk->dbh();

   # Now do everything just the way you would for any DBI database:
   my $sth;

   $sth = $dbh->prepare ("create table mylist (field1 varchar, field2 varchar)");
   $sth->execute();

   # If you don't need any special DBI stuff, you can call 'prepare' on the wftk
   # session too (this calls the same internal handle anyway; it's just convenient).

   $sth = $wftk->prepare ("insert into mylist values (?, ?)");
   $sth->execute('value1', 'value2');
   $sth->execute('different value', 'value2');
   $sth->execute('third value', 'value2');

   # Iterate through a list:
   $sth = $wftk->prepare ("select field1, field2 from mylist order by field1");
   my ($field1, $field2);
   $sth->execute();
   $sth->bind_columns (\$field1, \$field2);
   while ($sth->fetch()) {
      print "$field1: $field2\n";
   }

   # Modify a record:
   $sth = $wftk->prepare ("update mylist set field1=? where field1=?");
   $sth->execute ('new value 1', 'different value');

   # Delete a record:
   $sth = $wftk->prepare ("delete from mylist where field1=?");
   $sth->execute ('third value');

And with that, we should have the same list values as each of the above test runs.

Hash tying

Finally, you can also tie a list to a hash -- this code is written but not yet tested. It's conceptually a little confusing, since you can also tie the records in the list to a hash, and it's easy to confuse the two levels.

   use Workflow::wftk;
   my $wftk = Workflow::wftk->new();

   my $list = tie %list, 'Workflow::wftk::Data', $wftk, 'mylist';  # Creates mylist as a memory list.

   $list{1} = { field1 => 'value1', field2 => 'value2' };     # We need a hashref to name our fields.
   $list{2} = [ 'different value', 'value2' ];
   $list->add (['third value', 'value2']);                    # This accesses the $list object and generates a new key for us.

   print $list->as_text;    # Syntactic sugar is nice, isn't it?

   my ($field1, $field2) = $list{3}->get('field1', 'field2'); # Retrieves our third record.

   delete $list{3};                                           # Deletes the third record.

That's less impressive than it might be, because we haven't covered the hash tying in the record yet. There's a separate section on that later, so consult that for more extensive demonstration of hash tying.

Repository configuration I: permanently configuring a storage list

Up to this point, everything done in the example code has been done ``by hand'', with each list definition being specified when we open the list. It's much easier, however, to register definitions for lists in a repository in advance in a central definition. By default, the wftk looks for its configuration in a file named 'wftk.conf' in the repository's main directory, but we can also give it either a different filename or just pass it a configuration in a string passed when we initially open the repository session.

Once a list definition is already registered, the list doesn't need to be explicitly opened. You can implicitly open it simply by retrieving a handle to it or otherwise using it in some way. And if you also specify its datasource (either a file or a local file handle, or simply giving the data somewhere) then that data is imported the first time you refer to the list.

Parsing (and writing) of the configuration file is handled by Workflow::wftk::Record, and you can see a lot of detailed information about the format and what you can do with it in 02-h Records. The main thing you need to know at this point is that there are type-labeled sections of configuration data which are used to define lists and the fields within them. The syntax for that is fairly obvious from the example.

   use Workflow::wftk;

   my $wftk = Workflow::wftk->new(<<"   CONF");
    ! [mylist: list]
    ! type: memory:fields=field1,field2
    !
    ! [second_list: list]
    ! type: memory
    ! *fields: field1 field2 field3
    ! description: My second list
    !
   CONF

   %lists = $wftk->lists();      # Before opening anything, our list of lists
                                 # shows us what we have defined.

   $wftk->execute_sql ("insert into mylist values (?, ?)", 'some value', 'another value');
   $wftk->execute_sql ("insert into mylist values (?, ?)", 'something else', 'value');

   my @values = $wftk->sql_get_list ('select field1 from mylist order by field1');






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.