Repository manager: Data retrieval and display

Data retrieval and display

[ Previous: Working with individual objects as reports ] [ Top: Repository manager ] [ Next: User authentication and group membership ]

The retrieval of an object, although it seems simple, can be anything but. The core of a given object is a record from a list, but that core may be augmented with information from other storage or lists in different ways: sublists are specified using the <link> tag and their definition is still a moving target.

In general, retrieval is performed on a list given the unique key of a record. In the wftk, each record always has a unique key (unlike general SQL). In the case that no key is supplied (NULL or "") then a blank record will be created from the list definition. The special key "!cur" can be used to retrieve the "current" record (TODO: implement this).

Once the record is retrieved, various attributes are set in order to make things easier elsewhere in the code; these attributes are (often) cleaned up during storage. (This was added May 13, 2002).

Sublists are lists of subrecords within a record. Since this feature is not a part of SQL, it can sometimes be a little confusing, but effectively a sublist is a link to other information. The sublist can be defined in the list defition using a "link" tag; this element name makes sense, because the records in a sublist can link to content in other lists.

These foreign links come in (at present) two varieties. The first, the independent sublist was added March 1, 2003, and is external storage of the entire sublist. That is, nothing is stored in the main object (which means it can be stored in an SQL table), and the sublist data is retrieved from some external storage after the main object is retrieved. This allows a fields-and-lists structure using an SQL database, including the case where a single related record is retrieved from some other list. This feature is specified as follows:

<link storage="delim:[key]-stuff.txt" rec="stuff"/>

<link list="other_list" rec="stuff"/>

This variant retrieves all the tab-delimited records in the sublist (after substituting [key] from the record) and creates a "stuff" element for each in the main object. If an "id" attribute is added, then the main object will contain a "link" element which in turn contains all the "stuff" elements. Either variant can be read in xmlobj_get using e.g. "stuff[0].myfield" to get "myfield" from the first rec entry. (December 27, 2004) - Typing all this up, I realize it's really not terribly useful, because it doesn't provide a link-on attribute. I'm building a far more complicated system now, so this will have to change as well. TODO: do this.

The other possibility, the dependent sublist, is a hybrid between a simple sublist (with no relationship to an external datasource) and a pure link. I'm implementing this as of December 27-29, 2004. In this variant, there is a sublist stored explicitly in the object, but each item in that sublist references an object in other list. (This variant doesn't refer to raw external storage locations, just other lists.) When the object is retrieved, the references are also retrieved in list format and the resulting records are integrated one by one with the sublist items. When the object is stored, those external references are removed again. During the loading process, new items can be added to the list or an assertion can be checked to discard old items from the list, and these steps can vary depending on rules laid out as a decision structure.

The decision structure is a "load-rules" element in the list definition. After the main object is loaded, any load-rules are found and collapsed using the main object. The decided definition is then checked for any link information, and links are processed based on whatever results.

(June 1, 2003): There is a special handler for the retrieval of tasks.

static XML * _repos_get_pageobj (XML * repository, const char * key);
static void _repos_get_sublist (XML * repository, XML * obj, XML * list, XML * sublist);
static void _repos_load_links (XML * repository, XML * obj, XML * list, XML * linked);
static XML * _repos_get_task (XML * repository, const char * key);
WFTK_EXPORT XML * repos_get     (XML * repository, const char * list_id, const char * key)
{
   WFTK_ADAPTOR * ad;
   XML * list;
   XML * ret;
   const char * line;
   const char * end;
   XML * field;
   XML * mark;
   XML * rule;
   struct _repos_remote * sock = (struct _repos_remote *) xml_getbin (repository);

   if (sock) { /* Remote. */
      if (key) {
         xml_setf (sock->parms, "outgoing", "get %s %s\n", list_id, key);
      } else {
         xml_setf (sock->parms, "outgoing", "get %s\n", list_id);
      }
      _repos_send (sock);
      line = _repos_receive (sock);
      if (*line == '-') return NULL;
      line = strchr (line, '\n') + 1;
      list = xml_create ("t");
      xml_set (list, "r", "");
      while (line[0] != '>' || line[1] != '>') {
         end = strchr (line, '\n');
         if (end) {
            xml_attrncat (list, "r", line, end - line + 1);
         } else {
            xml_attrcat (list, "r", line);
            break;
         }
         line = end + 1;
      }
      ret = xml_parse (xml_attrval (list, "r"));
      xml_free (list);
      xml_set (sock->parms, "buffer", "");
      xml_set (ret, "list", list_id);
      xml_set (ret, "key", key);
      return ret;
   }

   /* Handle selected pseudolists first. */
   if (!strcmp (list_id, "_lists")) {
   } else if (!strcmp (list_id, "_pages")) {
      return _repos_get_pageobj (repository, key);
   } else if (!strcmp (list_id, "_todo") || !strcmp (list_id, "_tasks")) {
      return _repos_get_task (repository, key);
   }

   /* Find the list named. */
   list = repos_defn (repository, list_id);
   if (!list) return NULL;

   /* Handle immediate-addressing mode. */
   if (!strcmp (xml_attrval (list, "storage"), "here")) {
      ret = xml_search (list, NULL, "id", key);
      return (xml_copy (ret));
   }

   /* Handle normal case: key given, use adaptor to retrieve record. */
   if (key && *key) {
      ad = wftk_get_adaptor (repository, LIST, *xml_attrval (list, "id") ? xml_attrval (list, "id") : xml_attrval (list, "storage"));
      if (!ad) return NULL;
      xml_set (ad->parms, "basedir", xml_attrval (repository, "basedir"));
      ret = wftk_call_adaptor (ad, "get", list, key);
      wftk_free_adaptor (repository, ad);
   } else { /* NULL-key case (build "blank" record) */
      ret = xml_create (*xml_attrval (list, "rec") ? xml_attrval (list, "rec") : "rec");

      mark = xml_firstelem (list);
      while (mark) {
         if (xml_is (mark, "field") || xml_is (mark, "link")) {
            field = xml_create (xml_name (mark));
            xml_set (field, "id", xml_attrval (mark, "id"));
            if (*xml_attrval (mark, "type"))    xml_set (field, "type", xml_attrval (mark, "type"));
            if (*xml_attrval (mark, "storage")) xml_set (field, "storage", xml_attrval (mark, "storage"));
            xml_append (field, xml_createtext (xml_attrval (mark, "default")));
            xml_append_pretty (ret, field);
         }
         mark = xml_nextelem (mark);
      }
   }

   /* Handle state, and make sure our marker attributes are set properly. */
   field = xml_search (list, "field", "type", "state");
   if (field) xml_set_nodup (ret, "state", xmlobj_get (ret, list, xml_attrval (field, "id")));
   xml_set (ret, "list", list_id);
   xml_set (ret, "key", key);

   /* Copy list definition and collapse any load-rules definitions based on the main object just loaded. */
   list = xml_copy (list);
   while (mark = xml_search (list, "load-rules", NULL, NULL)) {
      rule = wftk_decide (repository, ret, mark);
      xml_copyinto (xml_parent (mark), rule);
      xml_delete (mark);
   }

   /* Handle linked data in alternate storage. */
   field = xml_firstelem (list);
   while (field) {
      if (xml_is (field, "link")) {
         if (*xml_attrval (field, "storage") || *xml_attrval (field, "list")) {
            _repos_get_sublist (repository, ret, list, field);
         } else if (*xml_attrval (field, "link-to")) {
            _repos_load_links (repository, ret, list, field);
         }
      }
      field = xml_nextelem (field);
   }

   xml_free (list);
   return ret;
}

The _repos_get_sublist function handles independent sublists. It uses a little helper function to express the definitional attributes; this is reused for _repos_load_link below.

The logic for this function is a little weird (at least it wasn't apparent to me after leaving it for a few months). All it does is set up the sublist structure in the main object, then call repos_list on it to let the repository manager load the results.

static void _repos_sublist_express_attrs (XML * defn, XML * obj, XML * list) {
   XML_ATTR * attr;
   const char * val;

   /* Attribute values are expressed in terms of parent object.  TODO: what about parent's parent? */
   attr = xml_attrfirst (defn);
   while (attr) {
      val = xml_attrvalue (attr);
      if (strchr (val, '[')) xml_set_nodup (defn, xml_attrname (attr), xmlobj_format (obj, list, xml_attrvalue (attr)));
      attr = xml_attrnext (attr);
   }
}
static void  _repos_get_sublist (XML * repository, XML * obj, XML * list, XML * sublist) {
   XML * linkto;
   XML * link_into;
   XML * rec;
   XML * field;
   XML * linked = sublist; /* TODO: I'm pretty sure this isn't what I had intended. */

   _repos_sublist_express_attrs (linked, obj, list);

   xml_set (linked, "rec", *xml_attrval (sublist, "rec") ? xml_attrval (sublist, "rec") : "rec");

   link_into = obj;
   if (*xml_attrval (sublist, "id")) {
      linkto = xml_create ("link");
      xml_set (linkto, "id", xml_attrval (sublist, "id"));
      xml_append_pretty (obj, linkto);
      link_into = linkto;
      xml_set (linked, "rec", *xml_attrval (sublist, "rec") ? xml_attrval (sublist, "rec") : "link-to");
   }

   if (*xml_attrval (linked, "list")) {
      xml_set (list, "id", xml_attrval (linked, "storage"));
   }

   if (repos_list (repository, linked)) {
      linkto = xml_firstelem (linked);
      while (linkto) {
         rec = xml_copy (linkto);
      
         /* Handle linked data in alternate storage. */
         field = xml_firstelem (sublist);
         while (field) {
            if (xml_is (field, "link") && (*xml_attrval (field, "storage") || *xml_attrval (field, "list"))) {
               _repos_get_sublist (repository, rec, sublist, field);
            }
            field = xml_nextelem (field);
         }

         xml_append_pretty (link_into, xml_copy (linkto));
         linkto = xml_nextelem (linkto);
      }
   }
}

The _repos_load_link function handles dependent sublists; it finds the main object's list, retrieves a list of keys, gets the dependent objects, and merges them. To do this, we're going to do something again which is a little weird. We know that the "linked" structure, which defines the link, is a copy of the actual definition, so we can modify it as much as we want. Then it'll get cleaned up with the rest of the copied definition after the load is finished.

The main mechanism for dependent sublists is to retrieve one linked record for each entry in the sublist in the main object, but we also have two definitional attributes which control the records actually stored in the sublist. The "link-addition" attribute gives an SQL where clause which is used to retrieve records from the linked list; if there are records in that list which don't have matching sublist entries, then entries are created to merge them with. Second, the "link-assertion" is another SQL where clause which is used to test all sublist entries after the merge is performed, and then to delete any entries which don't fulfill the specification.

static void _repos_load_links (XML * repository, XML * obj, XML * list, XML * linked) {
   XML * linkto;
   XML * sublist;

   _repos_sublist_express_attrs (linked, obj, list);

   
}

The _repos_get_pageobj function retrieves a page as an object; since this is built into the repository definition itself, it's a special case. TODO: storage of page objects.

static XML * _repos_get_pageobj (XML * repository, const char * key)
{
   XML * list;
   XML * ret;
   XML * field;
   XML * mark;
   XML * page;
   XML * layout;
   FILE * file;
   char line[1024];
   int bytes;

   page = xml_search (repository, "page", "id", key);
   if (!page) return NULL;

   layout = repos_get_layout (repository, xml_attrval (page, "layout"));

   /* We scan the layout in order to find page-specific textual values (i.e. Wiki text).
      If the page is an object page, these might not exist, so we may be restricted to just
      putting the title and such in the editable object.  We can get arbitrarily clever with this.... */
   ret = xml_create ("record");
   xml_set (ret, "textbase", *xml_attrval (repository, "text") ? xml_attrval (repository, "text") : "opmtext");
   xml_set (ret, "list", "_pages");
   xml_set (ret, "key", key);
   xmlobj_set (ret, NULL, "id", xml_attrval (page, "id"));
   mark = xml_parent (page);
   if (xml_is (mark, "page")) {
      xmlobj_set (ret, NULL, "parent", xml_attrval (mark, "id"));
   } else {
      xmlobj_set (ret, NULL, "parent", "");
   }
   xmlobj_set (ret, NULL, "title", xml_attrval (page, "title"));
   xmlobj_set (ret, NULL, "layout", xml_attrval (page, "layout"));

   field = xml_search (layout, "template:value", NULL, NULL);
   while (field) {
      mark = xml_search (ret, "field", "id", xml_attrval (field, "id"));
      if (!mark) {
         /* Here we might need to get *really* clever.  Later, though.  TODO: same. */
         xmlobj_set (ret, NULL, xml_attrval(field, "id"), "");
         mark = xmlobj_field (ret, NULL, xml_attrval (field, "id"));
         xml_setf (ret, "filename", "%s/%s_%s.html", xml_attrval (ret, "textbase"), key, xml_attrval (field, "id"));
         file = _repos_fopen (repository, xml_attrval (ret, "filename"), "r");
         if (!file) {
            xml_setf (ret, "filename", "%s/%s_%s.htm", xml_attrval (ret, "textbase"), key, xml_attrval (field, "id"));
            file = _repos_fopen (repository, xml_attrval (ret, "filename"), "r");
         }
         if (!file) {
            xml_setf (ret, "filename", "%s/%s_%s.wiki", xml_attrval (ret, "textbase"), key, xml_attrval (field, "id"));
            file = _repos_fopen (repository, xml_attrval (ret, "filename"), "r");
         }
         if (!file) {
            xml_setf (ret, "filename", "%s/%s_%s.txt", xml_attrval (ret, "textbase"), key, xml_attrval (field, "id"));
            file = _repos_fopen (repository, xml_attrval (ret, "filename"), "r");
         }
         if (!file) {
            xml_unset (ret, "filename");
         } else {
            while (bytes = fread (line, 1, 1024, file)) {
              xml_textncat (mark, line, bytes);
            }
            fclose (file);
         }
      }
      field = xml_search_next (layout, field, "template:value", NULL, NULL);
   }

   return ret;
}

June 1, 2003: Getting a task object is complicated by the fact that data about tasks comes in three flavors. First, there is the administrative data about the task itself: its key, label, assigned role/user, and so forth. These fields are common to all tasks, and the task object returned by repos_get puts them into xml_set values. Second, there are fields inherited by the task from its parent object (assuming it has a parent object.) These are specified by <data> elements either in the procdef definition of the task (if a procdef), the ad-hoc workflow definition of the task (if defined by ad-hoc workflow), or in the task object itself. And third, there are custom values belonging to the task object itself which take advantage of the fact that task objects are first-class objects in the system; these might be simple values, lists, attachments, whatever.

October 15, 2003: If we have a multipart key (i.e. a taskindex key instead of the raw task object key) we have to look to the index first to retrieve the proper key to get the real task object.

static XML * _repos_get_task (XML * repository, const char * key)
{
   XML * tasks_list;
   XML * repmgr_obj;
   XML * mark;
   char * _key;
   WFTK_ADAPTOR * ad;
   XML * wftk_obj;
   XML * ret;

   /* 0. Look aside to the task index if given multipart key. */
   repos_log (repository, 5, 2, NULL, "repmgr", "Getting task %s", key);
   if (strchr (key, '~')) {
      mark = repos_get (repository, "_taskindex", key);
      if (!mark) return NULL;
      _key = xmlobj_get (mark, NULL, "key");
      repos_log (repository, 5, 2, NULL, "repmgr", "Which is actually task %s", _key);
      xml_free (mark);
   } else {
      _key = strdup (key);
   }

   /* 1. Get repmgr task object for values in first and third categories. */
   tasks_list = repos_defn (repository, "_tasks");
   ad = wftk_get_adaptor (repository, LIST, "_tasks");
   if (!ad) return NULL;
   xml_set (ad->parms, "basedir", xml_attrval (repository, "basedir"));

   repmgr_obj = wftk_call_adaptor (ad, "get", tasks_list, _key);
   wftk_free_adaptor (repository, ad);

   if (!repmgr_obj) {
      free (_key);
      return NULL;
   }

   /* 2. Ask wftk core for workflow-specific task information (values in the second category.) */
   wftk_obj = xml_create ("task");

   xml_set_nodup (repmgr_obj, "parent-list", xmlobj_get (repmgr_obj, NULL, "list"));
   xml_set_nodup (repmgr_obj, "parent-key",  xmlobj_get (repmgr_obj, NULL, "obj"));
   xml_set_nodup (repmgr_obj, "id",          xmlobj_get (repmgr_obj, NULL, "id"));
   if (*xml_attrval (repmgr_obj, "parent-key")) {
      xml_setf (wftk_obj, "dsrep",   "list:%s", xml_attrval (repmgr_obj, "parent-list"));
      xml_set  (wftk_obj, "process",            xml_attrval (repmgr_obj, "parent-key"));
      xml_set  (wftk_obj, "id",                 xml_attrval (repmgr_obj, "id"));

      wftk_task_retrieve (repository, wftk_obj);
   }

   /* 3. Construct return value. */
   ret = xml_create ("task");

   xml_set_nodup (ret, "key",          xmlobj_get (repmgr_obj, NULL, "key"));
   xml_set_nodup (ret, "list",         xmlobj_get (repmgr_obj, NULL, "list"));
   xml_set_nodup (ret, "obj",          xmlobj_get (repmgr_obj, NULL, "obj"));
   xml_set_nodup (ret, "state",        xmlobj_get (repmgr_obj, NULL, "state"));
   xml_set_nodup (ret, "label",        xmlobj_get (repmgr_obj, NULL, "label"));
   xml_set_nodup (ret, "role",         xmlobj_get (repmgr_obj, NULL, "role"));
   xml_set_nodup (ret, "user",         xmlobj_get (repmgr_obj, NULL, "user"));
   xml_set_nodup (ret, "sched_start",  xmlobj_get (repmgr_obj, NULL, "sched_start"));
   xml_set_nodup (ret, "sched_end",    xmlobj_get (repmgr_obj, NULL, "sched_end"));
   xml_set_nodup (ret, "cost",         xmlobj_get (repmgr_obj, NULL, "cost"));
   xml_set_nodup (ret, "only_after",   xmlobj_get (repmgr_obj, NULL, "only_after"));
   xml_set_nodup (ret, "priority",     xmlobj_get (repmgr_obj, NULL, "priority"));

   /* xmlobj_set_nodup (ret, NULL, "state", xmlobj_get (repmgr_obj, NULL, "state")); -- on second thought, this didn't make much sense. */

   mark = xml_firstelem (wftk_obj);
   while (mark) {
      if (xml_is (mark, "field")) xml_append_pretty (ret, xml_copy (mark));
      mark = xml_nextelem (mark);
   }

   xml_free (wftk_obj);

   /* Remove xmlobj fields for standard task attributes -- otherwise they'll appear in all forms. */
   xmlobj_unset (repmgr_obj, NULL, "key");
   xmlobj_unset (repmgr_obj, NULL, "list");
   xmlobj_unset (repmgr_obj, NULL, "obj");
   xmlobj_unset (repmgr_obj, NULL, "id");
   xmlobj_unset (repmgr_obj, NULL, "state");
   xmlobj_unset (repmgr_obj, NULL, "label");
   xmlobj_unset (repmgr_obj, NULL, "role");
   xmlobj_unset (repmgr_obj, NULL, "user");
   xmlobj_unset (repmgr_obj, NULL, "sched_start");
   xmlobj_unset (repmgr_obj, NULL, "sched_end");
   xmlobj_unset (repmgr_obj, NULL, "cost");
   xmlobj_unset (repmgr_obj, NULL, "only_after");
   xmlobj_unset (repmgr_obj, NULL, "priority");

   mark = xml_firstelem (repmgr_obj);
   while (mark) {
      if (xml_is (mark, "field")) xml_append_pretty (ret, xml_copy (mark));
      mark = xml_nextelem (mark);
   }

   free (_key);
   xml_free (repmgr_obj);

   return (ret);
}

And now a bunch of place holders for functions I haven't actually needed yet.

XML * _repos_format_object (XML * repository, XML * object, XML * layout)
{

}
WFTK_EXPORT char * repos_getvalue (XML * repository, const char * list, const char * key, const char * field)
{
   /* retrieve obj then call xmlobj_get */
   return ("");
}
WFTK_EXPORT void repos_setvalue (XML * repository, const char * list, const char * key, const char * field, const char * value)
{
   /* This will be a while.  It fronts for the adaptor, of course. */
}

Where else to put this? Why even include this? The idea is to reduce the DLL and linking dependencies for use of repmgr, but it's probably stupid. Ah well.

WFTK_EXPORT void repos_xml_free (XML * xml) {
   xml_free (xml);
}

October 11, 2003: Here's a logging function. At first, this is going to be really simple; it's just going to format things up and log to a text file in the repository directory. However, the log should logically be regarded as a list, so storage should be configurable. Or something. Under Unix, we should doubtlessly be doing something with syslog, at least for fatal errors.

The level and type parameters are there to allow some filtering:

level	meaning
0	fatal error
1	warning
2	notification
3	debug messages

typemeaning 1state change 2task change (creation/completion) 4action 8data
My initial implementation (as usual) is going to ignore most of the filtration possibilities, because I really just need a principled way to keep track of what's going on, but at some point we need to do something to figure all this kind of thing out. There are several different modes of logging, for instance:

Repository-central logging of important or dangerous events, especially during server startups and so forth
Object-specific logging of data events pertaining e.g. to worflow (the enactment history
Specific logging during portions of a process, possibly regarded as a report/event object to be stored in a list (like the publication log)
Ideally, this logging facility would be able to handle all of these logging situations and then some. But my eyes are bigger than my stomach as far as planning is concerned. Which you know, if you're reading this code.

WFTK_EXPORT void repos_log (XML * repository, int level, int type, XML * object, const char * subsystem, const char * message, ...) { va_list arglist; char * msg; char * l; struct tm * timeptr; time_t julian; FILE * log; int loglevel = 2; if (*xml_attrval (repository, "loglevel")) loglevel = xml_attrvalnum (repository, "loglevel"); if (level > loglevel) return; va_start (arglist, message); msg = xml_string_formatv (message, arglist); l = (level == 0) ? "fatal" : ((level == 1) ? "warning" : ((level == 2) ? "notification" : "debug")); va_end (arglist); time (&julian); timeptr = localtime (&julian); log = _repos_fopen (repository, "repository.log", "a"); /* TODO: make this a general list adaptor call. */ if (log) { fprintf (log, "[%04d-%02d-%02d %02d:%02d:%02d] %s %s %s\n", timeptr->tm_year + 1900, timeptr->tm_mon + 1, timeptr->tm_mday, timeptr->tm_hour, timeptr->tm_min, timeptr->tm_sec, l, subsystem, msg); fclose (log); } free (msg); }
[ Previous: Working with individual objects as reports ] [ Top: Repository manager ] [ Next: User authentication and group membership ]

This code and documentation are released under the terms of the GNU license. They are copyright (c) 2001-2005, Vivtek. All rights reserved except those explicitly granted under the terms of the GNU license.