- (Attaching documents to records
- (Treating a normal data field as an attachment
- (Dead-simple document management: filesystem plus metadata
- (Multiple attachments and references: the folder model
- (Version control
- (Checkin and checkout
- (Retention management
- (Recipe: mailing documents into an archive
- (Recipe: tracking a CVS system
(Attaching documents to records
An attachment is an ``out-of-band'' form of data. A record may have any number of attachments, which are named as though they were fields. Unlike a field, though, an attachment is assumed to be large and, unless told otherwise, binary; you can mark an attachment as text if you want to be able to handle it as text.
Attachments are generally stored separately from the contents of the data fields. (This isn't always the case; you can tell a list you want to treat a normal data field as an attachment, and it will happily do so.) A convenient way to store them separately is to reserve a subdirectory in which each attachment is kept as a file. In this case, the filename is generally just a number that serves as an index into the stored files.
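To make the numbered-file idea concrete, here is a minimal sketch in Python. It is not wftk's own storage code; the class name `AttachmentStore`, its methods, and the sample record are assumptions made up for illustration.

```python
import os
import tempfile


class AttachmentStore:
    """Hypothetical store: each attachment is a file named by a running number."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _next_number(self):
        # One more than the highest numeric filename already present.
        numbers = [int(name) for name in os.listdir(self.directory) if name.isdigit()]
        return max(numbers, default=0) + 1

    def put(self, data):
        """Write the bytes to a new numbered file and return the number."""
        number = self._next_number()
        with open(os.path.join(self.directory, str(number)), "wb") as f:
            f.write(data)
        return number

    def get(self, number):
        """Read back the bytes stored under the given number."""
        with open(os.path.join(self.directory, str(number)), "rb") as f:
            return f.read()


store = AttachmentStore(os.path.join(tempfile.mkdtemp(), "attachments"))
# The record holds only the index; the bytes themselves live out of band.
record = {"title": "Q3 report", "report": store.put(b"%PDF-1.4 ...")}
```

The record stays small no matter how large the attachment is, which is the point of keeping the data out of band.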
(Treating a normal data field as an attachment
(Dead-simple document management: filesystem plus metadata
Document management describes a system that controls access to, and updates of, these large binary items. There are, as always, several ways of organizing this type of access. The simplest is to set up a list that tracks the contents of a document repository, with one attachment per record, and to allow new versions to replace old ones. This is effectively a ``filesystem plus metadata'' model, and there are plenty of places where it's all you'll need.
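A rough sketch of the ``filesystem plus metadata'' model, kept in memory for brevity. The `DocumentList` class and its field names (`author`, `size`, `updated`) are illustrative assumptions, not part of the wftk.

```python
from datetime import datetime, timezone


class DocumentList:
    """Hypothetical list: one record per document, one attachment per record."""

    def __init__(self):
        self.records = {}

    def store(self, name, content, author):
        # A new upload simply replaces the old content; no history is kept.
        self.records[name] = {
            "author": author,
            "size": len(content),
            "updated": datetime.now(timezone.utc),
            "content": content,
        }

    def metadata(self, name):
        """Return everything known about a document except its content."""
        return {k: v for k, v in self.records[name].items() if k != "content"}


docs = DocumentList()
docs.store("handbook.pdf", b"first draft", "alice")
docs.store("handbook.pdf", b"second, longer draft", "bob")
```

After the second call there is still exactly one record for `handbook.pdf`, and its metadata describes only the replacement.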
(Multiple attachments and references: the folder model
A closely related approach is to allow arbitrary, and possibly multiple, attachments on a record that is being kept for some other purpose. An example of this might be a list of projects, each of which is treated as a folder containing documents. This is quick and easy to comprehend, but it runs aground as soon as you want to attach a document to more than one record. A simple extension to this model is to store all your attachments in a ``filesystem plus metadata'' list and then to store references to those documents in your main list. We can think of this as a ``folder'' model, where a given document may appear in more than one folder.
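The folder model can be sketched with two plain dictionaries: one holding the documents, one holding the projects that refer to them. The identifiers (`doc-1`, `alpha`, and so on) and the helper functions are invented for this example.

```python
# Documents live in one list, keyed by a hypothetical document id.
documents = {
    "doc-1": {"title": "Requirements", "content": b"..."},
    "doc-2": {"title": "Budget", "content": b"..."},
}

# Projects act as folders: they store references, never the content itself.
projects = {
    "alpha": {"documents": ["doc-1", "doc-2"]},
    "beta": {"documents": ["doc-2"]},
}


def folder_contents(project):
    """Titles of the documents referenced by a project."""
    return [documents[ref]["title"] for ref in projects[project]["documents"]]


def folders_containing(doc_id):
    """Names of all projects that reference a given document."""
    return sorted(name for name, p in projects.items() if doc_id in p["documents"])
```

Because `doc-2` is stored once and referenced twice, an update to it is visible from both folders.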
(Version control
The next step up in complexity, then, is version control. At its simplest, version control tracks updates to a given document but doesn't impose any structure on the process used to update it. If this reminds you of CVS or svn, that's because this is exactly how they work. Not coincidentally, the wftk can be used to examine and manipulate CVS and svn archives.
In this model, a reference may be to the latest version of a given document, or to a specific version, depending on your application. The wftk itself uses a versioned repository model to store its procedure definitions, allowing a specific running process to refer to the version of the procdef used to start it, while new processes of the same type refer to updated versions.
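The difference between a reference to the latest version and a reference to a specific version can be shown in a few lines. This is only a sketch; `VersionedDocument` and `resolve` are hypothetical names, and the wftk's actual procdef repository is not shown here.

```python
class VersionedDocument:
    """Hypothetical document that keeps every version it has ever had."""

    def __init__(self):
        self.versions = []

    def add_version(self, content):
        """Append a new version and return its 1-based version number."""
        self.versions.append(content)
        return len(self.versions)

    def resolve(self, version=None):
        """Return the latest version, or a specific one if a number is given."""
        if not self.versions:
            raise LookupError("document has no versions")
        if version is None:
            return self.versions[-1]
        if not 1 <= version <= len(self.versions):
            raise LookupError(f"no such version: {version}")
        return self.versions[version - 1]


procdef = VersionedDocument()
pinned = procdef.add_version("<procdef v1>")  # a running process remembers this number
procdef.add_version("<procdef v2>")           # later processes pick up the update
```

A process started earlier calls `procdef.resolve(pinned)` and keeps seeing the first definition, while a new process calls `procdef.resolve()` and gets the second.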
(Checkin and checkout
Once we've introduced the notion of version control, a checkout/checkin system can be used to impose a little structure on the process of updating documents. Checking a document out changes its status so that others can't check it out, while checking it in creates a new version. This is a fairly standard process, but it's by no means mandatory. As later chapters will show, obtaining edit access to a document is itself an action, so it is subject to permission control and workflow, and any update process can easily be modeled.
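Here is one way the checkout/checkin rule might look in code. It is a sketch under simple assumptions (one lock per document, users identified by name); `ControlledDocument` and `CheckoutError` are made-up names, and permission control and workflow are left out.

```python
class CheckoutError(Exception):
    """Raised when a checkout or checkin is not allowed."""


class ControlledDocument:
    """Hypothetical versioned document with an exclusive checkout."""

    def __init__(self, content):
        self.versions = [content]
        self.checked_out_by = None

    def checkout(self, user):
        # Only one user may hold the document at a time.
        if self.checked_out_by is not None:
            raise CheckoutError(f"already checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return self.versions[-1]

    def checkin(self, user, content):
        # Only the holder may check in; doing so creates a new version.
        if self.checked_out_by != user:
            raise CheckoutError(f"{user} does not hold the checkout")
        self.versions.append(content)
        self.checked_out_by = None
        return len(self.versions)
```

A second `checkout` while the first is still open raises `CheckoutError`, and a successful `checkin` both releases the document and returns the new version number.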
(Retention management
Besides controlling access and updates, a document management system must also manage the archive. This is called retention management, because it can be used to set up rules for how long particular documents or versions must be retained, based on whatever criteria are appropriate. A full-blown retention management system includes a periodic test of rules to determine which items can be moved from one storage location to another, or even discarded. Since retention management relies on a lot of functionality that will be introduced in later chapters, much of this should simply be taken on faith for now.
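The periodic test of rules can be sketched as follows. The rules here depend only on age, and the thresholds (one year, seven years) and action names are arbitrary assumptions chosen for the example; real criteria could be anything.

```python
from datetime import date

# Hypothetical rules, tested in order: (minimum age in days, action).
# The most drastic rule comes first so that it wins for very old items.
RULES = [
    (7 * 365, "discard"),
    (365, "archive"),
]


def retention_action(stored_on, today, rules=RULES):
    """Return the action for an item stored on `stored_on`, or "keep"."""
    age_days = (today - stored_on).days
    for min_age, action in rules:
        if age_days >= min_age:
            return action
    return "keep"


def periodic_check(items, today):
    """Map each item name to the action the rules currently call for."""
    return {name: retention_action(stored_on, today) for name, stored_on in items.items()}
```

Running `periodic_check` on a schedule yields the list of items to move or discard; carrying out those actions is a separate step.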
(Recipe: mailing documents into an archive
Here's a practical recipe for document management: a full implementation of a simple archive with an incoming email input path. After setting this system up, you have an email address to which any document can be mailed as an attachment. The system stores the incoming document along with metadata extracted from the email (sender, subject, date, and so on). If multiple documents are sent, a folder is created with references to the individual documents. Either way, an event for the entire email is generated, and a notification is sent to an administrator. You could also trigger any desired workflow based on the incoming mail, but that will be a topic for another day.
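The core of the recipe, extracting attachments and metadata from a raw message, can be sketched with Python's standard `email` package. This is not the wftk implementation: `archive_email`, the `documents` and `folders` dictionaries, and the choice of the subject as folder name are assumptions, and the event and the administrator notification are left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def archive_email(raw, documents, folders):
    """Store each attachment as a document; group several under a folder."""
    msg = message_from_bytes(raw, policy=policy.default)
    metadata = {
        "sender": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "date": str(msg["Date"]),
    }
    doc_ids = []
    for part in msg.iter_attachments():
        doc_id = len(documents) + 1  # hypothetical numbering scheme
        documents[doc_id] = {
            **metadata,
            "filename": part.get_filename(),
            "content": part.get_payload(decode=True),
        }
        doc_ids.append(doc_id)
    if len(doc_ids) > 1:
        # Several documents in one mail: the folder holds references only.
        folders[metadata["subject"]] = doc_ids
    return doc_ids


# Build a sample message with two attachments to feed the sketch.
sample = EmailMessage()
sample["From"] = "alice@example.com"
sample["To"] = "archive@example.com"
sample["Subject"] = "Contract drafts"
sample["Date"] = "Mon, 01 Jan 2024 12:00:00 +0000"
sample.set_content("Two drafts attached.")
for name, data in [("a.doc", b"draft one"), ("b.doc", b"draft two")]:
    sample.add_attachment(data, maintype="application",
                          subtype="octet-stream", filename=name)

documents, folders = {}, {}
stored = archive_email(sample.as_bytes(), documents, folders)
```

In a real setup the raw bytes would come from whatever delivers mail to the archive address rather than from a message built in the script.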
(Recipe: tracking a CVS system
Here's a system that models a local CVS system as a wftk repository. When documents are checked in, some analysis is done and an approval workflow is initiated. If the change is not approved, it is backed out of the repository and saved for later as a separate proposal. If the change is approved, and a change request is referenced in one of the checkin comments, action is taken to close the change request.
Or something. I'm not sure how good an example this is.
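For what it's worth, the decision logic described above is small enough to sketch. Everything specific here is an assumption: the `CR-123` reference format, the `handle_checkin` function, and the set of open change requests stand in for whatever the real CVS hooks and change-request list would provide.

```python
import re

# Hypothetical format for change request references in checkin comments.
CR_PATTERN = re.compile(r"\bCR-(\d+)\b")


def handle_checkin(comments, approved, open_requests):
    """Apply the outcome of the approval workflow to one checkin.

    `open_requests` is a set of open change request numbers; it is
    modified in place when an approved checkin references one of them.
    """
    if not approved:
        # The change would be backed out and saved as a separate proposal.
        return "backed out"
    for comment in comments:
        for number in CR_PATTERN.findall(comment):
            open_requests.discard(int(number))
    return "accepted"
```

The analysis step and the approval workflow itself are not shown; the function only covers what happens once the approval decision is known.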
