Towards wftk 2.0

2007-10-24 workflow wftk

I never really officially released wftk 1.0, of course: the magnitude of the task simply grew and grew, I became less and less certain of my approach, and then the recession happened. But I've been thinking a lot about a more reasoned approach lately, and maybe it's time to reboot the wftk project and start more or less "from scratch".

I see the modules in this new approach more or less as the following:

Data management
This is the basic list-and-record aspect that the repository manager started out addressing. Now, of course, there is SQLite. So a principled workflow toolkit would start by using SQLite for local tables, and add "external tables" (for which the new SQLite has an API) defined in what SAP now calls the "system landscape". It's amazing, by the way, how much of my thinking over the past few years I see reflected in what SAP is doing lately in their NetWeaver stuff.
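A minimal sketch of the local-plus-external idea, under loud assumptions: Python's standard sqlite3 module does not expose SQLite's virtual table API, so ATTACH DATABASE stands in for "external tables" here, and the table names (landscape, project, crm.customer) are invented for illustration only.

```python
import sqlite3

def open_repository():
    """Open a local store plus one 'external' source.

    Assumption: ATTACH is only a stand-in for SQLite virtual tables,
    and every table name here is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    # Attach first: ATTACH is not allowed inside an open transaction.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    # The "system landscape": a registry of where external data lives.
    conn.execute("CREATE TABLE landscape (name TEXT PRIMARY KEY, location TEXT)")
    # A local list-and-record table.
    conn.execute(
        "CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT, customer_id INTEGER)"
    )
    # A table owned by the external source.
    conn.execute("CREATE TABLE crm.customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO landscape VALUES ('crm', ':memory:')")
    conn.execute("INSERT INTO crm.customer VALUES (1, 'Acme')")
    conn.execute("INSERT INTO project VALUES (1, 'Website relaunch', 1)")
    conn.commit()
    return conn

def projects_with_customers(conn):
    """Join local records against the external table in one query."""
    rows = conn.execute(
        "SELECT p.title, c.name FROM project p "
        "JOIN crm.customer c ON c.id = p.customer_id ORDER BY p.id"
    )
    return rows.fetchall()
```

The point of the sketch is only that local and external data answer to the same query language; a real implementation would presumably use the virtual table interface from C.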
Document management
Document management, as I see it, consists of: (1) actual central storage and versioning of unstructured data; (2) storage of metadata about documents; (3) parsing and indexing of unstructured data to produce structured data elsewhere in the system. The document manager should work well both in situations where it controls storage (and thus can initiate action whenever anything is changed) and in situations where it merely indexes storage that can be changed externally -- the latter might be, for instance, management of a Website's files in the file system, or just your system files on a Windows machine. In that case the document manager could periodically check in, see whether things had been changed, and if so, trigger arbitrary action.
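The "check in periodically" mode could look roughly like the sketch below. It assumes content hashes kept in an SQLite table are a good enough change detector; the function names (open_index, check_in) and the doc_index table are hypothetical, not part of any existing wftk code.

```python
import hashlib
import os
import sqlite3

def open_index():
    """Create an in-memory index of path -> content digest (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_index (path TEXT PRIMARY KEY, digest TEXT)")
    return conn

def file_digest(path):
    """Hash the file contents so edits are detected regardless of timestamps."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def check_in(conn, root, on_change):
    """Scan root, update the index, and call on_change(path, kind) per change.

    kind is 'added', 'modified', or 'removed'. Returns the list of changes.
    """
    known = dict(conn.execute("SELECT path, digest FROM doc_index"))
    seen = set()
    changes = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            seen.add(path)
            digest = file_digest(path)
            if path not in known:
                kind = "added"
            elif known[path] != digest:
                kind = "modified"
            else:
                continue  # unchanged since the last check-in
            conn.execute(
                "INSERT OR REPLACE INTO doc_index VALUES (?, ?)", (path, digest)
            )
            changes.append((path, kind))
    # Anything indexed earlier but no longer on disk was removed externally.
    for path in sorted(set(known) - seen):
        conn.execute("DELETE FROM doc_index WHERE path = ?", (path,))
        changes.append((path, "removed"))
    conn.commit()
    for path, kind in changes:
        on_change(path, kind)  # the "arbitrary action" hook
    return changes
```

A scheduler would call check_in on whatever interval suits the storage; the callback is where parsing, indexing, or notification would hang.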
"Action" management
A central script and code repository defines the actions that can be taken by a system. I consider this to include versioning and some kind of change management and documentation system, including literate programming and indexing of the code snippets. The build process should also be managed here, and should be capable, for instance, of taking algorithms written in C, compiling them into DLLs or .so dynamic load libraries, and calling them from Perl, say. Ultimately.
Actions, documents, and data would have a nested structure, by the way; there would be global actions, application actions (a given case or project could be an instance of an application), and project/instance actions, and the same applies to data and documents, perhaps. Originally I'd thought of doing the same for users or organizational units, but I really think that if you're defining a common language of actions and data, it should be organized into applications and, perhaps, subapplications or something. But not differ by user! (I might be wrong, of course.)
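One way to read the nesting is as a lookup that falls back from the most specific scope to the most general. That precedence is an assumption (the post only says the structure is nested), and resolve_action and the sample action names are made up for the sketch.

```python
def resolve_action(name, project_actions, application_actions, global_actions):
    """Return (scope, action) for the most specific definition of name.

    Assumed precedence: project/instance, then application, then global.
    """
    scopes = (
        ("project", project_actions),
        ("application", application_actions),
        ("global", global_actions),
    )
    for scope, table in scopes:
        if name in table:
            return scope, table[name]
    raise KeyError(name)

# Hypothetical action tables; the values stand in for scripts in the repository.
GLOBAL = {"notify": "send plain email", "archive": "zip and store"}
APPLICATION = {"notify": "send email using the application template"}
PROJECT = {"archive": "store in the project folder"}
```

With these tables, resolve_action("notify", PROJECT, APPLICATION, GLOBAL) picks the application-level definition, while "archive" resolves at the project level. Note there is deliberately no per-user table.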
The above three modules together allow a data-flow-oriented processing system, but we're still missing:
Outgoing interfaces
This includes publishing of HTML pages, outgoing mail notifications, other notifications such as SMS or ... whatever. Logged, all of it. It includes report generation into the document management system or the file system, generation of PDFs, etc.
Incoming interfaces
Given the parsing power of the document management module, this is more an organizational module. The system should be able to receive email, parse it, and take action. Conversational interfaces are covered here as well, from SMTP- and IMAP-like state machines to chatbot NLP interfaces. And of course form submission from Websites also falls into this bucket.
Scheduling
Whether running on Unix with cron and at, or Windows with ... whatever the hell Windows offers, the system should have a single unified way of dealing with time in a list of scheduled tasks.
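A sketch of what "a single unified list" might mean: the platform scheduler only has to wake the system up, and everything else lives in one table. The schedule table, the integer timestamps, and the function names are all assumptions for illustration.

```python
import sqlite3

def open_schedule():
    """Create the one list of scheduled tasks (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schedule ("
        "id INTEGER PRIMARY KEY, "
        "run_at INTEGER NOT NULL, "   # seconds since the epoch
        "action TEXT NOT NULL, "      # name of an action in the repository
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def schedule(conn, run_at, action):
    """Add a task and return its id."""
    cur = conn.execute(
        "INSERT INTO schedule (run_at, action) VALUES (?, ?)", (run_at, action)
    )
    conn.commit()
    return cur.lastrowid

def run_due(conn, now, dispatch):
    """Dispatch every pending task whose time has come; return their actions."""
    rows = conn.execute(
        "SELECT id, action FROM schedule "
        "WHERE done = 0 AND run_at <= ? ORDER BY run_at, id",
        (now,),
    ).fetchall()
    for task_id, action in rows:
        dispatch(action)
        conn.execute("UPDATE schedule SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    return [action for _task_id, action in rows]
```

On Unix a single cron entry could call run_due every minute; on Windows whatever is available would do the same, and nothing else in the system would need to know the difference.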
Users, groups, roles, and permissions
This module would be in charge of keeping track of who is performing a given action and whether they're allowed to do so. The original wftk already provided a really nice mechanism which would still be nice here: when judging permissions, any action can get the answers "yes, it's allowed", "no, it's not allowed," and "it's allowed subject to approval." That last invokes workflow for any arbitrary action and that would be a powerful abstraction for nearly any system. It's essentially transaction management on a much more abstract scale.
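The three answers can be sketched as below. This is not the original wftk mechanism, just an illustration of the idea: the Verdict enum, the rule table keyed by (role, action), and the deny-by-default choice are all assumptions.

```python
from enum import Enum

class Verdict(Enum):
    ALLOWED = "allowed"
    DENIED = "denied"
    NEEDS_APPROVAL = "needs_approval"

# Hypothetical rule table: (role, action) -> verdict.
RULES = {
    ("admin", "publish_page"): Verdict.ALLOWED,
    ("editor", "publish_page"): Verdict.NEEDS_APPROVAL,
}

def check(role, action, rules=RULES):
    """Anything without an explicit rule is denied (an assumed default)."""
    return rules.get((role, action), Verdict.DENIED)

def perform(role, action, run, approval_queue, rules=RULES):
    """Run the action, refuse it, or park it until somebody approves it."""
    verdict = check(role, action, rules)
    if verdict is Verdict.ALLOWED:
        run()
    elif verdict is Verdict.NEEDS_APPROVAL:
        # This is where a workflow process would be started; the queue is a
        # stand-in for that process.
        approval_queue.append((role, action, run))
    return verdict

def approve(approval_queue):
    """Approve the oldest pending request and run its deferred action."""
    role, action, run = approval_queue.pop(0)
    run()
    return role, action
```

The deferred callable is the transaction-like part: the action is fully specified at request time but only takes effect once the approval arrives.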
And finally, the icing on the cake,
Workflow
The two components which make workflow workflow are a task list (tasks are hierarchical in nature and so a task can have subtasks as a separate project) and a workflow process definition language. The new wftk should be able to work with any workflow formalism -- after all, the process definitions are considered scripts in the versioned script document repository. The existing wftk engine will almost certainly fit in here with little modification.
The primary benefit of workflow is that it allows dissociation over time. A running workflow process isn't active on the machine for the weeks or months it might require -- it's simply a construct in the database that gets resurrected as required. There are a boatload of applications in general programming, but nobody sees them as workflow because everybody "knows" workflow is a business application. The wftk was to have changed that, and I think the potential's still there.
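To make "a construct in the database that gets resurrected as required" concrete, here is a deliberately tiny sketch. It assumes a process is just a linear list of named steps stored as JSON, which is far simpler than any real process definition language; the process table and function names are invented for illustration.

```python
import json
import sqlite3

def open_store():
    """Create the table that holds dormant processes (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE process ("
        "id INTEGER PRIMARY KEY, "
        "steps TEXT NOT NULL, "                  # JSON list of step names
        "position INTEGER NOT NULL DEFAULT 0, "  # index of the current step
        "log TEXT NOT NULL DEFAULT '[]')"        # JSON list of [step, event]
    )
    return conn

def start(conn, steps):
    """Store a new process and return its id; nothing stays in memory."""
    cur = conn.execute("INSERT INTO process (steps) VALUES (?)", (json.dumps(steps),))
    conn.commit()
    return cur.lastrowid

def resume(conn, process_id, event):
    """Load the process, complete its current step, and put it back to sleep.

    Returns the name of the next pending step, or None once it is finished.
    """
    row = conn.execute(
        "SELECT steps, position, log FROM process WHERE id = ?", (process_id,)
    ).fetchone()
    steps, position, log = json.loads(row[0]), row[1], json.loads(row[2])
    if position >= len(steps):
        return None  # already finished; nothing to resurrect
    log.append([steps[position], event])
    position += 1
    conn.execute(
        "UPDATE process SET position = ?, log = ? WHERE id = ?",
        (position, json.dumps(log), process_id),
    )
    conn.commit()
    return steps[position] if position < len(steps) else None
```

Weeks can pass between calls to resume, and between them the process costs nothing but a row.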
There's also a case to be made for a module for
Knowledge management
This portion of my thinking is a little less organized. I'd kind of like to lump some kind of concept database in here, perhaps a semantic parser or something. Originally I'd thought that AI would go in here, but I actually think that Prolog might just be another action script language. This is definitely a blurry line in its native habitat, and crikey, he's not happy to see me here!
But the point of a blog is to write this stuff down as it occurs. So there you have it: this would sit on top of the workflow. Think of it as a way to build smart agents into your data/document/action/workflow management system.

And there you have it -- my plan to wrap up the thought and work of eight years. Oh, and this time I'm not bothering with licensing requirements. Like SQLite, wftk 2.0 will be in the public domain. I don't really care if I get credit or not for every little thing, because frankly, anybody who counts will figure it out. And have you noticed how everything these days uses SQLite? It's because -- well, primarily because it works, but also because you don't have to worry about legal repercussions of using the code.

That's where wftk document management should be, where wftk workflow should be. Simple, easy to use, and ubiquitous.