Developing quality metadata and designing workflow

Learn how to design an efficient workflow and read about the role of quality metadata in workflow management, in this chapter excerpt.

The following is an excerpt from Developing Quality Metadata: Building Innovative Tools and Workflow Solutions, written by Cliff Wootton. It has been reprinted here with permission from Elsevier; copyright 2007. Read the chapter below to learn how to design an efficient workflow, and about the role of quality metadata in workflow management.

What is a workflow?

A workflow describes the movement of documents or tasks through a sequence of processing steps during which work is performed on the content. It is the operational aspect of a work procedure and controls the way that tasks are structured. It also determines where the tasks are performed, who performs them, and in what order. The synchronization of tasks might be controlled with auxiliary information flowing in to support the activity. The system tracks the work as it is done, and a journal of who did what, as well as when and how they did it, is maintained.

The information recorded in the journal may be analyzed later on to calculate throughput as a measurable value. This has evolved from the "time and motion" studies that measured the performance of the people staffing and operating manufacturing plants since the industrial revolution.
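
As a minimal sketch of that kind of analysis, assume a hypothetical journal in which each entry records who did what, to which item, and when. The field names, the sample data, and the crude "entries per elapsed hour" measure are all illustrative assumptions, not something taken from the book:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class JournalEntry:
    """One line of a hypothetical workflow journal."""
    user: str            # who did the work
    action: str          # what they did
    item_id: str         # which item they worked on
    finished: datetime   # when the step was completed


def throughput_per_hour(entries: list[JournalEntry]) -> float:
    """Journal entries per hour between the first and last entry."""
    if len(entries) < 2:
        return 0.0
    times = sorted(e.finished for e in entries)
    elapsed_hours = (times[-1] - times[0]).total_seconds() / 3600
    if elapsed_hours == 0:
        return 0.0
    return len(entries) / elapsed_hours


# Illustrative data only.
journal = [
    JournalEntry("ann", "ingest", "clip-1", datetime(2007, 1, 8, 9, 0, tzinfo=timezone.utc)),
    JournalEntry("bob", "edit", "clip-1", datetime(2007, 1, 8, 10, 0, tzinfo=timezone.utc)),
    JournalEntry("ann", "ingest", "clip-2", datetime(2007, 1, 8, 11, 0, tzinfo=timezone.utc)),
]
rate = throughput_per_hour(journal)
```

A real system would probably group the entries by user, action, or time window before measuring, but the principle is the same: the journal is the raw data, and throughput is derived from it afterward.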

Workflow might also describe a production process in an industrial context; it is not necessarily always an audiovisual- or information-based process. It might describe a sequence of algorithms that are applied in a scientific or numerical analysis system. This is particularly useful when studying genetics, due to the sheer bulk of information needing to be processed. Business logic can be embodied in a workflow to control and manage a commercial enterprise. An organization may therefore have multiple workflows all running concurrently in different parts of the business.

About the task

Metadata is often used to control the workflow. Data and metadata are covered in more detail in Chapter 2.

Workflow is the process that surrounds a content management system. It is worth taking a few moments to examine each of those three words—content, management, and system—separately.

The content is up to you. It is whatever you want to create, edit, collate, and deploy. It might be movies, text, pictures, or graphics. It could be physical assets in the real world represented by objects in the system.

The "M word" (management) means you are controlling and operating on something. You probably want to:

  • Make a new one.
  • Edit an old one.
  • Delete one you don't want anymore.
  • Give a copy of one to someone else.

Distribution may include some conditional access, possibly with some residual control remaining in your hands.

Something that is systematic is planned, regulated, and controlled in a predictable way. Systematizing content management means that you have planned how it is going to work; it is not a process that works correctly only on an occasional and random basis. Some so-called systems are like that. They don't merit the name because they are probably informal processes with no well-thought-out design.

It is up to you to choose the content, decide what you need to do to manage it, and design a set of tools that allow your users to operate on that content systematically.

High-quality metadata makes it all work consistently and reliably.

Driving forces

Designing large content systems is no small feat. It is always a challenge to construct something that is streamlined, efficient, cost-effective, and easy to manage. These are often multimedia systems that manage textual, spatial, temporal, and abstract data. In major deployments, several hundred users will simultaneously access content in a repository in which several million items are managed. At the other end of the scale, the system may be used by only a handful of users and manage a few thousand assets. Aside from their size, the two extremes are amazingly similar in many respects, and the same problems crop up in both.

The meat of the problem

In the past, the entire system would be built from scratch. Now, significant parts of the system are available "off the shelf" as turnkey components, and the engineering task concentrates on integrating the large modular system blocks together. That is sometimes difficult, because different suppliers may have used different—and incompatible—data formats or interfaces.

This book focuses some attention on this "glue-ware" area. Ultimately, the quality of the service offered by your web site, TV channel, video-on-demand, IPTV, PVR, or DVD sell-through business depends on how well integrated your systems are. The same applies to other business areas, whether they are in banking and finance or energy and exploration.

Problems exist in the spaces between systems. It may require ingenuity and careful software engineering to successfully connect two incompatible systems. Your suppliers may help, but they won't want to be responsible. To them, this is a support overhead.

It is much easier to implement this glue-ware correctly in the first place than to throw something together in a hurry and spend a long time debugging it. You need to understand both sides of the connection and exactly what transformations are necessary. Then implement the glue carefully.

Simply disregarding something like a time-zone offset and not storing it at all during the ingest process may not cause a problem most of the time. After all, your video journalists may all be shooting in the same time zone that the play-out system broadcasts to. You might assume that a time-zone value wastes space in your database. Wrong!

Later on, a foreign data item arrives, and its timestamp is incorrect because the time-zone adjustment cannot be applied. You could be faced with a difficult and complex upgrade that needs to be applied urgently to your system. That is the wrong time to be thinking about changing a fundamental part of your data and metadata storage model, because the knock-on effects could be serious. A fix like this might have to be applied to an ingest process and have consequences throughout the entire system.
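
One way to avoid that trap is to keep the offset from day one. The following is a minimal sketch, assuming Python's standard datetime module and a hypothetical record layout (the "utc" and "offset_minutes" keys are invented for illustration). It stores the instant in UTC together with the original offset, and it refuses timestamps that arrive without one:

```python
from datetime import datetime, timedelta, timezone


def ingest_timestamp(local_time: datetime) -> dict:
    """Return a storable record: the UTC instant plus the original offset.

    Raises ValueError for naive datetimes so that a missing time zone
    is caught at ingest instead of being silently assumed.
    """
    offset = local_time.utcoffset()
    if offset is None:
        raise ValueError("timestamp has no time-zone offset")
    return {
        "utc": local_time.astimezone(timezone.utc).isoformat(),
        "offset_minutes": int(offset.total_seconds() // 60),
    }


def restore_timestamp(record: dict) -> datetime:
    """Rebuild the original local time from the stored record."""
    utc_time = datetime.fromisoformat(record["utc"])
    original_zone = timezone(timedelta(minutes=record["offset_minutes"]))
    return utc_time.astimezone(original_zone)


# A clip shot at 14:30 in a zone five hours behind UTC (illustrative data).
shot = datetime(2007, 3, 1, 14, 30, tzinfo=timezone(timedelta(hours=-5)))
record = ingest_timestamp(shot)
```

The extra field costs a few bytes per item, and it means a foreign item can be converted correctly the day it arrives rather than after an urgent change to the storage model.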

Take it in small steps

Breaking down the problems into smaller tasks is important. Something simple like counting how many pages are in a PDF file can become quite complex due to the different ways that a PDF file can be constructed.

The obvious approach is to open the PDF and parse it with your own application. This is not necessarily the best solution. The same is true when dealing with other complicated formats like video, audio, and image data. If the information you need is trivial and likely to be in the file header, then it will be quicker to access directly by opening the container as a raw file and reading in the first few bytes. If you want to analyze the image content in more detail, you could gain additional leverage by using other tools and controlling them externally.
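
As an illustration of the header-reading approach, here is a minimal sketch that uses a PNG image as the example container (my choice of format, not one named in the text). In a PNG file the signature and the start of the first chunk occupy the first 24 bytes, so the width and height can be read without decoding any image data. The function names are hypothetical:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG header")
    if header[12:16] != b"IHDR":
        raise ValueError("IHDR chunk is not first")
    # Width and height are two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def read_png_dimensions(path: str) -> tuple[int, int]:
    """Open the file as raw bytes and read only the header."""
    with open(path, "rb") as f:
        return png_dimensions(f.read(24))


# Hand-built header for a 640x480 image (illustrative; not a full PNG file).
sample = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
size = png_dimensions(sample)
```

Only 24 bytes are read from disk, which is why this is quicker than loading the file into a full image library when the dimensions are all you need.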

That approach operates indirectly on the data by remotely controlling another application.

The solutions to many recurring problems can be explained more effectively with tutorials. The tutorials often solve these problems by using hybrid techniques. Sometimes they enclose applications or tools in script wrappers; I prefer to let the operating system and scripting tools take the strain.
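
A script wrapper of that kind might look like the following sketch, which returns to the PDF page-count problem. It assumes that the pdfinfo command-line tool (distributed with Poppler and Xpdf) is installed and on the PATH, and that it prints a "Pages:" line; the function names and the sample output are illustrative:

```python
import re
import subprocess


def parse_page_count(pdfinfo_output: str) -> int:
    """Extract the page count from the text that pdfinfo prints."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in tool output")
    return int(match.group(1))


def count_pdf_pages(path: str) -> int:
    """Run the external tool and let it do the parsing work.

    Assumes pdfinfo is installed; this function is not exercised below.
    """
    result = subprocess.run(
        ["pdfinfo", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return parse_page_count(result.stdout)


# Sample output in the shape pdfinfo prints (values are illustrative).
sample_output = "Title:          Report\nPages:          12\nPDF version:    1.4\n"
pages = parse_page_count(sample_output)
```

Keeping the output parsing separate from the subprocess call means the fragile part, which is interpreting another program's text, can be tested without the tool being present.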

I spent a long time trying to make Adobe Illustrator export an image file as text. The scripts I wrote at first never worked completely satisfactorily, because there would be a point at which I would have to step in and click a mouse button to continue. This immediately prevents the solution from being deployed in automated workflow systems.

Sometimes, a solution that appears unusable or unnecessarily labored suddenly gives way to a much simpler answer. In this case, after I stepped away from the problem and approached it in a different way, I found a workable solution that consisted of barely three lines of AppleScript and one UNIX pipe with several simple commands.

Don't always select the apparently obvious first solution. It is tempting to go straight for an XML interchange format. Yes, XML is powerful and ubiquitous, but what did we use before XML came along? Sometimes those old-fashioned techniques are faster, more compact, and easier to publish or manipulate. For some legacy applications, they might be the only way you can import or export data in a way that those applications can understand.
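
Tab-separated text is one such older technique. The sketch below, which uses Python's standard csv module and hypothetical field names, writes a few metadata records as tab-delimited lines and reads them back. It is one possible alternative for illustration, not a format the book prescribes:

```python
import csv
import io

FIELDS = ["asset_id", "title", "duration_seconds"]  # hypothetical metadata fields


def export_tsv(records: list[dict]) -> str:
    """Serialize records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def import_tsv(text: str) -> list[dict]:
    """Parse tab-separated text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Illustrative data only; every value is kept as a string.
assets = [
    {"asset_id": "a001", "title": "Opening titles", "duration_seconds": "30"},
    {"asset_id": "a002", "title": "Interview", "duration_seconds": "412"},
]
text = export_tsv(assets)
round_trip = import_tsv(text)
```

The output is one line per record, readable by spreadsheets, databases, and UNIX text tools alike, which is often all a legacy application needs.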

I like XML solutions because you can do a lot with the output. Since XML is well covered in other books, I am going to focus here on alternatives to XML, other technologies or approaches you may not have thought of using. The "road less traveled," if you will.

Creative artists vs. geeks

As a rule, creative people come from an arts background and they often think of a computer as a tool without any degree of built-in intelligence. To many of them, it is just a different kind of canvas on which to draw or paint and appears to be a tool that depends almost exclusively on manual operation. As such, they often fail to exploit the intelligence that the computer offers them.

I come from a technical background and developed my creative skills through some of the work I did earlier in my career as an illustrator. I often think of technical, automated ways to accomplish repetitive creative tasks. That provides me with useful leverage to improve my creative productivity. It also speeds up my throughput by an order of magnitude at least.

It is important to get the maximum value from your computing resources, whether they are small and limited to a single machine or of a grander scale, perhaps serving several hundred users. In the context of creating, storing, and managing metadata in any kind of workflow, let's push the envelope a little in order to get the maximum leverage from the systems at our disposal.

