When a major hospital network in South Korea wanted to electronically consolidate all patient data -- much of it handwritten -- it faced some major hurdles.
The Catholic Medical Center, an association of eight large hospitals and 23 other medical facilities across the Republic of Korea, had data in relational databases and lots of handwritten notes which it wanted to combine into one database to improve patient care. Overcoming the first challenge -- making handwritten notes into electronic data -- turned out to be fairly straightforward, according to Bernie Spang, director of data servers with IBM. The hospital could scan the handwritten records and convert them to semi-structured XML files. But then it encountered the problem of how to store this semi-structured XML data alongside structured relational data in one database -- yet still maintain acceptable database performance.
Storing these two disparate data types in the same database has usually meant taking a significant performance hit, according to Philip Howard, research director with Towcester, U.K.-based Bloor Research International Ltd. But, as the hospital found, that's changing with the advent of a new style of databases.
Catholic Medical Center evaluated and implemented IBM's DB2 9 Viper data server, which helped it combine all of its XML and relational patient data in one database while maintaining database performance. DB2 9 is the first of its kind and is built on a unique architecture, Howard explained. IBM calls it a "hybrid data server."
"IBM has two different storage systems in the same database," he said, "whereas everybody else basically still has one storage system and then uses hierarchical indexing, XML data types and other [features] built on top of the database."
DB2 9 is not the only database that can store both XML data and relational data, Howard said, but it's the only one with two engines for relational and native XML storage. Other databases deal with the challenge by either taking the whole XML record and putting it into one big cell called a "character large object" (CLOB) or "shredding" the XML record, essentially deconstructing the data elements and putting them into relational form. The latter choice has drawbacks, including losing the data hierarchy or separating a signature from its related form or information. So some database makers, such as Oracle, support both of these types of XML data storage in one database, Howard said. Users can even store one XML file both ways to get the benefits of each approach. However, this can be quite time consuming, he said, and can create unwanted performance overhead.
Performance was one factor that drove Catholic Medical Center's choice. The Center also plans to use the data server as part of a service-oriented architecture to better integrate hospital systems and more efficiently store and access patient data.
"Now that we can store XML data directly on our IBM DB2 9 data server, we have been able to streamline access to medical records and improve care to our patients," said Lee Hungu, team manager, development service team, Catholic Medical Center. "At the same time, we can reduce our storage costs with the deep compression feature."
Are hybrid data servers the way of the future?
No other vendors have a comparable database -- yet, Howard said. There are other databases that allow users to address SQL and XML in the same statement, and others that can index XML files, but none with the architecture of IBM DB2 9. Whether other database makers will come out with their own styles of hybrid data servers remains to be seen, he said, and market demand will play a big role.
The problem, and demand, is especially evident in developing countries, with large amounts of handwritten data, IBM's Spang said. And semi-structured XML information is rapidly growing, he said, driven in part by industry-specific XML standards such as those found in the medical, insurance and finance fields. In fact, a recent TDWI study found that 22% of corporate information is semi-structured (as in XML, etc.) and 47% is structured, and 31% of enterprise data is unstructured. It all seems to point to a growth in semi-structured data -- which has not gone unnoticed by database makers.
But Howard questions whether other leading database makers such as Microsoft and Oracle fully appreciate the challenges of storing the different data.
"I think that quite a lot of people don't understand what IBM has done," Howard said. "Even talking to some quite technical people at Microsoft, they insisted that because they had XML data types in the database, therefore that meant they had native support for XML. But software people don't always understand how the database is actually implemented on the disk. IBM has really taken it to a different level."