DB2 ADVISOR

Content management software: Who will leverage semi-structured and unstructured data?


Wayne Kernochan
09.07.2006




Over the last couple of years, EMC has acquired Documentum and IBM has acquired FileNet, the two major content management software companies. Both are also relatively old companies: they built on an ability to manage document and spreadsheet data at the desktop and LAN level, entered content management during the Internet boom years, and survived the Internet bust. So by acquiring Documentum and FileNet, EMC and IBM are laying claim to a large body of existing users who are managing semi-structured (text, spreadsheets) and unstructured (Web graphics, video, audio, presentations) data.

IT is accustomed to viewing data management through the lens of the enterprise relational database managing structured relational data. In fact, by some estimates, the proportion of data in the typical enterprise that is semi-structured or unstructured is approaching 90% and rising. Much of that data is related in some way to key enterprise data. In other words, even if most vital enterprise data sits in business-critical enterprise applications, the related data (customer pictures, X-rays, security tapes, email, news footage) sits outside the enterprise-application data stores. That related data would enhance the enterprise's ability to interact with customers and suppliers, brand the company more effectively, and respond effectively to Sarbanes-Oxley legal discovery. Thus, the enterprise badly needs better ways of managing its semi-structured and unstructured data -- and relating it to mission-critical relational data.

By themselves, Documentum and FileNet would probably never have been able to provide the integration with relational data that users need. On the other hand, without credible content managers, neither IBM nor EMC could have provided the broad support for semi-structured and unstructured content that users also want.

But having carried out the acquisitions, what customer benefits are likely to emerge from either of the two in the short term? Or, to put it another way, what is the "end game," the database architecture that will allow effective content management and integration with relational data, and how close is each of the two to achieving it?

EMC and ILM

For EMC, Documentum seems to represent the keystone of an architecture that will introduce database-level and information-level metadata into a basically storage-oriented product set. The aim of introducing this metadata is to enable effective ILM (information lifecycle management) and, as the next phase, intelligent information management, which requires data classification in order to optimize storage of data not only by its age but also by other characteristics of each type of semi-structured and unstructured data.

This metadata, spanning relational/semi-structured/unstructured data, is of use not only in storage management but also to databases and business-level strategists seeking to achieve the real-time enterprise or to leverage corporate information for competitive advantage. In order to provide such a global metadata repository, EMC will need to ingest metadata not only from Documentum but also from other content managers, databases and applications (see Figure 1; note that Documentum offers some adapters that allow input of non-Documentum content into a Documentum content store). In other words, to meet the oncoming need, EMC should build its repository upwards in the enterprise-architecture software stack.

Figure 1: A Possible Global Metadata Repository Architecture

Source: Infostructure Associates, August 2006

IBM and information on demand

With the acquisition of FileNet, IBM can now add the mass of content that FileNet controls to its arsenal. To integrate this content with other content, IBM has the "enterprise content integration" capabilities of WebSphere Information Integrator, which it gained with its acquisition of Venetica. To amass metadata from applications, databases and content managers and place that metadata in a global metadata repository, IBM has Information Integrator itself.

At the same time, IBM can now store semi-structured and unstructured data in DB2 itself. DB2 9 ("Viper") enables coexistence of relational and XML data (IBM calls its approach "pureXML"):

  • XQuery transactions can be performed on XML and relational data.
  • SQL transactions can be performed on XML and relational data.

What differentiates IBM from, say, Oracle and Microsoft in this type of "hybrid" database is that IBM makes a great effort to associate data-type-specific metadata and indexes with each set of XML data, rather than treating it as an undifferentiated mass of objects. This, in turn, allows IBM to optimize transactional performance for each type of XML data.

Since XML allows users to encapsulate semi-structured and unstructured data as XML messages, support for XML and XQuery means generally accepted common formats for storing and performing transactions on semi-structured and unstructured data. Thus, a "hybrid" database can act like a content manager across all types of data, or an enterprise database across all types of data.
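As a rough illustration of the "hybrid" idea, the sketch below keeps an XML document alongside ordinary relational columns and answers one question that touches both. It is not DB2, pureXML or XQuery: it uses Python's built-in sqlite3 and xml.etree.ElementTree modules as stand-ins, and the table, column names and sample data are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: relational columns plus one column holding an XML document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, profile_xml TEXT)"
)
conn.executemany(
    "INSERT INTO customer (id, name, profile_xml) VALUES (?, ?, ?)",
    [
        (1, "Acme Corp",
         "<profile><contact type='email'>ops@acme.example</contact>"
         "<note>Prefers invoices by mail</note></profile>"),
        (2, "Globex",
         "<profile><contact type='phone'>555-0100</contact></profile>"),
    ],
)


def customers_with_contact(conn, contact_type):
    """Return (name, contact value) pairs for customers whose XML profile
    contains a <contact> element of the given type."""
    results = []
    rows = conn.execute("SELECT name, profile_xml FROM customer ORDER BY id")
    for name, profile_xml in rows:
        root = ET.fromstring(profile_xml)
        # ElementTree supports a small XPath subset, including attribute predicates.
        for contact in root.findall(f"./contact[@type='{contact_type}']"):
            results.append((name, contact.text))
    return results


email_contacts = customers_with_contact(conn, "email")
print(email_contacts)
```

Here the XML is parsed in application code after the relational query returns. A hybrid database of the kind described above would instead evaluate the XML part of the question inside the database engine, where XML-aware indexes can be applied.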

Pair DB2 with WebSphere Information Integrator, and we have what Infostructure Associates calls a "virtual operational store" (VOS): an entity that looks like a single database, stores or caches key operational data with updates replicated to other data stores, and has a global view of data not included in the VOS's store. In other words, such a VOS can mimic to some extent an enterprise-wide database containing all of the enterprise's data, unstructured, semi-structured and structured. Such a VOS can make "information on demand" for a real-time enterprise more of a reality by ensuring that "information" also means semi/unstructured data.
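The three properties of a VOS listed above (one apparent database, local storage with replicated updates, and a global view of data held elsewhere) can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class name, method names and sample keys are hypothetical, plain dictionaries stand in for real data stores, and nothing here reflects how DB2 or WebSphere Information Integrator is implemented.

```python
from typing import Any, Dict, List


class VirtualOperationalStore:
    """Toy VOS: a local store plus a catalog of data that lives elsewhere."""

    def __init__(self) -> None:
        self.local: Dict[str, Any] = {}                # key operational data held here
        self.sources: Dict[str, Dict[str, Any]] = {}   # external data stores, by name
        self.catalog: Dict[str, str] = {}              # global view: key -> source name
        self.replicas: List[Dict[str, Any]] = []       # stores that receive local updates

    def register_source(self, name: str, store: Dict[str, Any]) -> None:
        """Record which keys an external store holds, without copying the data."""
        self.sources[name] = store
        for key in store:
            self.catalog[key] = name

    def add_replica(self, store: Dict[str, Any]) -> None:
        self.replicas.append(store)

    def put(self, key: str, value: Any) -> None:
        """Store locally and replicate the update to every registered replica."""
        self.local[key] = value
        for replica in self.replicas:
            replica[key] = value

    def get(self, key: str) -> Any:
        """Answer from the local store if possible, else from the owning source."""
        if key in self.local:
            return self.local[key]
        source_name = self.catalog[key]  # raises KeyError for unknown keys
        return self.sources[source_name][key]


# Hypothetical stores: a content manager and a downstream warehouse.
content_manager = {"contract:42": "scanned contract (PDF)"}
warehouse: Dict[str, Any] = {}

vos = VirtualOperationalStore()
vos.register_source("content_manager", content_manager)
vos.add_replica(warehouse)

vos.put("order:7", {"status": "shipped"})
print(vos.get("order:7"))      # served from the local store
print(vos.get("contract:42"))  # fetched through the global catalog
print(warehouse)               # the update was replicated
```

The caller uses a single get/put interface regardless of where the data lives, which is the sense in which such a store "looks like a single database."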

IBM has yet to complete this vision in two areas: first, FileNet and DB2 need to be integrated so that semi/unstructured data can be stored in whichever data store makes sense; and second, the metadata repository for which Information Integrator provides the key must be extended to storage management and collection of storage-level metadata.

Conclusions

In the short run, the acquisition of FileNet gives IBM an answer to customers concerned that IBM, compared to EMC, may not support their increasing semi/unstructured data storage needs. In the long run, however, customers will need a broader solution that recognizes that (a) content is much more prevalent than relational data, and (b) much of the value of content is in its relationships with business-critical relational data.

To handle these long-run customer needs, computer companies ranging from IBM and EMC to Oracle, Microsoft, HP and Sun will need to develop database architectures that support and integrate content and relational data in the same data store, as well as integration between separate content managers and enterprise databases. Because of FileNet, DB2 and WebSphere Information Integrator, IBM now has a head start in delivering this kind of functionality and integration. At the same time, IT buyers should keep in mind that every computer company has a way to go to achieve full content integration. For one thing, "intelligent information management" is still on the drawing board.

In the meantime, the best start that users can make towards this goal is to begin to develop a global metadata repository that can include content-type metadata. For example, master data management efforts should include efforts to classify semi/unstructured data related to customer data (e.g., pictures, legal documents or email) as part of master data and as content-type data. As storage metadata is created, it too can be folded into the repository. This will provide a solid base for managing not only content but also content/structured-data relationships across the enterprise, as vendor offerings arrive. In other words, there is no need to wait; some of the long-run benefits of semi/unstructured data can be realized now.
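A minimal sketch of such a repository, assuming a simple in-memory design, might look like the following. Each content item is classified by type, linked to a master customer record, and can later pick up storage-level metadata. The class names, field names and sample values are hypothetical and are not drawn from any vendor product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContentRecord:
    content_id: str
    content_type: str   # classification, e.g. "picture", "legal_document", "email"
    customer_id: str    # link to the master customer record
    location: str       # where the content itself is stored
    storage: Dict[str, str] = field(default_factory=dict)  # storage metadata, added later


class MetadataRepository:
    """Holds metadata about content, not the content itself."""

    def __init__(self) -> None:
        self._records: Dict[str, ContentRecord] = {}

    def register(self, record: ContentRecord) -> None:
        self._records[record.content_id] = record

    def add_storage_metadata(self, content_id: str, **attrs: str) -> None:
        """Fold storage-level metadata into an existing record as it becomes available."""
        self._records[content_id].storage.update(attrs)

    def content_for_customer(
        self, customer_id: str, content_type: Optional[str] = None
    ) -> List[ContentRecord]:
        """List content related to a customer, optionally filtered by content type."""
        return [
            r for r in self._records.values()
            if r.customer_id == customer_id
            and (content_type is None or r.content_type == content_type)
        ]


repo = MetadataRepository()
repo.register(ContentRecord("c1", "picture", "cust-100", "fileshare://photos/c1.jpg"))
repo.register(ContentRecord("c2", "email", "cust-100", "mailstore://inbox/c2"))
repo.register(ContentRecord("c3", "legal_document", "cust-200", "dms://contracts/c3"))

# Storage metadata arrives later and is folded into the same record.
repo.add_storage_metadata("c1", tier="archive")

for record in repo.content_for_customer("cust-100"):
    print(record.content_id, record.content_type, record.storage)
```

Keying every record to a customer identifier is what lets the repository answer content/structured-data questions, such as "which emails relate to this customer," before any vendor integration arrives.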

About the author

Wayne Kernochan is president of Infostructure Associates, an affiliate of Valley View Ventures that aims to provide thought leadership and sound advice to both vendors and users of information technology. This document is the result of research sponsored by Infostructure Associates. Infostructure Associates believes that its findings are objective and represent the best analysis available at the time of publication.












All Rights Reserved, Copyright 2005 - 2009, TechTarget