Ten years ago, the idea of a computing utility that would dispense information on demand was first being discussed.
Today, it is beginning to be implemented in the real world. The key value-add from such a utility, compared to today's fragmented applications and data stores, is not only IT cost-effectiveness but also, and more importantly, full access by the business to the right bit of information or the right change in information ("event") immediately. This kind of "real-time" information access, in turn, translates not only to better decision-making, but also to comparative advantage by identifying key proprietary data that allows the enterprise to create and maintain better relationships with customers and suppliers — and because that data is proprietary, competitors cannot imitate it.
Because information on demand depends on the ability of the enterprise architecture to find and combine all key data in real time, no matter the format or location, the key foundations for an information on demand architecture are a high-performance database and a high-performance Enterprise Information Integration (EII) solution. The Enterprise Information Integration solution finds and carries out transactions on all enterprise data stores; the database provides the transactional engine for caching the data or for managing a repository of enterprise metadata.
IBM is in the forefront of providing an information on demand architecture that customers can use (and are using) to implement the information utility. DB2 and WebSphere Information Integrator can serve as strong foundations for this architecture.
What Is EII?
EII is software that combines heterogeneous data sources at a transactional level (in real time, via "query federation") in order to support applications that present or analyze the data in new ways. In other words, EII provides a "database veneer" or service that allows administrators, developers, and end-users to treat a broad array of data sources as if they were one large database or data service (see Figure 1).
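To make "query federation" concrete, here is a minimal sketch in Python. It is not how WebSphere Information Integrator or any other EII product works internally. The two sources (an in-memory SQLite database standing in for a back-end relational store, and CSV text standing in for a legacy file) and the function name federated_order_totals are assumptions made purely for illustration. The point is only that the data are combined at query time, with neither source copied into the other.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational store (in-memory SQLite as a stand-in).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical source 2: a flat file (CSV text as a stand-in for a legacy file).
ORDERS_CSV = "customer_id,amount\n1,250.0\n2,75.5\n1,100.0\n"


def federated_order_totals(db, orders_text):
    """Combine both sources at query time and return (customer name, order total) pairs."""
    # Aggregate order amounts per customer from the file source.
    totals = {}
    for row in csv.DictReader(io.StringIO(orders_text)):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])

    # Join the aggregated totals with customer names from the relational source.
    result = []
    for cid, name in db.execute("SELECT id, name FROM customers ORDER BY id"):
        result.append((name, totals.get(cid, 0.0)))
    return result


print(federated_order_totals(crm, ORDERS_CSV))
```

A real EII layer would hide this join behind a single SQL or XQuery interface and would optimize where each part of the query runs; the sketch shows only the "one logical database over many sources" idea.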
Today's EII has three key features:
- It is software infrastructure, not an application. EII supports creation of applications such as enterprise portals that display and analyze data combined across data sources. EII can therefore play a key role in any enterprise-wide e-architecture.
- It bridges the second and third tiers of a typical enterprise architecture via "many-to-one-to-many" connectivity. That is, EII is a single software "node" that gives a wide range of Web applications running on the second tier of an e-business architecture access to multiple back-end and legacy databases and file systems running on the third tier, without replicating data.
- It provides an enterprise-scale, integrated approach. Before EII, most applications created their own links to back-end data sources, without coordination. EII provides a common infrastructure on which all links can be built. This common infrastructure enables more rapid development, more cost-effective centralized administration of data from multiple data sources, and more flexible presentation and analysis of more data sources.
EII solutions serve four constituents: the data owner, the developer, the administrator, and the end-user. In order to serve the developer, EII provides a "framework" that allows the developer to treat multiple, disparate databases, data management systems, and files as one huge organization-wide database.
To aid the administrator and data owner, EII creates a "metadata repository" of data and object types (often XML-based) across multiple data sources, either via semi-automated import from databases' data dictionaries or by user-driven generation of metadata about each data source. Typically, the EII solution will provide tools that allow the administrator to monitor, maintain, and implement security for the metadata. Data owners determine access privileges to the source data and determine when and how queries are generated against the source system.
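The semi-automated import from data dictionaries might look roughly like the following Python sketch. It is an illustration under stated assumptions, not a description of any vendor's repository: SQLite's built-in catalog (the sqlite_master table and PRAGMA table_info) stands in for a database's data dictionary, and the function import_metadata, the source names "hr" and "sales", and the dictionary-based repository are all hypothetical.

```python
import sqlite3


def import_metadata(source_name, conn, repository):
    """Read one source's data dictionary and record its tables and column types."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        repository[(source_name, table)] = {col[1]: col[2] for col in columns}
    return repository


# Two hypothetical data sources, each with its own schema.
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# The repository maps (source, table) to a column-name -> declared-type mapping.
repo = {}
import_metadata("hr", hr, repo)
import_metadata("sales", sales, repo)
print(repo)
```

A production repository would also hold access privileges, data lineage, and non-relational object types, and would itself live in a database rather than in memory.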
To aid the end user, EII supports basic SQL operations, including querying for decision support, and can support stored procedures, or services that look like stored procedures. EII support for these operations enables Web-analytics queries across databases, and abets enterprise portal and operational application use of combined information from employees, customers, and the supply chain. Some EII suppliers support XQuery, a powerful XML query language that works across data types.
Connectors and gateways are not EII, because they do not support "query federation." Moreover, EAI (Enterprise Application Integration) is not EII. EAI aims to integrate different applications (or processes invoking these applications), typically from different suppliers. Thus, EAI does well at translating all the different (typically relational) data formats of major enterprise applications such as SAP, PeopleSoft, and Siebel. The primary purpose of EII is to combine data from different data sources and deliver the results to an end user in a reasonable time. Thus, EII does well at real-time querying performance and at handling non-relational data.
Note that IBM's WebSphere Information Integrator now dominates the EII market in visibility and revenues -- and deservedly so. IBM has taken the lead among EII vendors in popularizing the EII concept and supporting enterprises that use EII, and has aggressively moved to provide a full-fledged and exceptionally functional EII suite. WebSphere Information Integrator has proven its value in the field, not only in its own right but also as an enhancement and complement to IBM's business intelligence, business integration, master data management, and information on demand solutions.
The Role of EII in Delivering Information on Demand: The Information Grid
EII can create a data services grid on top of key organizational applications and information. Moreover, just as an electrical utility uses a nationwide or regional power grid to deliver electricity on demand in real time, IT can use EII to deliver information, in real time where possible, across all of its data sources.
I define a grid as a distributed, heterogeneous computing environment that makes computer resources (such as processors, storage, data, and applications) available on demand to the applications that require them. The systems hosting these resources may be owned by disparate groups of users and may sit in different administrative domains. Note that individual resources do not have to be dedicated to the grid. High-level policies describing who may use the resources, how they are used, and when they may be used determine resource availability.
Many grid implementations start with an "information" or "data services" grid rather than one aimed at compute-intensive applications. Thus, IBM users' grid implementations typically consider "information virtualization" as the second step in their implementation process, with some users stopping there (rather than adding other grid computing features such as scheduling). This data services grid aims strictly to provide data-type transactions (updates, reads, queries) across a "grid" of distributed, heterogeneous data sources. In such a data services grid, users can access existing (or created) copies of data items by referencing an (optionally enterprise-wide) metadata repository. An overall transaction manager partitions queries among copies and data sources, load balances to increase the usage of particular machines and the capacity of the entire system, and "fails over" when one of the servers fails in order to increase robustness.
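The load-balancing and failover behavior described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a real transaction manager: the Replica class, the grid_read function, and the use of "reads served so far" as the load measure are all assumptions, and the sketch omits query partitioning, updates, and any transactional guarantees.

```python
class Replica:
    """A hypothetical copy of a data set held on one server in the grid."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up
        self.served = 0  # reads served so far, used as a crude load measure

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.served += 1
        return self.data[key]


def grid_read(replicas, key):
    """Try replicas from least to most loaded; fail over past any that are down."""
    for replica in sorted(replicas, key=lambda r: r.served):
        try:
            return replica.name, replica.read(key)
        except ConnectionError:
            continue  # fail over to the next candidate
    raise RuntimeError("no replica available")


replicas = [
    Replica("a", {"sku-1": "widget"}, up=False),  # simulated outage
    Replica("b", {"sku-1": "widget"}),
    Replica("c", {"sku-1": "widget"}),
]

# Three reads in a row: the failed server is skipped and the load alternates.
served_by = [grid_read(replicas, "sku-1")[0] for _ in range(3)]
print(served_by)
```

In this run the failed server never serves a request, and the two healthy copies take turns, which is the combination of robustness and capacity that the grid's transaction manager is meant to provide.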
EII is an excellent choice to form the keystone of a data services grid. EII allows real-world data-services grid implementers to invoke all of an enterprise's data sources (as well as extra-enterprise ones such as search-engine results) in carrying out a transaction.
The data services grid, in turn, can form the basis of an "information utility," in which, like an electrical utility, a combination of software and hardware delivers information rapidly when requested, with wide flexibility to handle not only surges in transaction rates but also long-term expansion in the amount of data stored.
The Role of DB2 in Supporting EII
DB2 now fully supports both relational and non-relational data -- and is highly scalable. Thus, EII implementers can use DB2 to cache data from far-flung data stores for more rapid processing, and to carry out transactions combining multiple data types within the cache.
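The caching role can be illustrated with a minimal read-through cache in Python. This is a sketch under stated assumptions, not DB2 or Information Integrator code: an in-memory SQLite database stands in for the cache database, and fetch_remote, cached_get, and the key "cust-42" are hypothetical names chosen for the example.

```python
import sqlite3

# Stand-in for the cache database (the article's architecture would use DB2 here).
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)")

remote_calls = []  # records every trip to the remote store, for demonstration


def fetch_remote(key):
    """Hypothetical slow lookup against a far-flung data store."""
    remote_calls.append(key)
    return f"record-for-{key}"


def cached_get(key):
    """Return the cached value if present; otherwise fetch it remotely and cache it."""
    row = cache.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]
    value = fetch_remote(key)
    cache.execute("INSERT INTO cache VALUES (?, ?)", (key, value))
    return value


first = cached_get("cust-42")   # goes to the remote store
second = cached_get("cust-42")  # served from the cache
print(first, second, remote_calls)
```

The sketch never invalidates or refreshes cached entries; a real deployment would need a policy for keeping the cache consistent with the source systems.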
Moreover, as a side effect, implementing information on demand using EII creates an enterprise-wide metadata repository. As noted in a previous article, DB2 can be a highly effective way to implement such a metadata repository. Its scalability, robustness, and long experience with data dictionaries (per-database metadata repositories) make it a logical choice. Also, it is well integrated with Information Integrator, so that it can use Information Integrator's ability to semi-automatically go out and search for master data no matter what the data type, initially populate the metadata repository, and update the repository as new customer record types arrive at local sites. DB2 has long demonstrated scalability in TPC-D benchmarks (aimed at measuring database querying prowess). Its ability to store data in XML format and perform operations on that data, combined with this querying performance, suggests good performance in queries on complex metadata.
Today's information utilities may have the fanciest business-intelligence, OLAP (online analytical processing), CRM (customer relationship management), SCM (supply chain management), and other applications in the world, but without a scalable infrastructure that accesses all key data in the enterprise, they take advantage of only a fraction of the organization's leverageable data. This scalable, flexible information-on-demand infrastructure, in turn, demands a powerful EII tool and complementary database.
Because IBM's WebSphere Information Integrator is such a strong EII tool, DB2 as its natural complement can play a key role in the success of an information-on-demand initiative. Therefore, information-on-demand implementers should consider setting up a data services grid that uses Information Integrator as a key component, and DB2 as its cache and metadata-repository database.
About Infostructure Associates
Infostructure Associates is an affiliate of Valley View Ventures that aims to provide thought leadership and sound advice to both vendors and users of information technology. This document is the result of Infostructure Associates sponsored research. Infostructure Associates believes that its findings are objective and represent the best analysis available at the time of publication. This document is subject to copyright. No part of this publication may be reproduced by any method whatsoever without the prior written consent of Infostructure Associates. All trademarks are the property of their respective owners. While every care has been taken during the preparation of this document to ensure accurate information, the publishers cannot accept responsibility for any errors or omissions.