This article originally appeared on the BeyeNETWORK.
A lot of hype and publicity has recently been focused on customer data integration (CDI). Customer information is critical if an organization is to provide “quality” products and services to its customers.
The question I want to explore is, “What is meant by the term data integration?” Its meaning has drifted in much the same way as that of “business rules,” a term that used to mean policies and constraints on business processes that translated into edit and validation rules for the data being created or updated.
Today, however, the term “business rules” seems to be used more in the context of ETL (extract, transform and load), as rules for “transforming” data from the form in which it exists in a source data store to a different or disparately defined form in order to propagate the data into a target data store. Almost all uses of the term “data integration” now seem to address the concepts and processes of transforming data from one form to another for propagation to another data store.
Some Existing Definitions of Data Integration
- The data warehousing team at Georgetown University defines data integration as, “The movement of data between two co-existing systems. The interfacing of this data may occur once every hour, once a day, etc.” What they are really defining, however, is “data interfacing,” not integration.
- “Customer data integration (CDI) is the process of consolidating and managing customer information from all available sources, including contact details, customer valuation data, and information gathered through interactions such as direct marketing. Properly conducted, CDI ensures that all relevant departments in the company have constant access to the most current and complete view of customer information available.” – Source: SearchDataManagement.com
- “Data integration is the ability to produce and/or consume the same data. The ‘save as’ functionality within Microsoft Office is the primary mechanism used to achieve data integration. This facility will for instance, allow a user [sic] to save an Excel spreadsheet in a form that can be opened by Word. The Word user [sic] is then able to open and use the content of the spreadsheet within their word processing document. The content is either text or an embedded spreadsheet depending upon the kind of import that was done. Copying data from one system to another as described here is one form of data integration and is referred to as ‘value copy’ data integration.” Source: Vincent Mastro, DM Direct Newsletter, March 5, 2004.
- In an article published on DMReview.com on September 2, 2004, The Integration Consortium wrote, “EII [Enterprise Information Integration] is the integration of data from multiple systems into a unified, consistent and accurate representation geared toward the viewing and manipulation of the data. Data is aggregated, restructured and relabeled (if necessary) and presented to the user [sic] …. Data integration is the extraction, transformation and loading (ETL) of data from disparate systems into a single data store for the purposes of manipulation and evaluation (reporting).”
But are any of these definitions really “data integration”? If one has to restructure, relabel or extract and transform data, it cannot possibly be considered “integrated.” It can only be considered redundant proliferation of data that exists in two or more different forms.
Why These Are NOT Data Integration
Mastro himself goes on to say that the “value copy” approach cited above is not data integration: “Data values (i.e., instances) are copied into the component systems for local use. As time passes, the values change independently of the original. There is no means for ensuring value consistency between the component systems. In addition, there is no means for ensuring that the original definition of the data is the same across the component systems.” Two copies of data that are equivalent at one point in time, but can then undergo uncontrolled changes, can never be considered integrated, because knowledge workers looking at what should be the same record will see different data values.
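The drift Mastro describes can be shown with a minimal sketch. The customer record, its field names and the two "systems" are hypothetical and stand in for any pair of data stores that exchange copies of values.

```python
import copy

# Hypothetical customer record held by a source system.
source_system = {"C001": {"name": "Pat Lee", "city": "Boston"}}

# "Value copy": the target system receives its own independent copy.
target_system = copy.deepcopy(source_system)
copies_match_at_copy_time = source_system == target_system

# Later, each system is updated on its own, with no control between them.
source_system["C001"]["city"] = "Chicago"
target_system["C001"]["name"] = "Patricia Lee"

# The two copies of what should be the same record now disagree.
copies_match_later = source_system == target_system
print(copies_match_at_copy_time, copies_match_later)
```

The copies agree only at the moment of copying; nothing in the arrangement keeps them consistent afterward.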
What Data Integration Should Mean
I propose the following definition of “data integration,” along with a definition of what has been wrongly labeled data integration:
- Data integration: The process of managing information in a way that provides a single definition of a given fact of information (data element) and a single record of origin/record of reference data store housing the data, accessible by all applications and knowledge workers requiring access to it. This process eliminates or minimizes uncontrolled, redundant data stores, and it provides a single view of information about a specific object or event of interest without the need to reconcile disparately defined data definitions or data values.
- Data interfaceation: The process of designing interfaces of any kind for the purpose of moving data from a source data store to a target data store, required because of disparately defined data stores or because of software packages with third-party defined data.
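The contrast between the two definitions can be sketched in a few lines. Everything here is an illustrative assumption rather than a prescribed design: the `CustomerStore` class, the two application views and the target field names `CUST_NM` and `CUST_CITY` are all hypothetical.

```python
# Hypothetical record-of-reference store: one definition, one place of record.
class CustomerStore:
    def __init__(self):
        self._records = {}

    def upsert(self, customer_id, **facts):
        # Create the record if needed, then apply the new facts to it.
        self._records.setdefault(customer_id, {}).update(facts)

    def get(self, customer_id):
        # Return a copy so callers cannot change the record outside upsert().
        return dict(self._records[customer_id])


store = CustomerStore()
store.upsert("C001", name="Pat Lee", city="Boston")


# Integration: two hypothetical applications read the same store on demand.
def billing_view(customer_id):
    return store.get(customer_id)


def marketing_view(customer_id):
    return store.get(customer_id)


store.upsert("C001", city="Chicago")
views_agree = billing_view("C001") == marketing_view("C001")


# Interfacing: data is moved and relabeled into a separately defined store.
def interface_to_target(record):
    return {"CUST_NM": record["name"], "CUST_CITY": record["city"]}


target_copy = interface_to_target(store.get("C001"))
store.upsert("C001", city="Denver")
target_is_stale = target_copy["CUST_CITY"] != store.get("C001")["city"]
print(views_agree, target_is_stale)
```

In this sketch the two views cannot disagree because neither holds its own copy, while the interfaced target goes stale as soon as the record of reference changes.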
Every time an organization builds an interface to move and transform data from one operational database to another, it adds cost, not value, to the enterprise. The reason? Information, as the only non-consumable resource of the enterprise, is never used up or destroyed in the processes that retrieve and apply it, so a single, well-defined data store can serve every process that needs it.
CEOs should be demanding a business case for any project seeking to build a redundant database. And CIOs should never have to ask for a business case for projects that call for designing enterprise-strength information models and subject-oriented databases that contain all the information about a subject or resource required by all knowledge workers.
What do you think? Let me hear at Larry.English@infoimpact.com.