In this excerpt from Data Integration Blueprint and Modeling, readers will learn about common component-transformation data integration models and about data integration tools for developing data integration models. Readers will also get an answer to the question of whether organizations need both logical and physical data integration models. Plus, quiz yourself on your data integration modeling knowledge at the end of the excerpt.
* Structuring data integration models and data integration architecture
* Using logical data models for data integration modeling
* Data integration tools for developing data integration models
Common Component Transformation Data Integration Models
Most common transforms are those that conform data to an enterprise data model. Transformations needed for specific aggregations and calculations are moved to the subject area loads, or where they are needed, which is in the subject areas that the data is being transformed.
In terms of enterprise-level aggregations and calculations, there are usually very few; most transformations are subject-area-specific. An example of a common component-transformation data integration subject area model is depicted in Figure 3.14.
Please note that the aggregations for the demand deposit layer have been removed from the common component model and have been moved to the subject area load in line with the concept of moving functionality to where it is needed.
Physical Subject Area Load Data Integration Models
A subject area load data integration model logically groups “target tables” together based on subject area (grouping of targets) dependencies and serves as a simplification for source system processing (layer of indirection).
A subject area load data integration model performs the following functions:
• Loads data
• Refreshes snapshot loads
• Performs Change Data Capture
It is in the subject area load data integration models where primary and foreign keys will be generated, referential integrity is confirmed, and Change Data Capture is processed.
In addition to the simplicity of grouping data by subject area for understandability and maintenance, grouping data by subject area logically limits the amount of data carried per process because it is important to carry as little data as possible through these processes to minimize performance issues. An example of a physical data integration subject area model is shown in Figure 3.15.
Logical Versus Physical Data Integration Models
One question that always arises in these efforts is, “Is there a need to have one set of logical data integration models and another set of physical data integration models?”
The answer for data integration models is the same as for data models, “It depends.” It depends on the maturity of the data management organization that will create, manage, and own the models in terms of their management of metadata, and it depends on other data management artifacts (such as logical and physical data models).
Tools for Developing Data Integration Models
One of the first questions about data integration modeling is, “What do you build them in?” Although diagramming tools such as Microsoft Visio and even Microsoft PowerPoint can be used (as displayed throughout the book), we advocate the use of one of the commercial data integration packages to design and build data integration models.
Diagramming tools such as Visio require manual creation and maintenance to ensure that they are kept in sync with source code and Excel spreadsheets. The overhead of the maintenance often outweighs the benefit of the manually created models. By using a data integration package, existing data integration designs (e.g., an extract data integration model) can be reviewed for potential reuse in other data integration models, and when leveraged, the maintenance to the actual data integration job is performed when the model is updated. Also by using a data integration package such as Ab Initio, IBM Data Stage, or Informatica to create data integration models, an organization will further leverage the investment in technology it has.
Figure 3.16 provides examples of high-level logical data integration models built in Ab Initio, IBM Data Stage, and Informatica.
Experience in using data integration packages for data integration modeling has shown that data integration projects and Centers of Excellence have seen the benefits of increased extract, transform and load code standardization, and quality. Key benefits from leveraging a data integration package include the following:
• End-to-end communications – Using a data integration package facilitates faster transfer of requirements from a data integration designer to a data integration developer by using the same common data integration metadata. Moving from a logical design to a physical design using the same metadata in the same package speeds up the transfer process and cuts down on transfer issues and errors. For example, source-to-target data definitions and mapping rules do not have to be transferred between technologies, thereby reducing mapping errors. This same benefit has been found in data modeling tools that transition from logical data models to physical data models.
• Development of leveragable enterprise models – Capturing data integration requirements as logical and physical data integration models provides an organization an opportunity to combine these data integration models into enterprise data integration models, which further matures the Information Management environment and increases overall reuse. It also provides the ability to reuse source extracts, target data loads, and common transformations that are in the data integration software package’s metadata engine. These physical data integration jobs are stored in the same metadata engine and can be linked to each other. They can also be linked to other existing metadata objects such as logical data models and business functions.
• Capture of navigational metadata earlier in the process – By storing logical and physical data integration model metadata in a data integration software package, an organization is provided with the ability to perform a more thorough impact analysis of a single source or target job. The capture of source-to-target mapping metadata with transformation requirements earlier in the process also increases the probability of catching mapping errors in unit and systems testing. In addition, because metadata capture is automated, it is more likely to be captured and managed.
Industry-Based Data Integration Models
To reduce risk and expedite design efforts in data warehousing projects, prebuilt data models for data warehousing have been developed by IBM, Oracle, Microsoft, and Teradata.
As the concept of data integration modeling has matured, prebuilt data integration models are being developed in support of those industry data warehouse data models.
Prebuilt data integration models use the industry data warehouse models as the targets and known commercial source systems for extracts. Having industry-based source systems and targets, it is easy to develop data integration models with prebuilt source-to-target mappings. For example, in banking, there are common source systems, such as the following:
• Commercial and retail loan systems
• Demand deposit systems
• Enterprise resource systems such as SAP and Oracle
These known applications can be premapped to the industry-based data warehouse data models. Based on actual project experience, the use of industry-based data integration models can significantly cut the time and cost of a data integration project. An example of an industry-based data integration model is illustrated in Figure 3.17.
In the preceding example, the industry data integration model provides the following:
• Prebuilt extract processes from the customer, retail loan, and commercial loan systems
• Prebuilt data quality processes based on known data quality requirements in the target data model
• Prebuilt load processes based on the target data model subject areas
Starting with existing designs based on a known data integration architecture, source systems, and target data models, provides a framework for accelerating the development of a data integration application.
Data modeling is a graphical design technique for data. In data integration, data integration modeling is a technique for designing data integration processes using a graphical process modeling technique against the data integration reference architecture.
This chapter detailed the types of data integration models – conceptual, logical, and physical – and the approach for subdividing the models based on the process layers of the data integration reference architecture. This chapter also provided examples of each of the different logical and physical data integration model types.
It covered the transition from logical data integration models to physical data integration models, which might be better stated as how to move from the “whats” to the “hows.”
Finally, the chapter discussed how this maturing technique can be used to create prebuilt, industry-based data integration models.
The next chapter is a case study for a bank that is building a set of data integration processes and uses data integration modeling to design the planned data integration jobs.
Question 1. Data integration modeling is based on what other modeling paradigm?
Question 2. List and describe the types of logical data integration models.
Question 3. List and describe the types of physical data integration models.
Question 4. Using the target-based design technique, document where the logical data quality logic is moved to and why in the physical data integration model layers.
Question 5. Using the target-based design technique, document where the logical transformation logic is moved to and why in the physical data integration model layers.
More about this book and others like it...
- Intrigued by this chapter excerpt? Download a free PDF of this chapter:
- Read more excerpts and download more sample chapters from our Data Management Bookshelf.
- To purchase the book or similar titles, visit the IBM Press website.