The following is an excerpt from Documenting ETL Rules using CA Erwin Data Modeler, written by Sampath Kumar, Technology...
By submitting your email address, you agree to receive emails regarding relevant topic offers from TechTarget and its partners. You can withdraw your consent at any time. Contact TechTarget at 275 Grove Street, Newton, MA.
Architect, Information as Service (InAs) at Infosys Technologies Ltd. It is reprinted here with permission. Read the excerpt below to learn about features and functionality in CA Erwin Data Modeler that can help users overcome common challenges with data warehouse development projects. Learn how to use CA Erwin Data Modeler to capture metadata information such as extract, transform, load (ETL) rules and learn how to generate reports and share this information effectively across your organization.
In any data warehouse development project some of the major challenges include
- Effective capture and maintenance of metadata information in data model such as data source, transformation rules and data synchronization, etc.
- Effective communication of captured metadata information by data modeler to other teams such as ETL
This document covers features in CA Erwin Data Modeler which can be leveraged for capturing the metadata information such as Extract Transform Load (ETL) rules. This document explains step by step of how to capture the ETL information using Erwin and also covers the generation of reports with the captured information to communicate effectively to other teams.
The data warehouse combines information from several Online Transactional Processing (OLTP) systems and archive data into a single decision support system. It can be either relational or non relational data source (both structured and unstructured data). In order to keep the data in synch with the operational system it's very essential to capture the data source for each column in the data warehouse and information of when and how the data is updated.
So in a nutshell the following information needs to be captured in any data warehouse environment.
- Source of data
- Transformation rules-The method in which the data is getting extracted, transformed and loaded
- Frequency: The frequency and timing of data warehouse updates.
In many organizations it used to be a separate document apart from data model but it becomes very hard to maintain document and data model in synch.
More on ETL and data integration
Find out if ETL is still the way to go for BI data integration
Read about Pfizer's switch from ETL to data virtualization tools
Learn about how big data apps demand new data integration strategies
Why it's important
Data modeling is the first step which converts the business rules into a data model and the data modeler is the one who understand the rules from business counterparts (both in a structured and unstructured way). As a result of this, the data modeler captures most of the business rules directly in the data model and some of them (such as data source, transformation rules and frequency above) needs to be passed on to other teams such as DBA and ETL .It's very essential to capture all data related business rules as a part of data modeling effort to avoid getting lost. CA Erwin Data Modeler provides effective way to capture this information and as a data modeler it should be captured as part of modeling efforts.
This document covers how the above challenges can be addressed using Data Transformation and Data Movement features available in CA Erwin Data Modeler. To explain better there will be a simple running example throughout this document which will navigate step by step.
The CA Erwin Data Modeler has come up with the following salient features to capture the metadata information effectively.
- Data warehouse Sources Dialog: to define sources of data for your data warehouse
- Columns Editor: to document the data warehouse source assignments and transform the information for each column in the dimensional model in the data source tab.
- Data Movement Rules Editor: to document the data warehouse maintenance processes required to regularly update each table in your dimensional model.
Let's explore these features in detail in the rest of the document using simple example of Customer_Dim.
Let's take a fictitious example of an entity Customer Dimension to explain the above features. Let's assume that it's sourced from multiple operational systems (relational DB), attributes having different transformation comments and the frequency of customer information getting updated is daily.
The following are the key attributes in the Customer dimension.
- Basic Information
o Customer name
o Customer Date of Birth
o Driving License
o Mailing Address
o Physical Address
o Email Address
Capturing Data Source
In order to proceed further lets create the empty data model using "Create Model" (File –>New) of the model type Logical/Physical and target database as Teradata
Let's create the Customer_Dim table and add the attributes
In order to make this model as dimensional model and to capture the data movement rules goto Model->Model Properties and select the check box for Dimensional and Data Movement
About the author
Sampath Kumar is a technology architect, Information as Service (InAs) at Infosys Technologies Ltd. Banking and Capital Market Business unit. He functions as lead information architect for complex, high volume, full cycle implementations worldwide -- utilizing the disciplines of data warehouse (DW), business intelligence (BI) and master data management (MDM). He has 11 years of rich and varied experience in IT industry and project management. Prior to joining Infosys, he was a senior data analyst at American Express Credit Cards for their Worldwide Risk Information Management group. He holds masters in software engineering and can be reached at firstname.lastname@example.org.