How change data capture (CDC) and data federation differ

Learn the difference between change data capture (CDC) and data federation. Find out how companies can use both data integration technologies to improve data warehouse systems.

How do data federation and change data capture (CDC) differ, and can businesses use both of these data integration technologies? Or is it a one-or-the-other type of deal?

The premise behind data federation is to manage large, disparate sets of information as a series of related but individual data stores. The federated approach came about because of the traditional belief that building four small systems is easier and less expensive than creating one massive system. For example, many companies have multiple data marts, each with its own subject-area content. To support users who need to analyze business issues that require data from more than one data mart, companies can employ one of several methods to integrate the data into a single view. Data federation itself isn’t so much a data integration technology as it is an architectural approach to storing disparate information across different systems.
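One way to picture a single view over several data marts is a query that joins them at read time while each mart keeps its own storage. The following is a minimal sketch, not a production design: it assumes two hypothetical marts (`sales_mart` and `support_mart`) with made-up tables, and it uses SQLite's `ATTACH DATABASE` as a stand-in for separately managed systems.

```python
import sqlite3


def build_marts():
    """Create two independent in-memory databases standing in for data marts."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate database; the names are hypothetical.
    conn.execute("ATTACH DATABASE ':memory:' AS sales_mart")
    conn.execute("ATTACH DATABASE ':memory:' AS support_mart")
    conn.execute(
        "CREATE TABLE sales_mart.orders (customer_id INTEGER, total REAL)"
    )
    conn.execute(
        "CREATE TABLE support_mart.tickets (customer_id INTEGER, open_tickets INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales_mart.orders VALUES (?, ?)", [(1, 120.0), (2, 80.0)]
    )
    conn.executemany(
        "INSERT INTO support_mart.tickets VALUES (?, ?)", [(1, 3), (2, 0)]
    )
    return conn


def customer_view(conn):
    """Join data from both marts at query time; nothing is copied between them."""
    query = """
        SELECT o.customer_id, o.total, t.open_tickets
        FROM sales_mart.orders AS o
        JOIN support_mart.tickets AS t ON t.customer_id = o.customer_id
        ORDER BY o.customer_id
    """
    return conn.execute(query).fetchall()


conn = build_marts()
for row in customer_view(conn):
    print(row)
```

The data stays in its original mart, and only the query spans both. Real federated environments typically do this across different servers or products rather than inside one SQLite connection.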

Change data capture is an extract method for ETL processing (i.e., extract, transform and load routines). There are two ways to extract data from a source system: as a full image of the data or as an incremental extract that includes only the changed records. Change data capture (as the name suggests) focuses on extracting and loading only those data records that have changed on the source system since the previous ETL run.
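The two extract styles can be contrasted in a few lines. This is a minimal sketch under stated assumptions: the source rows carry a hypothetical `updated_at` column, and changes are detected by comparing that timestamp with the time of the previous ETL run. Timestamp comparison is only one way to detect changes (reading the database log or using triggers are others), and this simple version does not notice deleted rows.

```python
from datetime import datetime


def extract_full(source_rows):
    """Full-image extract: return every row, changed or not."""
    return list(source_rows)


def extract_changed(source_rows, last_run):
    """CDC-style extract: return only rows modified after the previous run."""
    return [row for row in source_rows if row["updated_at"] > last_run]


# Hypothetical source table contents.
source_rows = [
    {"id": 1, "amount": 100, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "amount": 75, "updated_at": datetime(2024, 1, 9)},
]
last_run = datetime(2024, 1, 4)  # when the previous ETL process finished

full = extract_full(source_rows)
changed = extract_changed(source_rows, last_run)
print(len(full), "rows in full extract;", len(changed), "rows in changed-only extract")
```

With this sample data the full extract moves all three rows, while the changed-only extract moves the two rows updated after `last_run`.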

As data warehouse and data mart environments mature, it’s fairly common for ETL activities to migrate to a change data capture approach to reduce the volume of data that must be extracted and processed as part of the ETL routines. CDC usage is independent of whether a data warehouse environment is centralized or federated, so the two are not an either-or choice. In fact, it’s fairly common for federated environments to include change data capture techniques and tools.
