The growing importance of business intelligence (BI) and data analytics applications in business decision-making has made the vital role of data integration in the enterprise crystal clear. From gathering data to transforming it into useful information and delivering it to the business users or processes that need it, data integration routines provide the crucial link between a variety of source and target systems.
Several types of packaged software have emerged to meet the challenges of data integration. The current generation of data integration tools consists of full-fledged suites that support extract, transform and load (ETL) processes, application integration, cloud-based and real-time integration, data virtualization, data cleansing and data profiling.
How can you determine if your organization should invest in a data integration tool? To help justify the purchase of data integration software, let's explore how other organizations are using these platforms to meet their needs.
How companies are using data integration tools
We've all been hearing about the explosion in data volumes -- it's big data wherever you look. But not only is there more data to be integrated, there are more categories of data -- including a mix of unstructured and semi-structured data types in addition to traditional structured transaction data. This makes data integration more critical than ever.
The most prevalent use for data integration today is integrating data from multiple sources to enable BI and analytics applications, and it's typically where packaged data integration software is introduced in an enterprise. This use case can be further broken down into these three subcategories, which enable data to be:
- Integrated into a data warehouse or other analytical data store. This is the use case that started it all with ETL -- extract data from various sources, transform it and load it into an enterprise data warehouse (EDW). Data integration tasks account for the majority of the work involved in setting up an EDW and populating it with data. Relational databases have traditionally been the platform of choice for an EDW, but Hadoop clusters, NoSQL data stores and columnar databases are increasingly being used alongside them to create what are known variously as hybrid, extended or logical data warehouse environments. That further adds to the data integration workload.
- Integrated into a BI data store dedicated to specific analytics uses. In this case, the primary integration tasks are to transform data sets from an EDW for use by specific business groups or areas of analysis and then load them into a special-purpose data store, such as a data mart, online analytical processing (OLAP) cube or columnar database. Data from other sources may also need to be added to enrich the information. Moving data from a relational EDW to a relational data mart is straightforward, but additional transformation work is required with an OLAP cube, columnar database or other nonrelational target system.
- Blended, prepared or wrangled for use in a BI platform. Although some BI tools let users query data directly, many, such as data discovery tools, work best with data models built using data integration tools; those models are then loaded into an in-memory columnar model for analysis.
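To make the ETL pattern concrete, here is a minimal Python sketch that uses the standard library's sqlite3 module as a stand-in for both the warehouse and a downstream data mart. The source records, the table names `fact_orders` and `mart_sales_by_region`, and the transformation rules are all hypothetical; a real pipeline would extract from live systems and would typically be built in a data integration tool rather than hand-coded.

```python
import sqlite3

# Hypothetical source extracts: an order system and a CRM export whose
# formats don't quite agree (illustrative data only).
orders_source = [
    {"order_id": 1, "cust": "C1", "amount": "120.50", "region": "east"},
    {"order_id": 2, "cust": "C2", "amount": "80.00", "region": "West"},
    {"order_id": 3, "cust": "C1", "amount": "42.25", "region": "EAST"},
]
crm_source = [("C1", "Acme Corp"), ("C2", "Globex")]


def transform(order, customers):
    """Standardize types and region codes and join in the customer name.
    Raises KeyError for an unknown customer; a real job would route it to a reject table."""
    return (
        order["order_id"],
        customers[order["cust"]],
        float(order["amount"]),
        order["region"].strip().lower(),
    )


def run_etl(conn):
    customers = dict(crm_source)                               # extract
    rows = [transform(o, customers) for o in orders_source]    # transform
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, region TEXT)"
    )
    # Load: keyed on order_id so a rerun replaces rows instead of duplicating them.
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    # Derive a small special-purpose "data mart" table from the warehouse table.
    conn.execute("DROP TABLE IF EXISTS mart_sales_by_region")
    conn.execute(
        "CREATE TABLE mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM fact_orders GROUP BY region"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn)
print(conn.execute("SELECT region, total FROM mart_sales_by_region ORDER BY region").fetchall())
```

Running it prints the per-region totals from the mart table. Using `INSERT OR REPLACE` against the `order_id` primary key keeps reloads idempotent, a property ETL jobs generally need when they are rerun after a failure.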
Organizations also need to be able to gather data from, and deliver it to, an increasingly diverse mix of systems, databases and applications running on premises and in both public and private clouds. Mobile and Internet of Things (IoT) applications add to the complexity, as does the use of external data sources to augment internal information. Typical use cases for data integration tools beyond BI include:
Migrating, consolidating or converting data from one or more applications to another application, database or device. The best practice for application consolidation or migration tasks has shifted from custom coding to using data integration tools, driven by the productivity gains these tools provide as well as by business requirements such as data validation and documentation. The advantages of tools-based integration include built-in processes for complicated business or technical transformations, iterative testing and data profiling for historical data conversion, and support for managing parallel testing of new and old applications.
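Data validation during a migration often comes down to reconciling the legacy and migrated data sets. The Python sketch below is a hypothetical illustration, not any tool's built-in feature: it fingerprints the chosen columns of each record with SHA-256 and reports keys that are missing, unexpected or changed after migration.

```python
import hashlib


def row_fingerprint(row, columns):
    """Hash the selected columns of a row so values can be compared across systems."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key, columns):
    """Compare legacy and migrated records; report missing, extra and changed keys."""
    src = {r[key]: row_fingerprint(r, columns) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Illustrative data: record 2 changed during conversion and record 3 was dropped.
legacy = [
    {"id": 1, "name": "Ann", "balance": 10},
    {"id": 2, "name": "Bob", "balance": 5},
    {"id": 3, "name": "Cy", "balance": 7},
]
migrated = [
    {"id": 1, "name": "Ann", "balance": 10},
    {"id": 2, "name": "Bob", "balance": 6},
]
print(reconcile(legacy, migrated, key="id", columns=["name", "balance"]))
```

Because values are compared as strings, a type change during conversion (for example, a balance stored as `10` becoming `10.0`) also shows up as a mismatch, which is usually what a migration audit wants to flag.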
Acquiring and processing data for master data management (MDM). Depending on the state of the data, its sources and uses, general-purpose data integration tools may need to be augmented by special-purpose tools to cleanse or enrich the data. A common example is when customer-related data, such as the names and addresses of people or businesses, needs to be matched, cleansed and enriched -- that may call for text-based transformations, site or address reference lists and business entity databases.
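As a rough illustration of the matching step, the sketch below normalizes names and addresses (lowercasing, stripping punctuation, expanding a few abbreviations) and then scores pairs with `difflib.SequenceMatcher` from Python's standard library. The abbreviation table and the 0.9 threshold are assumptions chosen for the example; commercial MDM and data quality tools use much richer reference data and matching algorithms.

```python
import re
from difflib import SequenceMatcher

# Illustrative abbreviation map (an assumption); real tools use large reference lists.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "corp": "corporation", "inc": "incorporated"}


def normalize(text):
    """Lowercase, drop punctuation and expand common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_customers(records, threshold=0.9):
    """Return index pairs of records whose name AND address both look alike."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if (similarity(a["name"], b["name"]) >= threshold
                    and similarity(a["address"], b["address"]) >= threshold):
                pairs.append((i, j))
    return pairs


customers = [
    {"name": "Acme Corp.", "address": "12 Main St."},
    {"name": "ACME Corporation", "address": "12 Main Street"},
    {"name": "Globex Inc", "address": "9 Elm Ave"},
]
print(match_customers(customers))
```

The pairwise comparison is quadratic, which is fine for a handful of records; at scale, matching engines usually first group candidates into blocks (for example, by postal code) and compare records only within each block.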
Synchronizing data between on-premises systems and cloud applications or IoT devices. Although hailed as a means of lowering technology costs, cloud applications typically must be integrated with existing systems running on premises. The same applies to the oncoming wave of data from IoT or smart devices, such as sensors on industrial equipment. All of this data from the cloud and IoT needs to be exchanged and synchronized between applications. Data integration tools have expanded their capabilities to support various transport mechanisms and application programming interfaces (APIs), replacing the custom coding that was previously the only way to perform this integration.
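One common way to keep on-premises and cloud copies in step is incremental synchronization: pick up only records changed since the last run (a high-water mark) and resolve conflicts with a rule such as "newest update wins." The Python sketch below assumes both systems expose records keyed by `id` with an `updated_at` timestamp; the function names and record layout are hypothetical, and real tools would move the data through APIs or message transports rather than in-memory dictionaries.

```python
from datetime import datetime, timezone


def changed_since(records, watermark):
    """Select records modified after the last successful sync (the watermark)."""
    return [r for r in records if r["updated_at"] > watermark]


def sync(on_prem, cloud, watermark):
    """Two-way sync keyed on 'id'; when both sides changed, the newer update wins.
    Returns the new watermark to store for the next run."""
    new_watermark = watermark
    for source, target in ((on_prem, cloud), (cloud, on_prem)):
        for rec in changed_since(list(source.values()), watermark):
            current = target.get(rec["id"])
            if current is None or rec["updated_at"] > current["updated_at"]:
                target[rec["id"]] = dict(rec)
            new_watermark = max(new_watermark, rec["updated_at"])
    return new_watermark


def ts(hour):
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# Record 1 was last edited in the cloud, record 2 on premises.
on_prem = {1: {"id": 1, "email": "a@old.example", "updated_at": ts(9)},
           2: {"id": 2, "email": "b@new.example", "updated_at": ts(11)}}
cloud = {1: {"id": 1, "email": "a@new.example", "updated_at": ts(10)},
         2: {"id": 2, "email": "b@old.example", "updated_at": ts(8)}}
watermark = sync(on_prem, cloud, watermark=ts(7))
print(on_prem[1]["email"], cloud[2]["email"], watermark.hour)
```

Last-writer-wins depends on trustworthy timestamps and ignores deletes, so production synchronization typically adds clock handling, tombstones for deleted records and conflict logging.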
Exchanging data between business processes or applications at different organizations. Much of the initial wave of data exchanges between companies and their suppliers, business partners, customers and prospects consisted of file-based transfers, but a data integration tool can automate such exchanges, increasing productivity and lowering costs.
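Automating a file-based exchange usually means parsing each incoming partner file, validating it and routing bad records for follow-up instead of handling them by hand. This Python sketch uses the standard `csv` module; the purchase-order column names (`po_number`, `sku`, `quantity`) and the validation rules are hypothetical.

```python
import csv
import io

REQUIRED = ("po_number", "sku", "quantity")  # hypothetical partner file layout


def process_partner_file(text):
    """Parse a partner's CSV purchase-order file; route bad rows to a reject list.
    Line numbers assume no quoted field spans multiple lines."""
    accepted, rejected = [], []
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            rejected.append((line_no, f"missing {', '.join(missing)}"))
            continue
        try:
            qty = int(row["quantity"])
        except ValueError:
            rejected.append((line_no, "quantity is not an integer"))
            continue
        accepted.append({"po": row["po_number"].strip(), "sku": row["sku"].strip(), "qty": qty})
    return accepted, rejected


sample = "po_number,sku,quantity\nPO-1,A100,5\nPO-2,,3\nPO-3,B200,ten\n"
ok, bad = process_partner_file(sample)
print(ok)
print(bad)
```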
Delivering and processing data for complex event processing and stream processing. Interoperability and data interaction demands between operational processes such as applications, event streams, message queues, Web services and sensors have steadily increased the need for real-time data integration. As data integration platforms have added real-time processing, more sophisticated workflow capabilities and support for a wide variety of APIs, they can be used instead of the custom coding that was previously required.
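Stream processing differs from batch ETL in that data is aggregated as it arrives. A basic building block is the tumbling window, a fixed-length, non-overlapping time bucket. The Python generator below is a simplified sketch that averages sensor readings per window; it assumes events arrive in timestamp order, whereas stream processing engines also handle late and out-of-order events, state management and fault tolerance. The event layout and sensor names are hypothetical.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Group (timestamp, sensor_id, value) events into fixed, non-overlapping
    windows and yield (window_start, {sensor: average}) as each window closes.
    Assumes events arrive in timestamp order (no late-data handling)."""
    current_window = None
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, sensor, value in events:
        window = ts // window_seconds
        if current_window is not None and window != current_window:
            # A new window has started, so the previous one is complete.
            yield current_window * window_seconds, {s: sums[s] / counts[s] for s in sums}
            sums.clear()
            counts.clear()
        current_window = window
        sums[sensor] += value
        counts[sensor] += 1
    if current_window is not None:
        # Flush the last open window when the stream ends.
        yield current_window * window_seconds, {s: sums[s] / counts[s] for s in sums}


stream = [(0, "pump-1", 70.0), (5, "pump-1", 74.0), (12, "pump-1", 90.0), (14, "pump-2", 60.0)]
for start, averages in tumbling_window_averages(stream, window_seconds=10):
    print(start, averages)
```

Because it is a generator, results are emitted incrementally as windows close rather than after the whole input has been read, which mirrors how a streaming job delivers output continuously.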
Virtually gathering and integrating data from disparate systems. Even when an enterprise has a data warehouse or MDM hub, there are many business scenarios in which data virtualization should be used. First, real-time access to disparate systems is sometimes crucial, such as when an account manager or customer support representative is interacting with a customer about the customer's account or outstanding orders. Second, data from specific sources may be needed only infrequently or for exploratory purposes, which doesn't justify the effort and cost of integrating it into a data warehouse. Finally, some data sources have not yet been slated for integration into a data warehouse but still need to be integrated for an operational process or for analysis.
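The core idea of data virtualization is that a consumer queries a single logical view while the data stays in its source systems and is fetched on demand. The Python sketch below mimics that for the customer-service scenario; the two lookup functions and the `CustomerView` class are hypothetical stand-ins for live source connections, and a real virtualization layer would also handle query pushdown, caching and security.

```python
from typing import Callable, Dict, List


# Hypothetical source adapters; in practice these would call a CRM API,
# an order-management database and so on at query time.
def crm_lookup(customer_id: str) -> Dict:
    return {"C1": {"name": "Acme Corp", "tier": "gold"}}.get(customer_id, {})


def orders_lookup(customer_id: str) -> List[Dict]:
    orders = {"C1": [{"order_id": 17, "status": "shipped"},
                     {"order_id": 21, "status": "pending"}]}
    return orders.get(customer_id, [])


class CustomerView:
    """A virtual 'customer 360' view: nothing is copied into a warehouse;
    each request makes live calls to the registered sources and merges the results."""

    def __init__(self, sources: Dict[str, Callable[[str], object]]):
        self.sources = sources

    def get(self, customer_id: str) -> Dict:
        return {"customer_id": customer_id,
                **{name: fetch(customer_id) for name, fetch in self.sources.items()}}


view = CustomerView({"profile": crm_lookup, "orders": orders_lookup})
print(view.get("C1"))
```

Adding another source is a matter of registering one more lookup function, which reflects why virtualization suits exploratory or not-yet-warehoused data: no load process has to be built before the data can be used.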
With the ever-increasing amount of data from disparate systems that needs to be integrated to support business operational and analytical processes, it's imperative to determine your organization's data integration needs and use cases. Failing to identify data integration requirements will result in your organization either not getting the data it needs or getting it in a costly, time-consuming and inefficient manner. Custom-coded data integration may also create data silos that increase data inconsistency.