As the need to pull together the growing volumes of data generated and collected by organizations has increased, several types of data integration software have emerged to help IT teams simplify and manage the process. But with so many products to choose from, what's the best approach to selecting the right data integration tool for your enterprise? It isn't about picking the product with the most features, but rather the one that best matches your integration requirements and enterprise profile.
Before you start evaluating data integration platforms, ask some questions inside your organization to help guide the technology selection process. Your inquiries should cover the following topics:
Source systems. How many do you have? Do you have overlapping systems, such as multiple CRM or sales processing applications? Is there unstructured or semi-structured data in addition to conventional structured data? Are there external data sources in addition to internal ones? What are the data volumes and frequency of updates?
Integration use cases. Do you need to integrate data for analytics -- primarily through data warehousing? What about application consolidation? Does your organization need to acquire or process data for master data management (MDM)? What about synchronizing data between on-premises systems and cloud applications or Internet of Things devices? Or exchanging data between internal business processes or applications and ones at other organizations? Do you have to capture and deliver data for complex event processing or stream processing applications? Is there a need to integrate data from disparate systems virtually, without moving it to a central data store?
Enterprise size. What is your organization's annual revenue, how many total employees does it have and what's the IT budget for data integration?
Resources and skill sets. Do you have dedicated IT resources to perform data integration work? And what's the level of previous experience with data integration tools?
Once you have answers to these questions, it's time to take a look at the 10 leading data integration products to see which one best matches your needs and profile.
Data integration products for large enterprises
Large enterprises generally share the following characteristics:
- A diverse set of source systems that often overlap with high data volumes. Structured data sources are dominant, but unstructured data sources, such as social media, Web server logs and flat files, as well as semi-structured data sources, such as XML or message-oriented data, also need to be integrated.
- Multiple integration use cases.
- IT budgets sufficient to purchase any of the available data integration tools and supporting infrastructure as necessary. That doesn't mean these enterprises have an open checkbook, but they have fiscal means if justified.
- A dedicated IT group with existing data integration specialists or the budget to hire employees or consultants who have experience using the chosen data integration tool.
Large enterprises that fit this profile should consider Informatica PowerCenter and IBM InfoSphere Information Server for Data Integration, as these products address the entire spectrum of integration use cases. Both products also provide the scalability to handle the data complexity, volume and velocity of large enterprises, and can be used across multiple projects and with any size team. IBM and Informatica both offer MDM and data cleansing capabilities. IBM's product addresses information analytics and management needs, while Informatica concentrates on information integration. But these robust tools come at a price. In addition to being generally more expensive than their competitors, they require a more extensive set of skills and experience to use. Also, they typically require more extensive infrastructure and complex implementations than their competitors.
Many of IBM's and Informatica's competitors have significantly increased their capabilities and features over the years, providing more alternatives for large enterprises, especially those with less demanding integration needs than outlined above. Data integration tools from SAP, Oracle and SAS address a wide variety of data sources and integration use cases. Each of these companies also offers enterprise applications such as enterprise resource planning, CRM and analytics that are used extensively, especially in large enterprises, and they have leveraged their own data integration tools with those applications. If an enterprise has a significant investment in any of these companies' applications, it's reasonable to consider that vendor's data integration tools as well.
SAP Data Services and SAS Data Management Platform both provide extensive data integration capabilities that support large enterprises. SAP Data Services, although not limited to working with SAP's business applications, is becoming more tightly integrated with the company's software portfolio. This means that enterprises that are already SAP customers should consider this integration product. Likewise, SAS customers that are using the company's statistical and analytical products should consider SAS Data Management Platform.
Tools for midsize enterprises with deep integration needs
Midsize enterprises generally have the following characteristics:
- A variety of source systems that handle overlapping data subjects and that may be on-premises or cloud-based. Data volumes will vary based on industry or the products or services offered. Structured data sources are still dominant, and any unstructured data that needs to be integrated is generally limited in scope.
- Extract, transform and load (ETL) and data warehousing are the dominant integration use cases, although application integration may arise in the future once data warehousing is addressed.
- IT budgets are constrained.
- A small IT group to perform both data integration work and business intelligence development. Hiring specialists dedicated to specific tools may not be fiscally possible.
Although midsize enterprises with this profile have significant integration needs, they're operating with constrained resources with regard to people, budget and time. These companies should consider data integration products from Microsoft, Oracle, Information Builders, Talend or Pentaho. Each of these tools provides capabilities to address the data variety, scope of integration uses and resource constraints typically found in such organizations.
Enterprises using Microsoft SQL Server that have developers with deep SQL expertise should consider Microsoft's data-related products, such as SQL Server Integration Services (SSIS). These tools share a common development approach, enabling IT to work with multiple Microsoft tools more effectively. Microsoft has been expanding the capabilities of SSIS to handle more complex integration use cases, such as slowly changing dimensions and fuzzy lookups, and a variety of data sources beyond flat files and relational databases. Although Microsoft's sources and targets aren't limited to its platform, deployment remains limited to Windows. Microsoft's tools have historically been on-premises, but the company has made significant strides in moving capabilities to the cloud. On the downside, SSIS lacks some of the robust integration transformations, workflows and process management of its competitors, such as the ability to track and manage processes using a repository or team-based development administration functions.
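For readers unfamiliar with the term, a slowly changing dimension is a warehouse table that keeps history when a descriptive attribute changes. The following is a minimal Python sketch of the common "Type 2" approach, in which the old row is closed out and a new row is added. It is an illustration of the concept only, not how SSIS or any other product implements it; the function name `apply_scd2` and the column names `customer_id`, `city`, `valid_from` and `valid_to` are hypothetical.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Apply Type 2 changes to `dimension` (a list of row dicts) in place.

    All names here are illustrative assumptions, not a vendor API.
    A row with valid_to set to None is the current version of a customer.
    """
    current = {row["customer_id"]: row
               for row in dimension if row["valid_to"] is None}
    for record in incoming:
        existing = current.get(record["customer_id"])
        if existing is not None and existing["city"] == record["city"]:
            continue  # tracked attribute unchanged, keep the current row
        if existing is not None:
            existing["valid_to"] = today  # close out the old version
        new_row = {
            "customer_id": record["customer_id"],
            "city": record["city"],
            "valid_from": today,
            "valid_to": None,
        }
        dimension.append(new_row)  # add the new current version
        current[record["customer_id"]] = new_row
    return dimension


dimension = [
    {"customer_id": 1, "city": "Boston",
     "valid_from": date(2020, 1, 1), "valid_to": None},
]
incoming = [
    {"customer_id": 1, "city": "Denver"},  # changed attribute
    {"customer_id": 2, "city": "Austin"},  # new customer
]
apply_scd2(dimension, incoming, date(2024, 6, 1))
for row in dimension:
    print(row)
```

After the run, the Boston row is retained with an end date, and the Denver and Austin rows are the current versions. Integration tools typically package this pattern as a configurable transformation so that it doesn't have to be hand-coded.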
Just as SQL Server users should look at SSIS, enterprises currently using Oracle databases may wish to consider Oracle Data Integrator (ODI). ODI is a robust data and application integration tool that can handle a wide variety of data sources and integration uses, including BI, MDM and application integration; it also scales with regard to data volumes and velocity. While the product has numerous capabilities that can be leveraged, it's often used to automate SQL scripts. ODI does require sufficient training to handle its somewhat complex implementation. The product's ability to work in conjunction with a variety of Oracle products expands its capabilities, but it also increases the complexity of deployment, making it difficult to use for IT groups with limited resources.
Information Builders' iWay Integration Suite can handle complex integration uses such as MDM, data cleansing and data governance. iWay should be considered when an enterprise is already using other Information Builders products, as it offers tight integration with those products. These tools are known for their scalability and ability to work in real time with operational systems. One drawback: There's a limited pool of expertise and experience with this product.
Talend's and Pentaho's namesake data integration tools can also handle a variety of integration uses. Both products have open source versions that enable an IT group to avoid any up-front licensing costs. The open source versions offer solid data integration capabilities that fit well for enterprises that don't have demanding integration needs or for IT groups that are working on a shoestring budget. The enterprise versions of both of these companies' products provide significantly more extensive capabilities.
What to consider for small enterprises with demanding integration needs
Smaller enterprises in this group generally have the following characteristics:
- A variety of source systems with primarily structured data sources.
- ETL and data warehousing are the integration use cases.
- IT budgets are very limited.
- Limited IT staff that multitask in such areas as data integration, BI and operational systems.
These enterprises may want to consider either data integration tools based on the databases they already use -- i.e., Oracle or Microsoft -- or the products from Talend or Pentaho. These tools are cost-effective, as SSIS comes bundled with SQL Server and the open source versions of Talend or Pentaho provide more data integration capabilities than many enterprises even need. One caveat: Smaller enterprises should ensure that their IT department has sufficient expertise to leverage these tools effectively.
Tools for small enterprises with limited integration needs
These enterprises are primarily doing operational reporting directly from their source systems and aren't creating a data warehouse to integrate those source systems. Under these circumstances, these enterprises generally won't invest in data integration tools or IT skills. Instead, IT will rely on whatever is bundled with existing applications or do custom SQL coding. Business users will rely on the reporting built into their operational applications and use spreadsheets to fill the gaps if they need data from multiple applications for reporting.
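The profile-to-product guidance in the sections above can be summarized as a simple lookup. The Python sketch below is a deliberate simplification for illustration: the `Profile` class, its fields and the `shortlist` function are hypothetical names, and a real evaluation would also weigh the use cases, budget and skill-set questions listed at the start of this article.

```python
from dataclasses import dataclass

# Vendor-affiliated tools named in this article, keyed by vendor.
VENDOR_TOOLS = {
    "SAP": "SAP Data Services",
    "SAS": "SAS Data Management Platform",
    "Oracle": "Oracle Data Integrator",
    "Microsoft": "SQL Server Integration Services",
    "Information Builders": "iWay Integration Suite",
}
OPEN_SOURCE_TOOLS = ["Talend", "Pentaho"]


@dataclass(frozen=True)
class Profile:
    """Hypothetical, simplified enterprise profile."""
    size: str                        # "large", "midsize" or "small"
    builds_warehouse: bool = True    # False: reporting straight from source systems
    vendors_in_use: frozenset = frozenset()  # vendors whose products are already deployed


def _vendor_picks(profile, vendors):
    # Keep only tools from vendors the enterprise already uses.
    return [VENDOR_TOOLS[v] for v in vendors if v in profile.vendors_in_use]


def shortlist(profile):
    """Return candidate tools for a profile, following the article's groupings."""
    if profile.size == "large":
        base = ["Informatica PowerCenter",
                "IBM InfoSphere Information Server for Data Integration"]
        return base + _vendor_picks(profile, ("SAP", "SAS", "Oracle"))
    if profile.size == "midsize":
        picks = _vendor_picks(
            profile, ("Microsoft", "Oracle", "Information Builders"))
        return picks + OPEN_SOURCE_TOOLS
    if profile.size == "small":
        if not profile.builds_warehouse:
            return []  # rely on bundled reporting or custom SQL instead
        return _vendor_picks(profile, ("Microsoft", "Oracle")) + OPEN_SOURCE_TOOLS
    raise ValueError(f"unknown enterprise size: {profile.size!r}")


print(shortlist(Profile("midsize", vendors_in_use=frozenset({"Microsoft"}))))
```

An empty list for a small enterprise that isn't building a data warehouse reflects the point above that such organizations generally won't invest in a dedicated tool.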
As data volumes continue to grow, so does the need to integrate and translate this data into relevant information that can generate actionable insights. Now, you will be able to make a more informed choice when purchasing a data integration tool.