Selecting the right data integration products is critical to meeting the increasing demand in companies for data that can help drive more informed business decisions. The tool you choose to integrate and translate this data into information that can generate actionable business insights must fulfill your organization's requirements. Otherwise, it will become expensive, unused shelfware. Even worse, custom manual coding of integration scripts -- with all its downsides -- will prevail.
The data integration product evaluation process starts with gathering and prioritizing requirements such as source and target systems, the types of data you have to pull together and the forms of integration that will be needed. There can be a lot of variables in those requirements. For example, you may have a mix of structured and unstructured data to integrate. And the data integration platforms now offered by vendors support a variety of integration use cases: extract, transform and load processes; application integration; cloud-based and real-time integration; data virtualization; data cleansing; and data profiling.
Once you have the requirements in hand, you can move on to creating a list of specific features and functions to evaluate products against. Ultimately, your organization needs to select the data integration tool that's the best fit for its use cases and budget, and one that can be implemented given your enterprise's resources and skills -- not necessarily the most feature-laden product, or one deemed the best by industry analysts.
Data integration product evaluation and selection criteria
To simplify your selection process, classify the list of features and functions you put together as must-haves, should-haves, nice-to-haves and will-not-use items.
Must-have features should be unambiguous; if a product doesn't have them, it should be eliminated from further consideration. Should-have features occupy a gray area between absolutely must-have and merely nice-to-have features, where certain capabilities can have a major impact on integration productivity, scalability and maintainability. Although nice-to-have features aren't required, they're often the differentiators in selecting a product. On the other hand, if a data integration product includes features that aren't going to be used, don't waste time examining those components of the software.
When determining whether a product has a particular desired feature, sometimes the answer is, "Yes, it meets the criteria, but ..." The buts include caveats such as these: custom coding is required; an add-on product, possibly from a third party, must be purchased; only specific product editions include the feature; or the feature is slated to be added in a future release. These exceptions generally mean additional time and expense are required for that product to meet the criteria. The evaluators need to decide how to handle such products, both to ensure an objective evaluation and to avoid surprises if one of them is selected.
An obvious response if a required feature is lacking is that you can do custom coding to fill in the gaps. But beware: As part of the product evaluation process, you must estimate the cost of the coding that would be needed in terms of time, resources and opportunity loss, and then assess whether it's better to just forgo missing features -- or choose a different integration platform that offers them.
Compiling your list of data integration features
Each company's laundry list of must-have items will differ based on its detailed requirements. But the following core capabilities are generally considered must-have data integration features for most organizations:
Access data from a mix of sources. The chosen data integration product needs to directly access various data structures and types of information, including:
- Relational, columnar, in-memory and NoSQL databases, plus multidimensional online analytical processing systems and other specialized databases.
- Flat files, such as tab-delimited or comma-separated values (CSV) files, and spreadsheets.
- Application messaging technologies, such as enterprise messaging systems, and data interchange formats such as XML and JSON.
- Industry-specific protocols such as Health Level Seven International and Society for Worldwide Interbank Financial Telecommunication.
- Enterprise application integration via web services or data services.
- Business applications such as ERP and customer relationship management systems.
- Software as a service applications.
- Mobile applications.
- Unstructured data such as social media data, email, website-related data and documents.
- Proprietary data protocols to communicate with specialized sensors, devices and legacy systems such as mainframes.
Write data to target systems. Data integration software needs to be able to insert, modify and delete data in the target systems of integration processes -- for example, data warehouses or operational databases that combine data from various sources for transaction processing.
Interact with sources and targets. An integration tool must support a variety of data capture and delivery methods, including batch acquisition and delivery, bulk import and extract, and change data capture. Streaming and near-real-time data ingestion should also be a standard feature of integration software, along with time-based and event-based data acquisition -- the latter triggered by predefined processing events in databases or applications.
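One of the capture methods named above, change data capture, can be illustrated with a timestamp-based polling sketch. This is an assumption-laden simplification: the table and column names are hypothetical, and production-grade CDC tools typically read database transaction logs rather than polling a timestamp column.

```python
# A minimal sketch of timestamp-based change data capture: pull only rows
# modified since the last run, tracked via a high-water mark.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10.0, "2024-01-01T00:00:00"),
                  (2, 25.0, "2024-01-02T00:00:00")])

def capture_changes(conn, high_water_mark: str):
    """Return rows modified after the mark, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? "
        "ORDER BY updated_at", (high_water_mark,)).fetchall()
    new_mark = rows[-1][2] if rows else high_water_mark
    return rows, new_mark

changed, mark = capture_changes(conn, "2024-01-01T12:00:00")
print(changed)  # [(2, 25.0, '2024-01-02T00:00:00')]
```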
Transform data. Basic data handling features are crucial, including data-type conversions, date functionality, string handling, NULL processing and mathematical functions. The same goes for data mapping capabilities, such as join, merge, lookup, aggregate and substitute, and for workflow support, which enables the creation of an integration process with multiple source-to-target mappings that are potentially interconnected based on data or functional dependencies. In addition, integration software should provide workflow orchestration that includes looping, if-then-else and case-style logic, as well as the passing of variables between steps.
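A few of these transformation primitives -- lookup, data-type conversion and aggregation -- can be sketched in a handful of lines. The sales records and region table below are hypothetical, purely for illustration of the kinds of mappings an integration tool performs graphically.

```python
# A minimal transformation sketch: look up each sale's region, convert
# string amounts to numbers, and aggregate totals per region.
from collections import defaultdict

sales = [
    {"store_id": 1, "amount": "100.50"},   # amounts arrive as strings
    {"store_id": 1, "amount": "49.50"},
    {"store_id": 2, "amount": "200.00"},
]
regions = {1: "East", 2: "West"}  # lookup table: store_id -> region

totals = defaultdict(float)
for row in sales:
    region = regions.get(row["store_id"], "Unknown")  # lookup with default
    totals[region] += float(row["amount"])            # data-type conversion

print(dict(totals))  # {'East': 150.0, 'West': 200.0}
```

A data integration product packages these same operations as reusable, declarative mapping components rather than hand-written code, which is precisely the productivity trade-off the evaluation should weigh.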
Enable effective design and development. Another key data integration feature is a graphical design interface that simplifies the construction of source-to-target mappings and integration workflows with data, transformations and other elements displayed in design palettes. That should be accompanied by software development management functionality, such as version control; support for development, testing and production environments; and the ability to attach comments or notes. Data integration products also need to provide interactive testing and debugging functionality, and the ability to create reusable and shareable components.
Support efficient operations. Features for managing and optimizing integration processes are vital as well -- for example, run-time process monitoring; error, warning and condition handling; collection of run-time statistics; and security management.
Provide multiple deployment options. A data integration platform must support operating environments both on-premises and in the cloud, the latter through either hosted deployments or integration platform as a service offerings. Virtualized servers and distributed processing environments should also be supported across a variety of operating systems.
The following features aren't necessarily must-haves, but they can significantly enhance developer productivity in designing data transformations:
- Support for slowly changing dimensions, if used for business intelligence or data warehousing.
- Customized log, error and condition handling.
- Text string parsing and matching.
- Data set processing, such as time series and pivots.
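The first item in that list, slowly changing dimension support, is worth a quick illustration of what the tool automates. Below is a minimal sketch of a Type 2 update, where a changed attribute expires the current dimension row and inserts a new version; the customer table, column names and dates are hypothetical.

```python
# A minimal Type 2 slowly changing dimension sketch: preserve history by
# closing the current row and appending a new current version.
def apply_scd2(dimension: list, key: int, new_city: str, today: str):
    """Expire the current row for `key` if the city changed, then insert."""
    for row in dimension:
        if row["customer_id"] == key and row["current"]:
            if row["city"] == new_city:
                return  # no change -- nothing to do
            row["current"] = False
            row["end_date"] = today
    dimension.append({"customer_id": key, "city": new_city,
                      "start_date": today, "end_date": None, "current": True})

dim = [{"customer_id": 7, "city": "Austin",
        "start_date": "2023-01-01", "end_date": None, "current": True}]
apply_scd2(dim, 7, "Denver", "2024-06-01")
print(len(dim))        # 2 -- history preserved
print(dim[1]["city"])  # Denver
```

Built-in SCD handling spares developers from reimplementing this bookkeeping for every dimension table, which is why it rates as a strong should-have for BI and data warehousing shops.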
Other features that could be classified as should-haves include support for team-based development and management, as well as release management for both integration processes and the data structures they use. Repository-based storage of, and access to, process (run-time) metadata is another, as it enhances the ability to analyze run-time performance to identify bottlenecks and trends.
More nice-to-have features include:
- Self-generating documentation with graphical representations of workflows.
- Where-used and what-if capabilities for analyzing the use of sources, targets and transforms.
- Data profiling tools to analyze the information in sources and targets.
- Data quality tools to cleanse and enhance data.
- Integration with other vendors' software development, management, scheduling and monitoring tools.
- Parallelization of integration processes and data loading.
Additional data integration tool selection criteria
The following are often included in evaluation criteria. But since they're subjective, it's important to carefully weigh their applicability and importance to your organization:
Loading performance. This will vary based on the integration complexity, source systems accessed and data volumes involved. The best practice is to create several prebuilt integration use cases and compare how each product performs on these specific examples.
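In the spirit of the prebuilt test cases recommended above, even a basic timing harness makes the comparison concrete. The `load_rows` function here is a hypothetical stand-in for invoking each candidate product's actual load process against the same data set.

```python
# A minimal timing sketch for comparing load performance across products.
# Run the same prebuilt use case per product and record elapsed time.
import time

def load_rows(rows):
    """Placeholder for a product's load routine on a test data set."""
    return [{"id": r, "value": r * 2} for r in rows]

start = time.perf_counter()
result = load_rows(range(100_000))
elapsed = time.perf_counter() - start
print(f"Loaded {len(result)} rows in {elapsed:.3f}s")
```

Repeat the run several times per product and compare medians rather than single measurements, since caching and warm-up effects can skew a first pass.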
Scalability. You should supplement the loading performance tests with stress tests that simulate anticipated growth in the number and size of your sources and targets.
Ease of use. This will vary based on the knowledge and skills of the data integration developers involved.
Training on the data integration product. This may include vendor in-person classes; online classes, live or pre-recorded; or Web recordings for specific features or processes.
Documentation and support. There should be separate criteria for developer online help versus technical documentation. How the vendor provides support -- online Q&A for common issues, online chat, in-person discussions and on-site -- and pricing of each should also be included in the evaluation.
Once you've created your evaluation criteria, it's time to select a short list of products and create your request for proposal.