


What are key features for choosing the best ETL tools for your needs?

Choosing the right ETL tool for your data integration requirements can be a challenge. Here's a rundown on what to look for in ETL software and potential vendors to consider.

Extract, transform and load (ETL) is the most prevalent data integration process used by IT and analytics teams.

ETL jobs extract data from source systems, transform it for downstream processing and analytics uses, and load the transformed data into target systems -- for example, an enterprise data warehouse or, in big data applications, a Hadoop cluster.

ETL tools automate these operations, which can streamline the ETL process compared to hand-coding integration scripts in SQL or other programming languages. However, manual coding is still a widespread practice among data integration developers, data analysts and other users with technical skills. If your organization is looking to get away from manual coding, the good news is that choosing the best ETL tools to meet your data integration needs has become less of a hit-or-miss proposition as the software has matured.
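To make the extract, transform and load stages concrete, here is a minimal hand-coded sketch of the kind of script an ETL tool would replace. It is illustrative only: it uses an in-memory SQLite database as a stand-in target, and the `sales` table and its columns are hypothetical, not drawn from any product discussed in the article.

```python
import csv
import io
import sqlite3

def extract(csv_text):
    """Extract: read raw rows from a source (a file, database or API in practice)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: cast types and normalize strings to fit the target schema."""
    return [(int(r["id"]), r["name"].strip().title(), float(r["amount"]))
            for r in rows]

def load(rows, conn):
    """Load: bulk-insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

# A tiny CSV source standing in for a real source system.
source = "id,name,amount\n1, alice ,10.5\n2, bob ,20.0\n"
conn = sqlite3.connect(":memory:")
load(transform(extract(source)), conn)
```

Even this toy pipeline shows what an ETL tool must manage at scale: connectivity to sources, per-field transformation logic and reliable loading, plus the scheduling, monitoring and error handling the script above omits entirely.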

Rick Sherman, founder of consulting company Athena IT Solutions in Maynard, Mass., noted in a 2012 Q&A with SearchDataManagement that ETL software in particular and data integration products as a whole have evolved significantly since first emerging in the 1990s. The initial ETL tools to hit the market moved data in batches "from point A to B, and didn't provide very much functionality, but that's no longer true," Sherman said. For starters, he pointed to the addition of predefined integration routines, ETL process and code management features, and built-in documentation of integration procedures.

ETL tool features to look for 

As evidence of this evolution, Sherman and other data management consultants point to a mix of core functions that most ETL vendors now offer -- and that users will typically want to have at their disposal. The list includes the following:

  • The ability to connect to and extract data from a variety of sources -- databases of all stripes (relational, NoSQL, etc.), big data systems built around technologies such as Hadoop and Spark, flat file repositories, application-to-application message queues, and more.
  • A GUI-based design environment that supports drag-and-drop development of source-to-target mappings and ETL workflows.
  • Team-based development capabilities for collaborating on integration projects, with associated version control and release management features.
  • Basic data transformation functions, such as data type conversion, date reformatting and string handling, plus data mapping and workflow orchestration capabilities.
  • Built-in data profiling software that can analyze source data for consistency, dependencies and other attributes before beginning the ETL process.
  • Data quality and cleansing functionality for identifying and fixing errors in data sets, plus data synchronization for keeping data consistent in source and target systems.
  • Metadata management support for synchronizing integration processes and documenting data transformation and business rules.
  • A job scheduler, along with process management controls that provide runtime monitoring and error alerting, handling and logging.
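The basic transformation functions listed above -- data type conversion, date reformatting and string handling -- can be sketched in a few lines. The field names and formats below are illustrative assumptions, not taken from any particular tool:

```python
from datetime import datetime

def reformat_date(value, src_fmt="%m/%d/%Y", dst_fmt="%Y-%m-%d"):
    """Date reformatting: e.g., convert '12/31/2016' to ISO '2016-12-31'."""
    return datetime.strptime(value, src_fmt).strftime(dst_fmt)

def clean_string(value):
    """String handling: trim whitespace and collapse internal runs of spaces."""
    return " ".join(value.split())

def to_number(value):
    """Type conversion: source systems often deliver numbers as formatted text."""
    return float(value.replace(",", ""))

# A hypothetical source record, mapped field by field to a target layout.
record = {"order_date": "12/31/2016", "customer": "  Acme   Corp ", "total": "1,234.50"}
transformed = {
    "order_date": reformat_date(record["order_date"]),
    "customer": clean_string(record["customer"]),
    "total": to_number(record["total"]),
}
```

ETL tools package hundreds of prebuilt functions like these, so developers configure mappings rather than write and test each conversion by hand.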

Another common characteristic of modern ETL tools is support for integrating data stored in both on-premises and cloud systems, including hybrid cloud environments that mix the two types of platforms.

Most vendors now also provide the ability to flip the last two stages of the ETL process and instead do extract, load and transform (ELT) integrations. The ELT approach lets users manipulate and transform data after it has been loaded into a target system; that capability is a particularly good fit for big data applications in which large volumes of raw data are often loaded as is into the Hadoop Distributed File System, NoSQL databases or other repositories, and then filtered as needed for different analytics uses.
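The ELT pattern can be sketched as follows, using an in-memory SQLite database as a stand-in for the target system; the `raw_events` table and pipe-delimited payload format are invented for illustration. The key point is that raw data is loaded as is, and transformation runs later inside the target using its own processing engine (plain SQL here; Hive or Spark in a Hadoop stack):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw records land in the target with no upfront transformation.
conn.execute("CREATE TABLE raw_events (payload TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?)",
    [("click|2016-12-01",), ("view|2016-12-02",), ("click|2016-12-03",)])

# Transform: filtering and reshaping happen afterward, inside the target,
# carving an analytics-ready table out of the raw data as needed.
conn.execute("""
    CREATE TABLE clicks AS
    SELECT substr(payload, instr(payload, '|') + 1) AS event_date
    FROM raw_events
    WHERE payload LIKE 'click|%'
""")
```

Because the raw table is preserved, different analytics uses can derive their own filtered views later without rerunning the extraction -- the flexibility that makes ELT attractive for big data workloads.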

ETL tools grow into integration suites

Traditional ETL is still the biggest part of the data integration market by a wide margin, according to Gartner -- in its "2016 Gartner Magic Quadrant for Data Integration Tools" report, the consulting and market research company said that "easily more than 80% of all organizations" are running a significant amount of batch jobs involving bulk integration of data. In many cases, though, vendors that started with ETL tools augmented them with other types of technologies to create broader data integration suites.

In addition to batch ETL processes, the suites generally offer data replication, change data capture and other forms of real-time integration; data virtualization software that provides integrated views of data without physically extracting it from sources and loading it into a central data warehouse or other target system; streaming and event processing tools; and application integration functionality via service-oriented architecture and enterprise service bus technologies.

There are dozens of vendors you can consider in your search for the best ETL tools to handle the data integration jobs your organization needs to run. Major vendors selling full integration suites include Informatica, IBM, SAP, Oracle, SAS, Microsoft and Information Builders. Also available, potentially at a lower cost, are open source ETL and integration platforms from companies such as Talend, Pentaho and TIBCO Software's Jaspersoft unit.

In addition, the market contains numerous smaller vendors, including ETL specialists and companies that focus on specific vertical industries or certain types of integration applications. Another option is data warehouse automation software, which generates integration code as part of the data warehouse development process without requiring a separate ETL tool or server -- although some ETL proponents contend that such products aren't as functional as mainstream data integration tools.

Next Steps

Get more advice in our guide to buying ETL and other integration software

More users should adopt data integration tools, consultant Rick Sherman says

Learn why Pfizer replaced ETL tools with data virtualization software

This was last published in December 2016


Join the conversation



Does your organization use commercial ETL software for data integration? Why or why not?
It's not surprising your ETL guys advise you to steer clear of code-generating ETL tools! They would all be out of a job! The ONLY time you ever need to debug code-generated ETL code is when some fool has played around with it, or tried to perform some tasks manually when they ought to have been handed over to the code-generator.

There is NOTHING wrong with code-generation tools - they deliver results faster, the code is error-free and optimised, and they are self-documenting.

I don't see a downside here at all.

ETL tools on the other hand have a much worse track record of data warehouse delivery. They are slow to deliver, require more resources and additional tools to get the job done.

It is quite well proven now that Data Warehouse Automation tools like BIReady, Kalido, WhereScape and others get the job done in a fraction of the time and cost of ETL tools.

Go tell that to your ETL guys!
Your readers may also find real user reviews for all the major ETL tools on IT Central Station, such as Adeptia, Pentaho or Oracle Data Integrator, to be helpful when making their selection. As an example, this ODI user writes in her review, "The tool has also made metadata management more efficient, and automated the process from relational source to Hyperion Essbase." You can read the rest of her review, as well as explore what others have to say about ODI and other ETL solutions, here:
Editor's note: This article has been expanded and revised to bring it up to date.
Most of the discussion around ETL tools is about data migration -- data in the sense of database data. What about ETL for unstructured files? We have developed tooling for migrating fileshare or DMS files to, e.g., a new DMS. During the ETL process, the content is checked for relevance, converted and processed in various other ways. Would this also qualify as an ETL tool?
Hi, odubbeldam. Yes, I think a tool of that kind would qualify as ETL software. If it only handles unstructured files, I'd view it as something of a specialized ETL tool -- as opposed to ones that can deal with a mix of structured, unstructured and semi-structured data. What do you plan to do with the tool?