Metadata injection marks Pentaho big data pipeline

The crush of big data leads some data pros to seek more automation of data integration processes. The Pentaho software platform now offers metadata injection capabilities to help meet such needs.

Each quarter, the editors at SearchDataManagement recognize a data management technology for innovation and market impact. The product selected this quarter is Pentaho 6.1 from Pentaho Corp., a subsidiary of Hitachi Ltd.

Product: Pentaho 6.1 data integration and business analytics platform

Release Date: April 22, 2016

Version 6.1 of the software includes support for metadata injection options to dynamically create complex data transformations within diverse integrations.

What it does

The Pentaho platform combines data integration and business intelligence capabilities. The platform is intended to automate data loading in complex, modern environments. That means handling big data pipeline transformations of diverse data types, ranging from structured relational repositories to the more varied data types found in open source NoSQL, Spark and Hadoop stores, as well as social media, log and machine data streams. On the business intelligence side, Pentaho supports in-memory data caching for large data volumes, as well as interactive visual analysis of data, including geo-mapping, heat grids and lasso filtering.

Why it matters

High-volume, highly varied data in organizations has stressed traditional extract, transform and load (ETL) and data warehousing methods, often creating a backlog of work orders for the IT department. Pentaho's goal is to reduce reliance on IT, especially the need for hand-scripted ETL production, and to thus enable self-service analytics for data scientists and business analysts working in lines of business.

With Pentaho 6.1, data integration is accelerated via metadata injection, which derives transformation plans by inspecting source data for field names, lengths and other contextual clues, then executes those plans automatically. The vendor had offered early, functionally incomplete versions of the technology in previous releases -- with 6.1, it officially launched the injection capabilities. The automatic insertion of metadata is seen as a potential time-saver on integration work as data volume and variety continue to grow in organizations.
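The general idea can be illustrated with a short, generic Python sketch. This is not Pentaho's actual API -- in Pentaho Data Integration, metadata is injected into transformation templates built in the tool -- and the function names here are hypothetical, but the flow is the same: inspect the source for field names and lengths at runtime, then feed that metadata into a reusable template instead of hand-coding each load.

```python
import csv
import io

def derive_field_metadata(sample):
    """Inspect a delimited sample and infer field names and maximum lengths."""
    rows = list(csv.reader(io.StringIO(sample)))
    names = rows[0]
    lengths = [max(len(r[i]) for r in rows[1:]) for i in range(len(names))]
    return [{"name": n, "length": l} for n, l in zip(names, lengths)]

def inject_and_run(template, metadata, rows):
    """Apply the derived metadata to a generic transformation template."""
    return [template(metadata, row) for row in rows]

# Derive metadata from a sample of the source at runtime.
sample = "id,name\n1,Alice\n2,Bob"
meta = derive_field_metadata(sample)
# meta -> [{'name': 'id', 'length': 1}, {'name': 'name', 'length': 5}]

# A trivial template: map positional values to the discovered field names.
template = lambda metadata, row: dict(zip([f["name"] for f in metadata], row))
out = inject_and_run(template, meta, [["3", "Carol"]])
# out -> [{'id': '3', 'name': 'Carol'}]
```

The point of the pattern is that `template` never changes when a new source arrives; only the derived metadata does.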

What users say

The Pentaho platform can be deployed for data integration without undue recourse to consultants, according to Matt Good, managing director of software architecture at Kingland Systems, a software development services provider in Clear Lake, Iowa. "Pentaho was lightweight and easy to install. It was a good fit," he said.

He also gave high marks to 6.1's formal introduction of metadata injection. "With metadata injection, we can flexibly define how we load data and how we have it parameterized. Clients can use the self-service application to load data. But the order in which they load that data doesn't matter, because [metadata injection] provides the necessary indexing detail," he said.

According to Good, Kingland's use cases for Pentaho began with integration of customer data. He said the company primarily handles relational data as part of compliance and governance services it provides to financial and other companies, but that he and his teams are seeing increasing use of open source software such as Hadoop and Spark. Newer work that is under development with the Pentaho platform centers on self-service data quality, which is important for compliance and risk reporting.

Key features

  • Automated injection of metadata replaces static ETL techniques by accessing source metadata at runtime, passing it into a transformation template and automating repetition of the process.
  • Repository performance improvements speed browsing for JSON data types.
  • Improved data model generation for virtual table implementations helps create schema on the fly.
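The last point, generating a data model on the fly, can also be sketched in a few lines of generic Python. This is an illustrative assumption about how schema inference works in principle, not Pentaho's implementation: coarse SQL types are guessed from sample values.

```python
def infer_schema(records):
    """Map each field in the sample records to a coarse SQL type."""
    def sql_type(values):
        # All-digit strings (allowing a leading minus) look like integers.
        if all(v.lstrip("-").isdigit() for v in values):
            return "INTEGER"
        try:
            for v in values:
                float(v)
            return "DOUBLE"
        except ValueError:
            return "VARCHAR"
    fields = records[0].keys()
    return {f: sql_type([r[f] for r in records]) for f in fields}

rows = [{"id": "1", "price": "9.99"}, {"id": "2", "price": "12.50"}]
schema = infer_schema(rows)
# schema -> {'id': 'INTEGER', 'price': 'DOUBLE'}
```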

Pricing

Pentaho doesn't disclose specific pricing for its products. The company offers both subscription and term pricing based on server cores, as well as Hadoop nodes when Hadoop processing is included.

Next Steps

Learn how to select the right data integration tool

Find out more about business users' tools for data preparation

Discover more about discovery-based architectures for BI
