Hitachi Vantara pushed its DataOps vision forward with the release of the Pentaho 8.3 platform. The new release adds features that improve data visibility and integration with multiple cloud services, including Amazon Kinesis for real-time data streaming.
Pentaho is a data integration and analytics software platform, with capabilities that help to enable enterprise data management. The Pentaho 8.3 update, released July 10, is the first major release of the platform in 2019 and helps to further integrate the platform with other key elements of the Hitachi Vantara portfolio.
Pentaho was acquired by Hitachi in 2015 and initially was left to operate as a standalone unit. In 2017, Hitachi combined the operations of its Hitachi Data Systems, Hitachi Insight Group and Pentaho into the Hitachi Vantara operating unit.
A key theme that Pentaho's product team has been advocating is the idea of DataOps. Arik Pelkey, senior director of product marketing, explained how the vendor sees DataOps as data management for the artificial intelligence era.
"One of the biggest challenges when it comes to AI is the data management side of things, and that's where we excel," Pelkey said. "Data management is the key part of DataOps; it's all about minimizing the friction points between data and insight."
Those friction points include managing data across disparate sources from the edge of the network up into the cloud. Pelkey said there are often many manual tasks involved with enabling data for AI as well as business intelligence and analytics efforts.
Pentaho 8.3 reflects DataOps rising in data management
DataOps isn't just a marketing term used by Pentaho; it's a real approach that is also starting to be used by other vendors in the data integration and data intelligence software market, according to IDC analyst Stewart Bond. In his view, DataOps takes many of the same concepts from DevOps and applies them to data.
"Data is the lifeblood of digital transformation and apps are no longer function-oriented but data-oriented," Bond said. "As new data applications are being created, applying DevOps-like methods is a natural fit."
He added that DataOps is not just about delivering data applications, but also about using data intelligence to help organizations make sure the right data is available to the right resource at the right time and that data is being used for the right reason.
"Data intelligence is also informing data operations, optimizing data locations and access paths based on usage patterns, quality scores and inherent value," Bond said.
Moving beyond Hadoop
Matt Howard, senior director of product management at Hitachi Vantara, explained that many organizations buy Pentaho for its data integration capabilities. Howard added that Pentaho is able to take its Pentaho Data Integration capability and its analytics data pipelines and combine those with an object storage capability. The move to object storage is about looking beyond Hadoop for big data management and analytics.
"Over the last five or six years, when we talk about big data and complex data challenges, you're always thinking about Hadoop," Howard said. "One of the things that we're doing at Hitachi Vantara right now is a sort of shift from Hadoop to object storage-based data lakes."
The object storage component comes from the broader Hitachi Vantara platform with a technology called the Hitachi Content Platform (HCP). The combination of Pentaho and HCP streamlines DataOps, according to Howard.
Howard said that Hitachi Vantara is now selling Pentaho and HCP together. He noted that HCP includes rich capabilities for metadata, which becomes more powerful when an organization wants to be able to search and build search applications on top of the object storage.
Pentaho first enabled an HCP connector in its 8.2 update and has updated it now for the Pentaho 8.3 release. Howard said the HCP connector in the Pentaho 8.3 update was enhanced to be able to read and write data in a more streamlined way.
Among the key enhancements in Pentaho 8.3 are the platform's data streaming integrations, notably with Amazon Kinesis. Howard commented that the Kinesis integration is in addition to Pentaho's existing streaming service connectors for Apache Kafka, AMQP, JMS and MQTT.
"We have a full suite of streaming connectors, as well as the ability to process streaming data, natively within our ETL [Extract, Transform and Load] engine or using Apache Spark," Howard said.
Pentaho is also helping users move onto the cloud and then move data within it. Within Amazon Web Services, that means loading data into Simple Storage Service (S3) and then moving it to Amazon's Redshift data warehouse or to the Amazon Elastic MapReduce service.
"Amazon's got an amazing array of capabilities, but it's not very well stitched together," Howard said. "Pentaho provides a visual development environment with a drag and drop GUI [Graphical User Interface] for designing pipelines."
Another new capability in Pentaho 8.3 is an enhanced connector for SAP HANA. Howard noted that Pentaho had previously only provided a bulk loader for SAP HANA data, as well as more connectors for loading an SAP data warehouse.
"This new connector is for connecting to the applications," Howard said. "So not just loading the analytics tools, but actually pulling data from SAP applications."
Howard said users can access SAP data in many different ways and the idea behind the new connector is to provide more direct access to the objects in the SAP object layer, instead of having to go through intermediate steps.
Looking forward, the plan for Hitachi Vantara is to continue to advance the Pentaho platform in upcoming releases.
"We're in the process of really modernizing our platform and moving on to microservices and containerized deployment," Howard said. "That's not just for the Pentaho product but across the Hitachi Vantara portfolio."