Data management vendor Talend has updated its main data integration platform with support for Apache Storm and Apache Spark frameworks, which extend batch-oriented Hadoop technology for use in real-time, operational BI projects. The additions take the Talend offering further from its original roots in batch-oriented ETL.
The Spark and Storm frameworks promise to bring real-time business analytics to commercial websites, letting them tailor product offers as customers shop. That contrasts with traditional BI systems that create static reports on sales activity, according to Rupert Steffner.
"We think big data needs a new paradigm, with more automated 'decisioning,'" said Steffner, chief BI platform architect at the Otto Group, Hamburg, Germany. Looking ahead in e-commerce, the main purpose of analytics will not be for reporting, but instead will be for automated systems that "act on the data," he said.
Such capabilities are important for a company like the Otto Group, which competes with Amazon.com in a technology-infused commercial space full of online recommendation and personalization software engines.
The Otto Group began as a mail order firm in the late 1940s and entered e-commerce in the 1990s. It now runs many retail websites, with holdings that include home-style company Crate & Barrel. Its 2013/2014 fiscal revenues were €6 billion, according to Reuters. But it ranks second to Amazon in European online sales, and is looking to gain ground.
Shopping at the technology cafeteria
One objective, for example, is to create systems that reduce the number of abandoned online shopping baskets on an e-commerce site. Such systems can predict how likely users are to actually buy items, said Steffner, and can then alter standard offers to encourage customers to complete a purchase. That effort requires more than offline, after-the-fact analytics.
With Hadoop 2.0-influenced technologies like Storm and Spark, big data can move beyond the realm of the lone data scientist mining for nuggets, said Steffner. Instead, applications can more systematically retrieve and work on relevant data.
He said the Otto Group has worked on pilot projects with open-source Storm and Spark software ("We like to go into the technology first, and then re-integrate"), but wants to work with the Talend platform to create large-scale production versions of e-commerce applications.
"For us, Talend plays a very central role," said Steffner. "It is our strategic tool for 95% of all data integration. All our ETL and ELT is done with Talend. We also license the [Talend enterprise service bus], which we use for messaging."
Messaging and the Internet of Things
The Talend data integration offering has evolved over the years to become a combination of ETL, messaging, management and, more recently, Hadoop. The company has worked on tuning Hadoop MapReduce performance, and it estimates Talend 5.6 users will see almost 60% gains in some cases, as well as 20-times better profiling performance overall.
Besides Spark and Storm updates, Talend 5.6 also improves its master data management capabilities to enable easier data model changes, as well as better lineage tracing and matching.
Talend's enterprise service bus has come in for updates too, according to Ashley Stirrup, chief marketing officer at the company. Talend 5.6 introduces support for the MQTT and AMQP messaging protocols, both of which are finding use in Internet of Things applications, where field sensor data is becoming part of companies' data workflows, Stirrup said.
Talend 5.6 also adds application connectors to support Oracle GoldenGate, Microsoft HDInsight, Salesforce.com and NetSuite integrations. Version 5.6 of Talend commercial subscription products is available this week.