If big data is to be a big success, it needs to be made available to larger groups of end users. But widely used business intelligence tools are not yet able to easily analyze the biggest of big data, and working on excerpts of data is often still the norm.
A slew of BI on Hadoop tools arrived a few years ago, but few particularly stood out. Still, as experimental Hadoop data applications moved to production, the quest to deliver analytics beyond a few data high priests continued.
That quest has attracted a number of BI startups looking to become the business stars of Hadoop and big data, akin to Business Objects or Tableau, by offering BI on Hadoop. Arcadia Data Inc., AtScale, Datameer, Dremio, Pentaho and others have stepped into the gap in an effort to make big data accessible to more people.
Analyzing data as-is
Soon, that will mean bringing BI to where the data is rather than bringing the data to where the BI tools are, according to Boris Evelson, an analyst at Forrester.
"Most modern BI platforms still bring data to BI, where BI tools and databases reside on different platforms and data needs to be optimized for BI," said Evelson, who is working on a series of reports on BI modernization.
Evelson counts Hadoop and Spark among the modern platforms that can serve as both database management system (DBMS) and application platforms. Now, he said, users can run BI on the same platform that hosts the DBMS. That offers business benefits, he said, because analyzing data as-is, rather than after it has been processed and moved, can extend BI use cases.
Data mining meets mineral mining
Count Komatsu Mining Corp. among the companies looking to churn more data in place and share BI analytics of that data more widely, both within and outside of the organization.
To improve efficiencies, the Milwaukee-based maker of mining equipment has combined a wide range of tools and repositories, including Hadoop, Spark, Kafka, Kudu and Impala software from Cloudera, as well as on-cluster analytics software from BI on Hadoop analytics toolmaker Arcadia Data.
This modern platform has been assembled to, among other things, analyze sensor data gathered by equipment in the field to track wear and tear on massive shovels and earthmovers, according to Jason Knuth, senior manager for data solutions at Komatsu Mining, which is part of the Tokyo-based conglomerate Komatsu Ltd.
Like other companies, Komatsu foresees a future in which IoT application data will enable better predictive and prescriptive equipment maintenance.
Big data streaming analytics is a key to that quest. And the stream flows quickly. Knuth said incoming time series data points can nominally clock in at 200,000 per second, and that peaks of 1 million data points per second are not uncommon. This data is a byproduct of operations, but it can give insight into the future.
"You have to learn to understand failure modes and the lifecycle of the machine," said Knuth in an interview at the recent Strata Data Conference in New York.
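To put the rates Knuth cites in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the nominal rate holds for a full 24 hours, which the article does not claim, and the function and constant names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Rates quoted in the article, in data points per second.
NOMINAL_RATE = 200_000
PEAK_RATE = 1_000_000


def points_per_day(points_per_second: int) -> int:
    """Daily volume if a per-second rate were sustained all day (an assumption)."""
    return points_per_second * SECONDS_PER_DAY


nominal_daily = points_per_day(NOMINAL_RATE)
peak_ratio = PEAK_RATE / NOMINAL_RATE

print(f"{nominal_daily:,} points per day at the nominal rate")
print(f"peak is {peak_ratio:.0f}x nominal")
```

At the nominal rate alone, that works out to roughly 17 billion data points a day, with peaks running five times higher.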
Project, not predict
Knuth has long been determined to understand how such processes function. He worked with agricultural machinery growing up on a Wisconsin dairy farm. He later studied and worked in mechanical engineering, with a special emphasis on remote monitoring and failure analysis.
This led him to a role in data science and predictive analytics at Komatsu. In the context of his work, he prefers the term project over predict.
"We take machine data to project the future state," Knuth said.
Most recently, Knuth has been striving to put data and analytics in the hands of more customers, as well as in the hands of more of his Komatsu colleagues who work across six regions around the world.
"Our goal is to democratize the data; to get the information to the people," he said.
Komatsu's focus has been on a so-called virtuous information triad, Knuth said, which connects machine, analysis and people.
It is still early for BI on Hadoop applications, and Knuth has worked closely with Arcadia Data to tune its Arcadia Enterprise software so that the native, in-cluster analytics engine can quickly deliver visual analytics.
Knuth commended Arcadia for closely collaborating with his teams to reduce latencies for concurrent users. Reducing such latency has been a target for most BI on Hadoop vendors, which must ultimately match the levels of performance that SQL analytics engines have built up over many years.
Knuth said his teams are building BI dashboards and prioritizing data handling, as some events generate alarms that must be handled as quickly as possible, while others can be handled at a more leisurely pace. As always, individual users' notions of what is real time can vary.
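The kind of prioritization Knuth describes, with alarms jumping ahead of routine events, can be illustrated with a simple priority queue. The sketch below is not Komatsu's or Arcadia's implementation; the two priority levels, the `EventQueue` class and the event names are all hypothetical, and it uses only Python's standard `heapq` module.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is handled first.
ALARM, ROUTINE = 0, 1


class EventQueue:
    """Priority queue that returns alarms before routine events."""

    def __init__(self):
        self._heap = []
        # Running counter keeps arrival order among events of equal priority.
        self._counter = itertools.count()

    def push(self, priority, event):
        heapq.heappush(self._heap, (priority, next(self._counter), event))

    def pop(self):
        _priority, _order, event = heapq.heappop(self._heap)
        return event

    def __len__(self):
        return len(self._heap)


queue = EventQueue()
queue.push(ROUTINE, "hourly vibration summary")
queue.push(ALARM, "hydraulic pressure alarm")
queue.push(ROUTINE, "daily fuel report")
queue.push(ALARM, "motor overheat alarm")

# Drain the queue: both alarms come out before either routine event.
handled = [queue.pop() for _ in range(len(queue))]
print(handled)
```

In a real streaming pipeline the same idea would more likely be expressed as separate topics or consumer groups with different service levels, but the ordering principle is the same.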
Dawn of AI infusion
Despite an industry-wide push toward AI-infused applications, analytics delivery issues for multiple concurrent users are important, and they must still be addressed, according to Steve Wooledge, vice president of marketing at Arcadia.
"You can make pretty pictures, but if it is just for five users, that is not that useful," Wooledge said.
He noted that interactive queries of streaming data are becoming part of the big data analytics mix. Earlier this year, Arcadia released the Arcadia Instant software, which helps provide visualizations of native Apache Kafka data in motion. It does this by supporting KSQL, a SQL dialect created by streaming company Confluent, and by targeting real-time data.
For his part, Knuth said Kafka and Kubernetes are becoming keystones of the company's analytics platform. But again, it is early. Knuth said his team has just begun to play with the KSQL data engine, but he finds it promising.