Hadoop and Spark frameworks must become more manageable before they gain mainstream credibility. That was less of a concern when Hadoop and Spark applications ran as tests or proofs of concept overseen by developers or data scientists; it matters more as they run in greater numbers, or alongside existing applications managed by system administrators in data centers.
But management software for the emerging big data frameworks is still rare. Among the handful of companies dedicated to building software for performance management of Hadoop and Spark systems, count Concurrent Inc.
The San Francisco-based company last week announced general availability of its Driven 2.0 monitoring and management software for Hadoop. Driven also supports Spark monitoring, although that part of Driven 2.0 is still in beta.
Concurrent's approach is built around application processes, according to Mike Matchett, senior analyst and consultant at Hopkinton, Mass.-based Taneja Group. That focus will prove important as more Hadoop-ecosystem components go into production.
"It provides monitoring of the workflow. That becomes useful on a big data platform if you're creating complete process workflows that spread out, and which may come back together, then you'll need software like Concurrent's to see how that execution pipeline is going," he said.
Such monitoring is helpful as Hadoop and Spark data professionals confront "a myriad of log files" and need to identify a processing bottleneck, Matchett said.
Underlying Driven is Cascading, a popular open source application development framework for building big data applications. Cascading's originators are among Concurrent's technology leadership, but the vendor's commercial effort centers on the Driven application performance software, according to Gary Nakamura, CEO at Concurrent.
Nakamura said Concurrent undertakes application-level monitoring by instrumenting applications with software agents that report back to administrator consoles. The data can be used to track how well Hadoop, Hive, MapReduce and Spark systems adhere to service-level agreements (SLAs), a common measure in IT operations that is still new to Hadoop, he said. The software can also use readings gathered by complementary software that may already be in place, he added.
"We put one agent on a Hadoop cluster and that is it," he said. "The whole idea is to capture info around the application, the business and other context. That way, the enterprise can operate a very efficient, high-fidelity data management strategy."
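To make the idea of SLA tracking concrete, here is a minimal sketch of how application-level run reports might be checked against time limits. It is not Driven's API or data model; the `RunReport` type, the `sla_violations` function, the application names and the limits are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    """Hypothetical record an agent might send for one application run."""
    app_name: str
    duration_s: float


def sla_violations(reports, sla_limits_s):
    """Return the reports whose duration exceeds the limit set for that app.

    Apps with no configured limit are ignored rather than flagged.
    """
    return [
        r for r in reports
        if r.app_name in sla_limits_s and r.duration_s > sla_limits_s[r.app_name]
    ]


# Illustrative data only: names and numbers are made up.
reports = [
    RunReport("nightly_etl", 3400.0),
    RunReport("nightly_etl", 4100.0),
    RunReport("hive_rollup", 600.0),
]
limits = {"nightly_etl": 3600.0, "hive_rollup": 900.0}

late = sla_violations(reports, limits)
for r in late:
    print(f"{r.app_name} ran {r.duration_s:.0f}s, limit {limits[r.app_name]:.0f}s")
```

A real monitoring product would likely track far more context per run (owner, business unit, input volumes), but the comparison of observed behavior against an agreed threshold is the core of any SLA check.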
Gartner analysts and others have remarked on the apparently slow mainstream uptake of Hadoop. As tools for application monitoring become more widely used, that uptake could gain momentum. Other vendors taking varied approaches to the problem include startups such as Pepperdata, established performance management players like Oracle, IBM and CA, and the Hadoop distribution providers themselves.