
Some early adopters take it easy on Spark cluster rollouts

Software companies Intuit and Novantas took deliberate approaches to deploying their first Spark clusters, limiting initial user access and taking a "surgical" approach to finding solid business uses.

Intuit Inc. has set up a pair of Spark clusters, initially for use in analyzing clickstream records from its websites and data entered in online forms by users of its finance and accounting software. But Bill Loconzolo, vice president of Intuit's data engineering and analytics team, doesn't plan to rush into things with the open source data processing engine.

Loconzolo said Spark appears to be solid for the uses he has in mind. To begin with, though, the Spark clusters are being run on an experimental basis. They are available to the data scientists on his team and to a separate advanced technology group, but not to analysts in the business units of the Mountain View, Calif., company. Loconzolo said he doesn't plan to open up the systems for broader use until the end of 2016.

That's in keeping with a take-it-slow approach he has been following in building out a big data analytics architecture centered on Cloudera Inc.'s Hadoop distribution. Loconzolo said he tries to run new technologies like Spark in trial mode for at least six months to make sure that they're ready to go -- and that Intuit is ready for them.

"That's kind of a lesson we've learned from what we've gone through the past few years" with other big data technologies, he explained. "Sometimes, early exposure [to users] is the worst thing you can do."

Kaushik Deka, CTO and director of engineering for Novantas Inc.'s technology unit in New York, said his team adopted a "crawl, walk and run" strategy when it began working with Hadoop and Spark in mid-2015.

"We had never used a big data platform as of a year ago," Deka said. "We're totally on board with technologies like this, but it's a lot of effort and a real culture change within an organization." Novantas also had to build up internal expertise on tools like Spark, primarily by retraining existing workers.

To avoid going off course, the company was "very surgical" about finding a solid initial business use for the big data technologies, Deka said. The search ended when one of the banks that use Novantas' analytics services and software asked for help in combining different data sets to support predictive modeling of how individual customers would respond to marketing offers. According to Deka, Spark was a good fit for that application as an engine for extract, transform and load (ETL) data-integration jobs.

Gartner analyst Nick Heudecker said the consulting company is getting a "substantial" number of inquiries about Spark from clients. But the technology is still maturing, and production Spark-cluster implementations by corporate users remain relatively uncommon, Heudecker added. "Clearly, there's interest," he said. "Whether that translates into deployments is something we're watching closely."
