IBM highlighted its plans for the Apache Spark processing engine at its IBM Insight 2015 conference in Las Vegas this week, announcing the availability of a Spark analytics cloud service and showing off early applications combining the open source technology with other tools in its portfolio.
Originally announced in June, the Spark as a service offering, called IBM Analytics for Apache Spark, runs on the company's Bluemix cloud platform. It could serve as a workbench for users interested in finding specific areas to mine within their vast data lodes. For example, SolutionInc, a provider of public Wi-Fi and wired Internet access services based in Halifax, N.S., is using the IBM Spark platform to look for nuggets of useful information on network usage trends hidden in the operational data it's continually gathering.
In one case, Spark was fed more than 240 million rows of Wi-Fi log data to help pinpoint device traffic patterns across multiple locations, so SolutionInc teams could explore market demand. Using the Spark analytics service, with consulting assistance from IBM, "helped us understand the breadth of the insights from the data, so we can focus on how to package and present it," said SolutionInc CEO Glen Lavigne.
IBM is also building Spark into systems ranging from its InfoSphere Streams event processor and DataWorks data preparation tools to its SPSS statistics package and zSeries mainframes. Its endorsement could help speed adoption of Spark, which arose in 2010 from development work at the University of California, Berkeley, and only became a top-level Apache project last year. The main Spark mover to date has been Databricks, a startup that was formed by the technology's UC Berkeley originators and made a cloud-based Spark platform available earlier this year. The top three Hadoop vendors -- Cloudera, Hortonworks and MapR Technologies -- also support Spark in their distributions of the distributed processing framework.
Just in the Spark of time
Spark supports fast distributed data processing and includes APIs that allow Spark applications to be developed in programming languages such as Java, Scala and Python. It also comes with libraries for machine learning, data streaming, graph processing, and other analytics and data science needs.
At the Insight conference, an IBM executive attested to the relative ease of programming with Spark. IBM cut the code base needed for DataWorks from 40 million lines to 5 million, according to Rob Thomas, vice president of product development for IBM's analytics group. He also emphasized Spark's advanced analytics prowess.
"Until recently, this has been about providing a repository where businesses store data," he said in an interview. "Now, it's about how you drive insights out of the data."
Early on in the Spark analytics era
Still, at Insight 2015, one IBM presenter admitted that it was early on in the "Spark experience," and that much innovation, with all its complications, was still ahead. For users, those early phases naturally include a fair amount of tire-kicking and testing. Several attendees at the conference said they were just beginning to look at the technology.
For now, Spark is among the elements being considered for the technology roadmap at IT services provider Dimension Data, said Peter Gray, the company's director of analytics and information services. Gray and his colleagues are using IBM Streams as part of an Internet of Things application that will instrument competing Tour de France cyclists to collect data for analytics uses during the race. "Spark is something we can look at moving forward," he said.
While he has no specific Spark initiatives underway, Mark Beeson, manager of Web services at Skechers USA Inc., sees the technology as a potential fit in efforts by the shoe maker to analyze data. Beeson currently is leading a project to build an "omnichannel" service that unites in-store and website data streams.
For the future, Spark's Scala support could give it inroads into the company. "The Skechers team is what can be described as a big Scala shop," Beeson said. "The fact that it has APIs based off of Scala is really attractive to us."
Any way the Spark wind blows
IBM's decision to launch the Spark analytics service acknowledges that Spark in its early days, just like Hadoop before it, can be a challenge to deploy and administer. Even companies with significant big data infrastructure and experience may find it useful to run Spark as a cloud service. Take The Weather Company (TWC) as an example. In the age of the smartphone, it now spins out billions of location-specific weather forecasts every hour -- and it uses big data tools such as Apache Spark, Hadoop and Cassandra to do it.
Before word came out this week that IBM had agreed to acquire the company's digital and data assets, David Kenny, TWC's chairman and CEO, said in an interview at Insight 2015 that IBM's move to offer Spark as a cloud service could be beneficial in the area of elastic computational scalability. "It's intriguing to us to get Spark as a service because we are so cloud-based," Kenny said. "It has real potential to scale."
On the acquisition, IBM said it plans to use TWC's cloud data platform to help broaden its Watson analytics offerings. The agreement excludes The Weather Channel, TWC's flagship broadcasting operation, which will license weather forecast data and analytics services from IBM. Terms weren't disclosed; IBM said that, pending regulatory approvals, the deal is expected to close in 2016.