New parallel processing platforms in the growing big data ecosystem are enabling organizations to bring greater compute power to bear on analytical problems. And machine learning applications are likely to be among the leading uses for systems based on big data technologies such as Hadoop and Spark.
The combination of parallelism and machine learning interests data scientists at companies like Allstate Insurance, Cisco Systems and Pandora Media. They want to build complex machine learning models and run the models repeatedly to fine-tune the algorithms and improve the results -- and they want to do that work as quickly as possible so they can handle a greater number of analytical problems, according to presentations and discussions at the Strata + Hadoop World 2015 conference held recently in San Jose, Calif.
In this Talking Data podcast, SearchDataManagement's Jack Vaughan, who covered the conference, tells colleague Ed Burns that people are coming to machine learning applications from a couple of different points of view. One group includes data analysts and programmers at e-commerce websites who want to serve up recommendations to visitors. Another includes enterprise statisticians who have been immersed in the technology for years but haven't had the processing power needed to move beyond relatively simple models.
Vaughan says the latter group now faces the task of running more jobs that are based on sophisticated statistical models -- ones that can find patterns in data that can be turned into competitive business advantages. With the new parallel processing platforms, they have the compute power to scale up, handle greater amounts of data and, hopefully, be more successful in their predictive analytics efforts. At the same time, new tools in the big data ecosystem seek to streamline the development of machine learning programs, which has so far required programmers with specialized parallel computing skills.
The interest in machine learning shouldn't obscure the fact that SQL queries are also being applied to the growing pools of big data that organizations are accumulating. But Vaughan says much of the buzz at the Strata conference was around analytics applications of the machine learning kind.