Recent big data management and analytics activity includes the release of an updated Hadoop distribution by MapR Technologies Inc. Enhancements in the new 4.1 package target issues the Hadoop framework may have when it needs to be deployed across multiple data centers. In a related step, MapR has aligned with other vendors to improve the lot of Hadoop processing jobs that must run alongside other types of workloads in enterprise data centers. Also fresh out of the new-product pipeline is a machine learning service for Microsoft's Azure computing cloud. Included are development interfaces that, according to the company, treat the Python language as "a first-class citizen."
Data-centric Hadoop doings of the MapR variety
Hadoop vendor MapR said the new version of its Hadoop software distribution includes a POSIX client, an API for C developers and cross-data-center table replication for MapR-DB, its NoSQL database. The 4.1 release follows a move by the company to ally with others to work on the Myriad open source project, which is intended to provide tools for efficiently consolidating big data workloads in data centers.
The MapR upgrade's table replication capability targets Hadoop users now reaching a stage where the open source distributed processing framework is deployed beyond the confines of a single data center.
The configuration of the MapR Hadoop 4.1 distribution simplifies data replication, which can be labor-intensive without effective automation, said Manny Puentes, a MapR user and an early evaluator of the new release. Puentes is CTO at Altitude Digital, a Denver-based online advertising platform developer.
Cross-data-center replication is important for a company like Altitude Digital, which looks to quickly serve online ads to website visitors based on user profile data that is accumulating in greater and greater amounts. "If you can hit any data center and get the same information, that helps with real-time analytics," said Puentes, who also has implemented MapR's software in earlier positions at other companies.
Meanwhile, Project Myriad takes yet another tack in looking to improve Hadoop for data center uses, according to Jack Norris, MapR's chief marketing officer.
The software, now in the early stages of development, will connect Hadoop's YARN cluster resource manager and job scheduler with Apache Mesos, an emerging management framework that enables multiple workloads to run side by side in a data center. The Myriad effort is the result of joint work by MapR, eBay and Mesosphere, which makes a distributed systems kernel based on Mesos.
Norris said Myriad will allow Hadoop users to run YARN -- an essential component of Hadoop 2 -- along with Mesos. That combination could be useful if Hadoop is to find broader application in mainstream computing uses.
Microsoft Azure leans toward machine learning
Azure Machine Learning is now generally available for Microsoft cloud computing users. It is billed as a managed cloud service for advanced analytics, and Microsoft expects its first uses to be found in website personalization applications and predictive maintenance efforts for industrial machinery.
An important part of the Azure Machine Learning release is a set of new tools for data scientists and software developers, according to T.K. "Ranga" Rengarajan, corporate vice president for data platforms at Microsoft.
"With this release, we are giving Python first-class treatment," Rengarajan said in an interview, noting the development language's growing popularity for analytics. He said Microsoft has fashioned many machine learning algorithms as part of its Bing search and Xbox game console initiatives -- they're now becoming available for the Python programmer community via Azure Machine Learning.
Another language finding its way into the Azure Machine Learning toolbox is R, an open source programming language for advanced analytics applications. The addition follows Microsoft's January move to purchase Revolution Analytics, a provider of commercial software and services for R.
"We're seeing an emergence of a culture of data," Rengarajan said. "Often it is data that we used to throw out. Our approach is to make this very simple for people to compose machine learning applications. There are many personas involved in making these applications work, from data modelers and data scientists to DBAs."
The machine learning news, which was announced at the Strata + Hadoop World 2015 conference in San Jose, Calif., was accompanied by word of a preview of Microsoft's Azure HDInsight cloud-based Hadoop platform that will run on the Linux operating system in addition to Windows. The company also announced general availability of the Storm streaming data processor on HDInsight.
Microsoft's machine learning tools on Azure are part of a host of newly emerging cloud-based applications. IBM, Google and Amazon Web Services are also counted as top-tier players pursuing this still-infant software type.