MapR Hadoop update plies data center waters; Azure learns new tricks

Hadoop vendor MapR's latest release puts the focus on database replication across data centers. Also, Microsoft has built a Python-friendly Azure service for machine learning in the cloud.

Recent big data management and analytics activity includes the release of an updated Hadoop distribution by MapR Technologies Inc. Enhancements in the new 4.1 package target issues the Hadoop framework may have when it needs to be deployed across multiple data centers. In a related step, MapR has aligned with other vendors to improve the lot of Hadoop processing jobs that must run alongside other types of workloads in enterprise data centers. Also fresh out of the new-product pipeline is a machine learning service for Microsoft's Azure computing cloud. Included are development interfaces that, according to the company, treat the Python language as "a first-class citizen."

Data-centric Hadoop doings of the MapR variety

Hadoop vendor MapR said the new version of its Hadoop software distribution includes a POSIX client, an API for C developers and cross-data-center table replication for MapR-DB, its NoSQL database. The 4.1 release follows a move by the company to ally with others to work on the Myriad open source project, which is intended to provide tools for efficiently consolidating big data workloads in data centers.

The MapR upgrade's table replication capability targets Hadoop users now reaching a stage where the open source distributed processing framework is deployed beyond the confines of a single data center.

The configuration of the MapR Hadoop 4.1 distribution simplifies data replication, which can be labor-intensive without effective automation, said Manny Puentes, a MapR user and an early evaluator of the new release. Puentes is CTO at Altitude Digital, a Denver-based online advertising platform developer. 

Cross-data-center replication is important for a company like Altitude Digital, which looks to quickly serve online ads to website visitors based on user profile data that is accumulating in greater and greater amounts. "If you can hit any data center and get the same information, that helps with real-time analytics," said Puentes, who also has implemented MapR's software in earlier positions at other companies.

Meanwhile, Project Myriad takes yet another tack in looking to improve Hadoop for data center uses, according to Jack Norris, MapR's chief marketing officer. 

The software, now in the early stages of development, will connect Hadoop's YARN cluster resource manager and job scheduler with Apache Mesos, an emerging management framework that enables multiple workloads to run side-by-side in a data center. The Myriad effort is the result of joint work by MapR, eBay and Mesosphere, which makes a distributed systems kernel based on Mesos.

Norris said Myriad will allow Hadoop users to run YARN -- an essential component of Hadoop 2 -- along with Mesos. That combination could be useful if Hadoop is to find broader application in mainstream computing uses.

Microsoft Azure leans toward machine learning

Azure Machine Learning is now generally available for Microsoft cloud computing users. It is billed as a managed cloud service for advanced analytics, and Microsoft expects its first uses to be found in website personalization applications and predictive maintenance efforts for industrial machinery.

New tools for data scientists and software developers are an important part of the Azure Machine Learning release, according to T.K. "Ranga" Rengarajan, corporate vice president for data platforms at Microsoft.

"With this release, we are giving Python first-class treatment," Rengarajan said in an interview, noting the development language's growing popularity for analytics. He said Microsoft has fashioned many machine learning algorithms as part of its Bing search and Xbox game console initiatives -- they're now becoming available for the Python programmer community via Azure Machine Learning. 

Another language finding its way into the Azure Machine Learning toolbox is R, an open source programming language for advanced analytics applications. The addition follows Microsoft's January move to purchase Revolution Analytics, a provider of commercial software and services for R.

"We're seeing an emergence of a culture of data," Rengarajan said. "Often it is data that we used to throw out. Our approach is to make this very simple for people to compose machine learning applications. There are many personas involved in making these applications work, from data modelers and data scientists to DBAs."

The machine learning news, which was announced at the Strata + Hadoop World 2015 conference in San Jose, Calif., was accompanied by word of a preview of Microsoft's Azure HDInsight cloud-based Hadoop platform that will run on the Linux operating system in addition to Windows. The company also announced general availability of the Storm streaming data processor on HDInsight.

Microsoft's machine learning tools on Azure are part of a host of newly emerging cloud-based applications. IBM, Google and Amazon Web Services are also counted as top-tier players pursuing this still-infant software type.

Jack Vaughan is SearchDataManagement's news and site editor. Email him at jvaughan@techtarget.com, and follow us on Twitter: @sDataManagement.

Join the conversation

Are you experiencing issues moving Hadoop to your data center?
Yes. Hadoop has proven to be quite complex and expensive to maintain, since it requires a group of developers. Huge companies such as Google, LinkedIn and Facebook have already made the move. However, my organization lacks the capability and expertise to handle Hadoop. Sometimes the cases we have cannot be solved using Hadoop. In addition, Hadoop lacks key functionalities such as joins and SQL that are used to manage and manipulate data.
Finally MapR addresses data center replication! We've been calling for this for some time. So far our testing on the enhancement is very positive.