Emerging database technologies: How Hadoop and MapReduce compare

I’m hearing a lot about Hadoop and MapReduce, but I’m still a little unclear as to how those two emerging database technologies relate to each other. Can you clarify?

Hadoop is a framework for distributed data storage and distributed computing. On the storage side, it’s excellent for holding large sets of semi-structured data. (Whether a collection of semi-structured data can truly be considered a “set” is an interesting question, but you can probably guess what I mean.) The data can be stored redundantly, with copies kept on more than one machine, so the failure of one disk doesn’t result in data loss. On the computing side, Hadoop is very good at processing large sets of data rapidly across multiple machines.
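To make the redundancy point concrete, here is a toy sketch in Python. It is not how Hadoop actually places data; the function names, the disk names and the round-robin placement rule are all assumptions made up for illustration. It only shows why keeping two copies of every block on different disks means a single disk failure loses nothing.

```python
def place_blocks(blocks, disks, replicas=2):
    """Assign each block to `replicas` distinct disks (hypothetical round-robin rule)."""
    if replicas > len(disks):
        raise ValueError("need at least as many disks as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        # Consecutive disk indexes (mod the disk count) are always distinct
        # as long as replicas <= len(disks).
        placement[block] = [disks[(i + r) % len(disks)] for r in range(replicas)]
    return placement


def readable_blocks(placement, failed_disk):
    """Return the blocks that still have at least one copy after one disk fails."""
    return [
        block
        for block, holders in placement.items()
        if any(disk != failed_disk for disk in holders)
    ]


# Illustrative data only: four blocks spread over three disks, two copies each.
blocks = ["b0", "b1", "b2", "b3"]
disks = ["disk-a", "disk-b", "disk-c"]
placement = place_blocks(blocks, disks, replicas=2)

for failed in disks:
    survivors = readable_blocks(placement, failed)
    print(failed, "fails ->", len(survivors), "of", len(blocks), "blocks still readable")
```

With `replicas=1` the same loop would report missing blocks for every failed disk, which is the situation redundancy is meant to avoid.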

MapReduce is a programming model for processing large sets of semi-structured data. What is a programming model, you ask? It’s a way of approaching and solving a given problem. For example, in a relational database, we perform queries using a set-based language, namely SQL. We tell the language the result we want and leave it to the system to work out how to produce it. With a more traditional, procedural language (C++, Java), we tend to spell out, step by step, how to solve the problem. Those are two different programming models. MapReduce is yet another: we express the work as a map step that turns each input record into key-value pairs, and a reduce step that combines all the values that share a key.
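As a rough illustration of the MapReduce way of thinking, here is the customary word-count example written in plain Python on a single machine. It does not use Hadoop or any MapReduce library; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are assumptions chosen for this sketch. A real framework would run the map and reduce steps in parallel across many machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input record into (key, value) pairs, here (word, 1)."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key; a real framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine every value that shares a key into one result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Illustrative input only.
counts = word_count(["Hadoop stores data", "MapReduce processes data"])
print(counts)
```

Notice that neither `map_phase` nor `reduce_phase` knows anything about where the data lives or how many machines are involved. That separation is what lets a framework spread the same two functions over a cluster.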

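MapReduce is a model and Hadoop is a framework, so conceptually the two are independent of each other: MapReduce can be implemented on other systems, and Hadoop’s storage can be used without it. In practice they work well together, and Hadoop includes its own implementation of MapReduce, hence we often find them mentioned in the same breath.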

This was first published in May 2011