This article is part of an Essential Guide, our editor-selected collection of our best articles, videos and other content on this topic. Explore more in this guide:
1. - Understanding the basics of Hadoop technology: Read more in this section
- Considerations for deploying Hadoop technology
- Debunking common Hadoop myths
- Comparing MapReduce and Hadoop technologies
Explore other sections in this guide:
Hadoop is a framework for distributed data and computing. In other words, it’s excellent for storing large sets of semi-structured data. (Whether a collection of semi-structured data can truly be considered to be a “set” is an interesting question, but you can probably guess what I mean). The data can be stored redundantly, so the failure of one disk doesn’t result in data loss. Hadoop is also very good at distributed computing – processing large sets of data rapidly across multiple machines.
MapReduce is a programming model for processing large sets of semi-structured data. What is a programming model, you ask? It’s a way of approaching and solving a given problem. For example, in a relational database, we perform queries using a set-based language – i.e., SQL. We tell the language the result we want and leave it to the system to work out how to produce it. With a more traditional language (C++, Java), we tend to spell out, step by step, how to solve the problem. Those are two different programming models. MapReduce is yet another.
MapReduce and Hadoop are independent of each other but, in practice, work well together – hence we often find them mentioned in the same breath.