Pig enables developers to create query execution routines for analyzing large, distributed data sets without having to do low-level work in MapReduce, much as the Apache Hive data warehouse software provides a SQL-like interface for Hadoop that doesn't require direct MapReduce programming.
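To illustrate, a typical Pig Latin script expresses a multi-step data flow that Pig compiles into MapReduce jobs behind the scenes. The sketch below is a hypothetical word-count example; the input and output paths are assumptions, not taken from the article:

```pig
-- Load raw text lines from an assumed input file
lines = LOAD 'input.txt' AS (line:chararray);

-- Split each line into individual words
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words together and count each group
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

-- Write the results to an assumed output directory
STORE counts INTO 'wordcount_output';
```

Each statement defines a relation built from the previous one; the developer describes the transformations, and Pig handles the underlying MapReduce execution.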
Apache Pig grew out of work at Yahoo Research and was first formally described in a paper published in 2008. Pig is intended to handle all kinds of data, including structured and unstructured information and relational and nested data. That omnivorous view of data likely had a hand in the decision to name the environment for the common barnyard animal. The same flexibility extends to Pig's take on application frameworks: while the technology is primarily associated with Hadoop, it reportedly can be used with other frameworks as well.
The underlying Hadoop framework grew out of large-scale Web applications whose architects chose non-SQL methods to economically collect and analyze massive amounts of data. Apache Pig is just one entry in a long list of Hadoop ecosystem technologies -- including Hive, HBase, ZooKeeper and other utilities -- intended to fill functionality gaps in the framework and support big data applications.