Pig enables developers to create query execution routines for analyzing large, distributed data sets without having to do low-level work in MapReduce. In that respect it resembles the Apache Hive data warehouse software, which provides a SQL-like interface for Hadoop that doesn't require direct MapReduce programming.
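To give a sense of the low-level work that Pig is meant to hide, here is a minimal sketch of the map, shuffle and reduce steps behind a simple word count. It is plain Python that runs on one machine, not Hadoop or Pig API code, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, mimicking the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Chain the three phases into one job (illustrative only)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["pig eats anything", "Pig runs on Hadoop"]
    print(word_count(sample))
```

In a high-level environment such as Pig, a developer would typically describe the same data flow as a handful of load, group and count statements and leave the map and reduce plumbing to the system.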
Apache Pig grew out of work at Yahoo Research and was first formally described in a paper published in 2008. Pig is intended to handle all kinds of data, including structured and unstructured information and relational and nested data. That omnivorous view of data likely had a hand in the decision to name the environment after the common barnyard animal. The same view extends to Pig's take on application frameworks: while the technology is primarily associated with Hadoop, it is reportedly capable of being used with other frameworks as well.
The underlying Hadoop framework grew out of large-scale Web applications whose architects chose non-SQL methods to economically collect and analyze massive amounts of data. Hadoop has plenty of add-on help for handling big data applications: Apache Pig is just one of a long list of Hadoop ecosystem technologies that also includes Hive, HBase, ZooKeeper and other utilities intended to fill functionality gaps in the framework.