By submitting your email address, you agree to receive emails regarding relevant topic offers from TechTarget and its partners. You can withdraw your consent at any time. Contact TechTarget at 275 Grove Street, Newton, MA.
Pig enables developers to create query execution routines for analyzing large, distributed data sets without having to do low-level work in MapReduce, much like the way the Apache Hive data warehouse software provides a SQL-like interface for Hadoop that doesn't require direct MapReduce programming,
Apache Pig grew out of work at Yahoo Research and was first formally described in a paper published in 2008. Pig is intended to handle all kinds of data, including structured and unstructured information and relational and nested data. That omnivorous view of data likely had a hand in the decision to name the environment for the common barnyard animal. It also extends to Pig's take on application frameworks; while the technology is primarily associated with Hadoop, it is said to be capable of being used with other frameworks as well.
The underlying Hadoop framework grew out of large-scale Web applications whose architects chose non-SQL methods to economically collect and analyze massive amounts of data. It has lots of add-on help for handling big data applications because Apache Pig is just part of a long list of Hadoop ecosystem technologies that also includes Hive, HBase, ZooKeeper and other utilities intended to fill in functionality gaps in the framework.