Apache Pig enables developers to create query execution routines for analyzing large, distributed data sets without having to do low-level work in MapReduce, much as the Apache Hive data warehouse software provides a SQL-like interface for Hadoop that doesn't require direct MapReduce programming.
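To give a rough sense of the low-level work that a tool like Pig is meant to spare developers, here is a minimal sketch in plain Python of a count-per-key job written as explicit map, shuffle and reduce steps. It is an illustration only: it uses neither Hadoop nor Pig, and the sample records and function names (`RECORDS`, `map_phase`, `shuffle_phase`, `reduce_phase`, `visits_per_user`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records: (user, page) pairs standing in for a large log.
RECORDS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/search"),
    ("carol", "/home"),
    ("alice", "/home"),
]


def map_phase(records):
    """Emit one (user, 1) pair per record, as a hand-written mapper would."""
    for user, _page in records:
        yield user, 1


def shuffle_phase(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def visits_per_user(records):
    """Chain the three phases into a single count-per-user job."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(visits_per_user(RECORDS))
```

Even for this simple count, the developer has to spell out each phase by hand. A higher-level data flow language lets the same job be stated as a few steps (load the data, group it, count each group) and leaves the planning of the map and reduce stages to the framework.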
Apache Pig grew out of work at Yahoo Research and was first formally described in a paper published in 2008. Pig is intended to handle all kinds of data, including structured and unstructured information and relational and nested data. That omnivorous view of data likely had a hand in the decision to name the environment for the common barnyard animal. It also extends to Pig's take on application frameworks; while the technology is primarily associated with Hadoop, it is said to be capable of being used with other frameworks as well.
The underlying Hadoop framework grew out of large-scale Web applications whose architects chose non-SQL methods to economically collect and analyze massive amounts of data. Hadoop has plenty of add-on help for handling big data applications: Apache Pig is just one of a long list of Hadoop ecosystem technologies that also includes Hive, HBase, ZooKeeper and other utilities intended to fill functionality gaps in the framework.