Hadoop is increasingly becoming a key component of big data analytics applications, but it has also become much more than a single, uniform technology, according to SearchDataManagement News Editor Jack Vaughan, who covered the Hadoop Summit 2013 in San Jose, Calif. "The biggest thing I saw was that Apache Hadoop was moving beyond its original definition," Vaughan said in a podcast Q&A about the conference. "It is a lot about Java, and it is a lot about redoing data architectures as we have known them."
Vaughan said that with the development of related technologies such as Hive, Hadapt, Impala, Sqrrl, HBase, Knox and Pig, "Hadoop has spawned a gazillion open source tools that really make what you could call an ecosystem." And with a new resource management component called YARN set to supplement the Hadoop 2.0 release, users will be offered more flexibility on file systems and freedom from Hadoop's associated MapReduce programming framework. That's broadening the concept of what Hadoop is, Vaughan reports. "Just like Java came to stand for more than the language, I think Hadoop is coming to stand for a whole style of programming," he said.
As far as the enterprise story for Hadoop goes, Vaughan observed that a lot of people are hoping it eventually will be an alternative to the conventional data warehouse, at least in some cases. "We think it'll be a complement or a supplement, but sometimes it may be a replacement," he said. "It's fairly inexpensive to get going, but it is a very technical undertaking. So most of this stuff is still experimental."
In the 6-minute podcast, Vaughan further discussed his experiences at the Hadoop Summit 2013. Listeners will:
- Learn what users can expect from YARN (short for Yet Another Resource Negotiator).
- Get Vaughan's thoughts on the hallmarks of the emerging Hadoop programming style.
- Hear about potential enterprise applications for Hadoop.
- Discover the implications of using Hadoop to analyze "dark data."