ATLANTA -- The much-hyped concept of "big data" and all the tools that fall under the big data heading got seriously roasted Monday night at the Enterprise Data World conference.
The humorous verbal shellacking of big data -- which generated plenty of audience laughs -- came at the hands of Karen Lopez, a senior project manager and principal consultant with InfoAdvisors Inc. Lopez is also the proprietor of the popular Datachick Twitter feed and often uses that outlet to post admittedly snarky comments about the world of information management.
"Let's start with the basics: What is big data?" Lopez said. "I'm here to tell you that nobody really knows."
Lopez pointed to Wikipedia, which says that big data consists of data sets "that grow so large they become awkward."
"What the heck kind of definition is that?" she asked.
The consultant then turned her attention to Apache Hadoop and other technologies that make up the big data universe.
"The great thing about Hadoop is that everything that comes with it is called Hadoop," she said. "We have Hadoop Common, Hadoop Distributed File System, Hadoop MapReduce and what do you think that elephant mascot is named? Not Harry or Harvey, but Hadoop."
Lopez did list several tools with names that do not begin with Hadoop. The list included Hive, Pig, Zookeeper and Mahout, which is named after the Hindi word for elephant driver.
"Remember when technologies had names you could actually use in front of business people?" she asked. "The project is late because the elephant driver needs to be tuned to work with Pig and Zookeeper."
Lopez also had something to say about the unpredictability of big data.
"You don't know ahead of time what kind of data you're going to get and so it's schema-less," she said. "The problem is that you often don't know what the design should be until the data arrives, so that means you need to data model by sprint -- and by sprint I mean at the speed of light. I hope you're in training right now."
But it wasn't all jokes for the Datachick. At the end of the five-minute rant, the consultant explained why she wanted to take big data down a peg or two. It was her way of making the point that, despite the hype, big data technologies are just one category of tools and are by no means a cure-all.
"There really is no reason to think that big data is in conflict or in competition with relational technologies or any of the other database technologies that we've learned to love," Lopez said. "We just need the right tool for the right job and size doesn't matter when finding that right tool."
Later in the evening, two conference speakers took issue with Lopez's comments. The first was Geoffrey P. Malafsky, Ph.D., the CEO of Phasic Systems Inc., a provider of "agile data governance" products and services.
"Some of the prior speakers I disagree with pretty strongly," he said. "Relational database technology is pretty much obsolete and archaic. There is a reason why big data is here. There is a reason that the almost useless word 'schema-less' exists. There is no such thing as schema-less data. You may have not written one down or put it into ERwin, but as soon as the data is stored someplace it has a schema."
A far less serious rebuff came from Amir Halfon, Oracle Corp.'s senior director of financial services technology, who also spent some time talking about Hadoop.
"Unlike the Datachick, I won't deride the fact that everything has the word 'Hadoop' in front of it," Halfon said. The creator of Hadoop "invented so many things that I think you've got to give him [a break] if he ran out of names."