
Spark and S3 storage carry forward NBC big data initiative


Has the Hadoop elephant left the room? At NBC, ad analytics have evolved in Hadoop style, but with Spark and S3 at the core, as discussed at the Big Data Innovation Summit in Boston.

The data strategy behind NBCUniversal's Advanced Advertising initiatives, including moves to employ Apache Spark and Amazon Simple Storage Service (S3), was a matter of discussion at the recent Big Data Innovation Summit in Boston.

In a session called "The elephant not in the room. Where did Hadoop go? The Data Strategy behind NBCUniversal's Advanced Advertising initiatives," Jeffrey Pinard, vice president for data technology and engineering at NBC, described the evolution of Hadoop-style development for NBC analytics efforts in the face of industry disruption.

He described how Spark and S3 storage took on tasks formerly handled by older Hadoop components, as part of an effort to digitally transform the peacock network's advertising analytics.

Pinard's team foresaw big scale-out issues as they grappled with the large amounts of mostly unstructured data that web viewers generate. Going to a cloud data lake architecture made sense, as the need for scaling was virtually boundless, he said.

Pinard's career has included extensive work for digital advertising agencies with on-premises data centers built for Hadoop. But, at NBC, Pinard and his team looked for something other than classic Hadoop -- that is, MapReduce and Hadoop Distributed File System (HDFS) -- in pursuit of a data lake that would underpin a new analytical portfolio. Apache Spark took over the processing role that MapReduce plays in classic Hadoop, while S3 storage took on the role traditionally played by HDFS.

Boundless computation is not all fun and games. That's one takeaway from the Big Data Innovation Summit.

Teams can opt for the cloud to gain access to unlimited resources, but, as Pinard put it, they also need to "get ready for unlimited checks" as part of a big data initiative. In part, that is because HDFS at large scale can entail -- again, in Pinard's words -- masses of "spinning disks." While such spinning disks were less expensive than the alternatives in the early days of Hadoop, S3 storage today is cheaper still.

In this podcast, Talking Data team moderators discuss NBCUniversal's big data initiative and related matters. Listen to and subscribe to this and other editions of the Talking Data podcast to find out more about digital disruption and the strategies behind this big data initiative in broadcast analytics.

Next Steps

Learn how to manage Hadoop projects

Find out how Athena works with S3

Listen to a podcast concerning Hadoop's future