Using big data platforms for data management, access and analytics

Last updated: February 2017

Editor's note

Big data platforms abound, which has upsides and downsides for prospective users. Hadoop clusters, the Spark processing engine, NoSQL databases, even conventional databases and data warehouses -- these and a variety of other technologies can all be tapped to create a big data architecture. But it's possible to go down the wrong technology path -- or multiple wrong paths.

It's up to IT managers, enterprise architects and others involved in building a big data framework to keep their organization on track to meet the business goals behind the deployment. "You need to make sure your architecture will take you where you want to go," said Ibrahim Itani, an independent consultant who focuses on big data analytics and a former leader of analytics and data warehousing teams at Verizon.

During a panel discussion at the 2017 TDWI Leadership Summit in Las Vegas, Itani compared architecting big data environments to designing bridges with multiple lanes and levels that can handle different traffic needs. In both cases, he said, you have to anticipate future usage so you can reconfigure or expand on top of the same foundations. Modifying a big data architecture "is costly and destructive to business operations if major changes are needed very often," Itani cautioned. He added, though, that big data systems should be able to accommodate new platforms and tools as they emerge or as business needs change.

Edd Wilder-James, a consultant at Silicon Valley Data Science, also pointed to technology agility as a key element of well-designed big data architectures. In addition, he cited related attributes such as linear scale-out and rapid deployment capabilities, plus support for schema-on-read approaches to data modeling, which provide flexibility in how information is organized. "Not all data is equal," Wilder-James said in a session at the TDWI conference. "We need to treat different data in different ways. The things we have to think about are much more complicated than before."
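For readers unfamiliar with the term, schema-on-read means that data is stored in its raw form and a structure is applied only when the data is read, so different users can impose different structures on the same records. The short Python sketch below illustrates the general idea; it is not drawn from the conference sessions, and the sample events, field names and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw events are stored as-is; no schema is enforced at write time.
# (Hypothetical sample data, invented for illustration.)
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5", "device": "ios"}',
    '{"user": "c3"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) while reading raw JSON lines.

    Fields missing from a record come back as None; fields that are not
    in the schema are ignored for this particular read.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }


# Two consumers read the same raw data with different schemas.
billing_view = list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
marketing_view = list(read_with_schema(RAW_EVENTS, {"user": str, "channel": str}))
```

Here the billing and marketing views are built from the same stored records without rewriting them, which is the flexibility in how information is organized that schema-on-read is meant to provide.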

To help address such challenges, many organizations are deploying multiple big data platforms to handle different parts of the processing pipeline. This guide includes a wide range of content on the available platform options, including Hadoop, Spark and database technologies. In the sections below, you'll find guidance on navigating the technology selection process, real-world examples of big data programs and information on big data management trends and technology developments.

1. Big data platforms and management strategies in action

Like other IT projects, big data applications face a host of hurdles -- only writ larger, in most cases. That starts with planning, designing and building a big data architecture, then continues on to things such as configuring and partitioning data sets, deploying advanced analytics tools, governing data and managing the use of Hadoop clusters and other big data platforms.

The stories in this section provide a window into big data projects at numerous user organizations, with tips from experienced IT managers and other users on tactics and strategies they've used in their deployments.

2. Technology developments on big data platforms

Things move quickly in the big data ecosystem, partly because of the open source nature of Hadoop, Spark and other technologies. In addition, many big data platforms and tools are still relatively new, so they get updated with new functionality on a regular basis. The growth of cloud computing and the emergence of technologies such as containers and microservices are also driving changes to big data software and systems.

The stories in this section examine trends affecting big data vendors and users; they also shine a light on new technologies that have been added to the big data mix.