Essential Guide


Using big data platforms for data management, access and analytics

Big data environments often start with Hadoop, but most also include other data processing platforms. This guide rounds up a collection of content on big data platform options and how to manage deployments of them.


Big data platforms abound, which has upsides and downsides for prospective users. Hadoop clusters, the Spark processing engine, NoSQL databases, even conventional databases and data warehouses -- these and a variety of other technologies can all be tapped to create a big data architecture. But it's possible to go down the wrong technology path -- or multiple wrong paths.

It's up to IT managers, enterprise architects and others involved in building a big data framework to keep their organization on track to meet the business goals behind the deployment. "You need to make sure your architecture will take you where you want to go," said Ibrahim Itani, an independent consultant who focuses on big data analytics and a former leader of analytics and data warehousing teams at Verizon.

During a panel discussion at the 2017 TDWI Leadership Summit in Las Vegas, Itani compared architecting big data environments to designing bridges with multiple lanes and levels that can handle different traffic needs. In both cases, he said, you have to anticipate future usage so you can reconfigure or expand on top of the same foundations. Modifying a big data architecture "is costly and destructive to business operations if major changes are needed very often," Itani cautioned. He added, though, that big data systems should be able to accommodate new platforms and tools as they emerge or as business needs change.

Edd Wilder-James, a consultant at Silicon Valley Data Science, also pointed to technology agility as a key element of well-designed big data architectures. In addition, he cited related attributes such as linear scale-out and rapid deployment capabilities, plus support for schema-on-read approaches to data modeling, which provide flexibility in how information is organized. "Not all data is equal," Wilder-James said, in a session at the TDWI conference. "We need to treat different data in different ways. The things we have to think about are much more complicated than before."
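As a rough illustration of the schema-on-read idea Wilder-James mentioned, the sketch below keeps raw records exactly as they arrived and applies a structure only when they are read. It is a minimal example under stated assumptions, not a description of any particular platform: the sample records, the two schemas and the read_with_schema helper are all hypothetical names invented for this illustration.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"user": "a17", "amount": "19.99", "channel": "web"}',
    '{"user": "b42", "amount": 5, "coupon": "SPRING"}',
    '{"user": "c03"}',
]

# Two illustrative schemas: field name -> (type converter, default value).
# Each consumer defines the shape it needs at read time.
BILLING_SCHEMA = {"user": (str, None), "amount": (float, 0.0)}
MARKETING_SCHEMA = {
    "user": (str, None),
    "channel": (str, "unknown"),
    "coupon": (str, None),
}


def read_with_schema(raw_lines, schema):
    """Parse each raw record and project it onto the given schema at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            value = record.get(field)
            # Missing fields fall back to the schema's default instead of failing.
            row[field] = convert(value) if value is not None else default
        rows.append(row)
    return rows


# The same stored data yields two different views, one per schema.
billing_rows = read_with_schema(RAW_EVENTS, BILLING_SCHEMA)
marketing_rows = read_with_schema(RAW_EVENTS, MARKETING_SCHEMA)
```

The point of the design is that the stored data never changes: a new analytics need means writing a new schema, not reloading or restructuring what has already been collected.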

To help address such challenges, many organizations are deploying multiple big data platforms to handle different parts of the processing pipeline. This guide includes a wide range of content on the available platform options, including Hadoop, Spark and database technologies. In the sections below, you'll find guidance on navigating the technology selection process, real-world examples of big data programs and information on big data management trends and technology developments.

1. Technology decisions

Insight on choosing the right big data platforms

Hadoop once seemed to be synonymous with big data, and it's still a key part of most big data architectures. But the big data technology landscape has broadened to include other platforms that are augmenting Hadoop in user deployments -- or, in some cases, replacing it altogether. The increased menu of technology choices gives organizations more flexibility for meeting their application needs; it also expands on Hadoop's original batch processing focus to enable stream processing and real-time analytics.

The articles in this section highlight various big data platforms and provide advice on what they're suited for and how to use them effectively.


IT teams face pressing need for streaming analytics platforms

Building an architecture to support real-time analytics applications is becoming a priority for many organizations. But there's a plethora of data streaming platforms to consider.


Big data tools, databases often best used in mixed company

EMA analyst John Myers says that, when evaluating data management technologies, IT teams should look to mix and match processing platforms for their big data workloads.


Companies make Spark a centerpiece in big data environments

The Apache Spark processing engine has pushed its way into the big data spotlight alongside Hadoop, and users are turning to Spark for more than just its batch processing speed.


For big data management needs, NoSQL software may be the answer

This handbook examines the potential role of NoSQL databases in big data applications -- and functionality issues that must be addressed when considering a deployment.


Changing data landscape fuels new approaches to data management

The growing adoption of big data systems is driving changes in how data architectures are designed, as well as how data management processes are organized and implemented in organizations.


Hybrid architectures tie data lakes, warehouses together

Increasingly, data lakes and data warehouses are coexisting in big data architectures, a combination that has implications for data modeling and other data management practices.


Future of big data likely to go beyond Hadoop's core components

Doug Cutting, co-creator of Hadoop, says that the original core pieces of the distributed processing framework may not be at the center of big data systems in the future.


Different paths to take in building big data systems

Author Dale Neef explains three different approaches to deploying a big data system, as well as how to integrate it with existing IT systems, in an excerpt from a book on managing big data.

2. User examples

Big data platforms and management strategies in action

Like other IT projects, big data applications face a host of hurdles -- only writ larger, in most cases. That starts with planning, designing and building a big data architecture, then continues on to things such as configuring and partitioning data sets, deploying advanced analytics tools, governing data and managing the use of Hadoop clusters and other big data platforms.

The stories in this section provide a window into big data projects at numerous user organizations, with tips from experienced IT managers and other users on tactics and strategies they've used in their deployments.


Rise of big data platforms spurs new look at data governance process

Big data systems pose new data governance challenges in organizations. But some are navigating their way through the changes as they move to govern their data lakes effectively.


Big data analytics initiatives find value in a variety of tools

Getting full business value from big data systems often takes a mix of predictive modeling, machine learning and other advanced analytics applications -- and a lot of effort.


Big data architecture development doesn't happen overnight

Although Hadoop and related technologies enable organizations to design big data environments that are a match for their needs, putting all the pieces together isn't an easy feat.


Spark usage on the rise despite gaps in its functionality

Spark still has some growing to do, but that isn't stopping an increasing number of organizations from deploying the technology to boost their big data processing performance.


Real-time data streaming platforms speed up big data analytics

Companies are using real-time data processing and analytics technologies to find information in streams of big data that can help their business operations take action fast.


User priority: Finding the business benefits of Hadoop platforms

More IT managers are looking to deploy Hadoop clusters in their organizations, but first, they have to sell business executives on the value of big data analytics applications.


Monitoring, governance of Hadoop platforms key to big data success

Hadoop is playing a more central role in business operations, which has made managing the distributed processing framework a big priority for IT vendors and big data users alike.


New users face learning curve for managing big data platforms

IT and analytics teams have to learn their way with system configuration, data partitioning and other setup processes to optimize the performance of Hadoop and Spark systems.

3. News and trends

Technology developments on big data platforms

Things move quickly in the big data ecosystem, partly because of the open source nature of Hadoop, Spark and other technologies. In addition, many big data platforms and tools are still relatively new, so they get updated with new functionality on a regular basis. The growth of cloud computing and the emergence of technologies such as containers and microservices are also driving changes to big data software and systems.

The stories in this section examine trends affecting big data vendors and users; they also shine a light on new technologies that have been added to the big data mix.


Drizzle software pegged to perk up Spark's streaming throughput

Spark's lead developers are looking to gain a performance edge on rival open source stream processing platforms via the addition of a low-latency execution engine called Drizzle.


Real-time processing pipelines bring changes to big data systems

Big data architectures are changing to support the move to real-time processing and faster data analytics, with microservices gaining prominence in the Hadoop development domain.


New components in Hadoop platforms include containers, microservices

In big data environments, microservices running in containers can break processing and analytics jobs up into pieces, easing development and management of Hadoop data flows.


Hadoop vendors look to ease cost, complexity of cloud-based clusters

Big data vendors are moving to simplify the process of running Hadoop platforms in the cloud, partly through metered pricing that lets users set up transient clusters as needed.


Together, IoT and big data increase data management needs

Consultant Andy Hayler says organizations looking to handle the large volumes of data coming from the internet of things may need to start by deploying new big data platforms.


New data management tools lean on graph database technology

Graph databases are increasingly being tapped to help power a variety of new application architectures, including data integration, data governance and master data management tools.


Data modeling techniques must evolve to accommodate big data

The surging adoption of big data platforms is pushing IT teams to adjust the way they approach data modeling, including the process of creating database schemas.


GPUs find a place in graph database, machine learning systems

Familiar to gamers and supercomputer programmers, graphics processing units are now being tapped to power big data systems running graph databases and machine learning applications.

4. Big data glossary

Terms to know related to big data platforms

Read the definitions included in this section to learn the basics about big data and the key technologies for processing, managing and analyzing it.
