Big data platforms abound, which has upsides and downsides for prospective users. Hadoop clusters, the Spark processing engine, NoSQL databases, even conventional databases and data warehouses -- these and a variety of other technologies can all be tapped to create a big data architecture. But it's possible to go down the wrong technology path -- or multiple wrong paths.
It's up to IT managers, enterprise architects and others involved in building a big data framework to keep their organization on track to meet the business goals behind the deployment. "You need to make sure your architecture will take you where you want to go," said Ibrahim Itani, an independent consultant who focuses on big data analytics and a former leader of analytics and data warehousing teams at Verizon.
During a panel discussion at the 2017 TDWI Leadership Summit in Las Vegas, Itani compared architecting big data environments to designing bridges with multiple lanes and levels that can handle different traffic needs. In both cases, he said, you have to anticipate future usage so you can reconfigure or expand on top of the same foundations. Modifying a big data architecture "is costly and destructive to business operations if major changes are needed very often," Itani cautioned. He added, though, that big data systems should be able to accommodate new platforms and tools as they emerge or as business needs change.
Edd Wilder-James, a consultant at Silicon Valley Data Science, also pointed to technology agility as a key element of well-designed big data architectures. In addition, he cited related attributes such as linear scale-out and rapid deployment capabilities, plus support for schema-on-read approaches to data modeling, which provide flexibility in how information is organized. "Not all data is equal," Wilder-James said in a session at the TDWI conference. "We need to treat different data in different ways. The things we have to think about are much more complicated than before."
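To make the schema-on-read idea concrete, here is a minimal Python sketch. It assumes raw records are kept as untyped JSON lines and that each consumer applies its own schema only when it reads the data. The sample records, the two schemas and the `read_with_schema` helper are hypothetical illustrations, not part of any product or approach described by the speakers.

```python
import json

# Raw records are stored as-is; no schema is enforced when they are written.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.90", "channel": "web"}',
    '{"user": "raj", "amount": "5", "device": "ios"}',
    '{"user": "li"}',
]

# Two hypothetical read-time schemas: field name -> (converter, default value).
BILLING_SCHEMA = {"user": (str, ""), "amount": (float, 0.0)}
MARKETING_SCHEMA = {"user": (str, ""), "channel": (str, "unknown")}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and project each record onto the schema at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            # Missing fields fall back to the schema's default instead of failing.
            row[field] = convert(record[field]) if field in record else default
        rows.append(row)
    return rows


# The same raw data yields two differently shaped views.
billing_rows = read_with_schema(RAW_EVENTS, BILLING_SCHEMA)
marketing_rows = read_with_schema(RAW_EVENTS, MARKETING_SCHEMA)
```

Because the structure is imposed by the reader rather than the writer, a new field such as `device` can appear in the raw data without forcing changes on existing consumers.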
To help address such challenges, many organizations are deploying multiple big data platforms to handle different parts of the processing pipeline. This guide includes a wide range of content on the available platform options, including Hadoop, Spark and database technologies. In the sections below, you'll find guidance on navigating the technology selection process, real-world examples of big data programs and information on big data management trends and technology developments.
Insight on choosing the right big data platforms
Hadoop once seemed to be synonymous with big data, and it's still a key part of most big data architectures. But the big data technology landscape has broadened to include other platforms that are augmenting Hadoop in user deployments -- or, in some cases, replacing it altogether. The increased menu of technology choices gives organizations more flexibility for meeting their application needs; it also expands on Hadoop's original batch processing focus to enable stream processing and real-time analytics.
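The difference between batch and stream processing can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not how Hadoop, Spark or any streaming platform is implemented: the function names and the sample readings are hypothetical, and a real system would distribute this work across a cluster.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch style: wait for the complete data set, then compute one result."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: update and emit the result as each reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sample readings for illustration only.
readings = [10.0, 20.0, 60.0]

batch_result = batch_average(readings)                    # one answer at the end
stream_results = list(streaming_average(iter(readings)))  # an answer per event
```

The batch function produces a single answer once all the data is available, while the streaming version has a usable answer after every event, which is the property that real-time analytics applications depend on.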
The articles in this section highlight various big data platforms and provide advice on what they're suited for and how to use them effectively.
Building an architecture to support real-time analytics applications is becoming a priority for many organizations. But there's a plethora of data streaming platforms to consider.
EMA analyst John Myers says that, when evaluating data management technologies, IT teams should look to mix and match processing platforms for their big data workloads.
The Apache Spark processing engine has pushed its way into the big data spotlight alongside Hadoop, and users are turning to Spark for more than just its batch processing speed.
This handbook examines the potential role of NoSQL databases in big data applications -- and functionality issues that must be addressed when considering a deployment.
The growing adoption of big data systems is driving changes in how data architectures are designed, as well as how data management processes are organized and implemented in organizations.
Increasingly, data lakes and data warehouses are coexisting in big data architectures, a combination that has implications for data modeling and other data management practices.
Doug Cutting, co-creator of Hadoop, says that the original core pieces of the distributed processing framework may not be at the center of big data systems in the future.
Author Dale Neef explains three different approaches to deploying a big data system, as well as how to integrate it with existing IT systems, in an excerpt from a book on managing big data.
Big data platforms and management strategies in action
Like other IT projects, big data applications face a host of hurdles -- only writ larger, in most cases. That starts with planning, designing and building a big data architecture, then continues on to things such as configuring and partitioning data sets, deploying advanced analytics tools, governing data and managing the use of Hadoop clusters and other big data platforms.
The stories in this section provide a window into big data projects at numerous user organizations, with tips from experienced IT managers and other users on tactics and strategies they've used in their deployments.
Big data systems pose new data governance challenges in organizations. But some are navigating their way through the changes as they move to govern their data lakes effectively.
Getting full business value from big data systems often takes a mix of predictive modeling, machine learning and other advanced analytics applications -- and a lot of effort.
Although Hadoop and related technologies enable organizations to design big data environments that are a match for their needs, putting all the pieces together isn't an easy feat.
Spark still has some growing to do, but that isn't stopping an increasing number of organizations from deploying the technology to boost their big data processing performance.
Companies are using real-time data processing and analytics technologies to find information in streams of big data that can help their business operations take action fast.
More IT managers are looking to deploy Hadoop clusters in their organizations, but first, they have to sell business executives on the value of big data analytics applications.
Hadoop is playing a more central role in business operations, which has made managing the distributed processing framework a big priority for IT vendors and big data users alike.
IT and analytics teams have to learn their way around system configuration, data partitioning and other setup processes to optimize the performance of Hadoop and Spark systems.
News and trends
Technology developments on big data platforms
Things move quickly in the big data ecosystem, partly because of the open source nature of Hadoop, Spark and other technologies. In addition, many big data platforms and tools are still relatively new, so they get updated with new functionality on a regular basis. The growth of cloud computing and the emergence of technologies such as containers and microservices are also driving changes to big data software and systems.
The stories in this section examine trends affecting big data vendors and users; they also shine a light on new technologies that have been added to the big data mix.
Spark's lead developers are looking to gain a performance edge on rival open source stream processing platforms via the addition of a low-latency execution engine called Drizzle.
Big data architectures are changing to support the move to real-time processing and faster data analytics, with microservices gaining prominence in the Hadoop development domain.
In big data environments, microservices running in containers can break processing and analytics jobs up into pieces, easing development and management of Hadoop data flows.
Big data vendors are moving to simplify the process of running Hadoop platforms in the cloud, partly through metered pricing that lets users set up transient clusters as needed.
Consultant Andy Hayler says organizations looking to handle the large volumes of data coming from the internet of things may need to start by deploying new big data platforms.
Graph databases are increasingly being tapped to help power a variety of new application architectures, including data integration, data governance and master data management tools.
The surging adoption of big data platforms is pushing IT teams to adjust the way they approach data modeling, including the process of creating database schemas.
Familiar to gamers and supercomputer programmers, graphics processing units are now being tapped to power big data systems running graph databases and machine learning applications.
Big data glossary
Terms to know related to big data platforms
Read the definitions included in this section to learn the basics about big data and the key technologies for processing, managing and analyzing it.
- Apache Hadoop YARN
- Apache Spark
- big data
- big data analytics
- big data as a service (BDaaS)
- big data management
- data engineer
- data scientist
- database management system (DBMS)