Big data is proving its value to organizations of all types and sizes, and in a wide range of industries. Enterprises that make advanced use of big data are realizing tangible benefits, from improved efficiency in operations and increased visibility into rapidly changing environments to the optimization of products and services for customers.
The result is that as organizations find uses for these large stores of data, big data technology, practices and approaches are evolving. New techniques for collecting, processing, managing and analyzing the gamut of data across an organization continue to emerge.
Dealing with big data is more than just dealing with large volumes of stored information. Volume is just one of the many "Vs" that organizations need to address when dealing with big data. There is also a significant variety of data -- from structured information sitting in distributed databases throughout the organization to vast quantities of unstructured information residing in images, videos, sensor data, text files and documents, and even paper records waiting to be digitized. In addition, this information often is created and changed at a rapid rate (velocity) and is of varying levels of quality (veracity), resulting in challenges in managing, querying, processing and analyzing all the disparate data.
Four major trends in big data are helping organizations meet those challenges.
1. More data, data diversity drive advances in processing and rise of edge computing
It may come as little surprise that the pace of data generation continues to accelerate. In the financial services industry alone, the amount of data generated each second is projected to grow by over 700% in 2021.
Much of this data is not generated from the transactions that happen in databases, but comes from other sources, including cloud systems, smart devices such as smartphones and voice assistants, and video streaming. This data is largely unstructured and in the past was left mostly unprocessed by organizations. In fact, upwards of 90% of an organization's unstructured data goes unprocessed, according to analyst firm IDC.
That brings us to the biggest trend in big data: Non-database sources will continue to be the dominant generators of data, in turn forcing organizations to reexamine their needs for data processing. Voice assistants and IoT devices, in particular, are driving a rapid ramp-up in big data handling needs across industries as diverse as retail, healthcare, finance, insurance, manufacturing and energy, and across a wide range of public and private sector markets. This explosion in data diversity is compelling organizations to think beyond the traditional data warehouse as a means of processing all this information.
In addition, the need for handling all this information is moving to the devices themselves, as industry breakthroughs in processing power have led to the development of increasingly advanced devices capable of collecting and storing data on their own without taxing network, storage and computing infrastructure. For example, mobile banking apps are able to handle many tasks for remote check deposit and processing without having to send images back and forth to central banking systems for processing.
The use of devices for distributed processing is embodied in the concept of edge computing, which shifts the processing load to the devices themselves before the data is sent to the servers. By reducing the amount of data flowing through networks, edge computing optimizes performance and cuts computing costs, especially cloud storage, bandwidth and processing expenses. It also helps to speed up data analysis and provides faster responses to the user.
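The idea can be illustrated with a minimal sketch: instead of streaming every raw sensor sample to a central server, the device summarizes readings locally and uploads only the summary plus any anomalous values. The function name, threshold and payload shape below are illustrative assumptions, not part of any real edge framework.

```python
from statistics import mean

def summarize_on_device(readings, threshold=100.0):
    """Edge-side preprocessing (illustrative): retain a compact summary
    and any anomalous samples rather than shipping every raw reading."""
    anomalies = [r for r in readings if r > threshold]
    return {
        "count": len(readings),
        "mean": round(mean(readings), 2),
        "max": max(readings),
        "anomalies": anomalies,  # only unusual samples leave the device
    }

# Simulated sensor samples collected locally on the device
samples = [72.0, 75.5, 74.2, 180.3, 73.1]
payload = summarize_on_device(samples)
# The device uploads `payload` -- five raw values reduced to a small dict
```

The bandwidth saving here is trivial, but the pattern scales: a wearable sampling a heart rate hundreds of times per minute can transmit a per-minute summary and alert only on out-of-range values.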
In the healthcare sector, the rapidly expanding market of wearables -- such as Fitbit, Apple Watch and Google Android-powered devices -- is driving growth in telemedicine and allowing healthcare providers to gather critical patient data in real time. The results are used for a wide range of big data-based processing applications to improve patient outcomes.
2. Big data storage needs spur innovations in cloud and hybrid cloud, growth of data lakes
To deal with the inexorable increase in data generation, organizations are spending more of their resources storing this data in a range of cloud-based and hybrid systems optimized for all the "Vs" of big data. In previous decades, organizations handled their own storage infrastructure, resulting in massive data centers that enterprises had to manage, secure and guarantee continued operation. The shift to cloud computing changed that dynamic. By shifting the responsibility to cloud infrastructure providers -- such as Amazon, Google, Microsoft, IBM and others -- organizations can deal with almost limitless amounts of new data and pay for storage and compute capability on demand without having to maintain their own large and complex data centers.
Some industries are challenged in their use of cloud infrastructure due to regulatory or technical limitations. Heavily regulated industries -- such as healthcare, financial services and government -- have restrictions that prevent the use of public cloud infrastructure. As such, in the past decade, cloud providers have developed ways to provide more regulatory-friendly infrastructure as well as hybrid approaches that combine aspects of third-party cloud systems with on-premises computing and storage to meet critical infrastructure needs. The evolution of both public cloud and hybrid cloud infrastructures will no doubt progress as organizations seek the economic and technical advantages of cloud computing.
In addition to innovations in cloud storage and processing, enterprises are shifting toward new data architecture approaches that allow them to handle the variety, veracity and volume challenges of big data. Rather than trying to centralize data storage in a data warehouse that requires complex and time-intensive data extraction, transformation and loading, enterprises are evolving the concept of the data lake. Data lakes store structured and unstructured data in their native format. This approach shifts the responsibility of transformation and processing to end points that have different data needs. The data lake can also provide shared services for data analysis and processing.
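A toy sketch of the schema-on-read approach described above: data lands in the lake in its native format, untransformed, and structure is applied only when a consumer reads it. The directory layout, source names and helper function are hypothetical; a production lake would typically sit on object storage (e.g., an S3 or similar bucket) rather than a local filesystem.

```python
import json
import os
import tempfile

# Hypothetical landing zone; stands in for cloud object storage
lake_root = tempfile.mkdtemp()

def land_raw(source: str, record: dict) -> str:
    """Write the record exactly as received (schema-on-read):
    no extraction or transformation happens at write time."""
    path = os.path.join(lake_root, "raw", source)
    os.makedirs(path, exist_ok=True)
    fname = os.path.join(path, f"{record['id']}.json")
    with open(fname, "w") as f:
        json.dump(record, f)
    return fname

# Two sources land data in their own native shapes -- no shared schema
land_raw("crm", {"id": "c1", "customer": "Acme", "tier": "gold"})
land_raw("iot", {"id": "d9", "temp_c": 21.5, "ts": 1620000000})

# A consumer imposes structure only at read time, for its own needs
with open(os.path.join(lake_root, "raw", "iot", "d9.json")) as f:
    reading = json.load(f)
```

The design choice this illustrates is the one the paragraph describes: transformation cost is deferred and pushed to the endpoints that know what shape they need, rather than paid upfront for every record in a warehouse ETL pipeline.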
3. Adoption of advanced analytics, ML and other AI technologies increases dramatically
Enterprises are realizing improvements in big data analytics and processing. With the vast amount of data being generated, traditional analytics approaches are challenged because they're not easily automated for data analysis at scale. Distributed processing technologies, especially those promoted by open source platforms such as Hadoop and Spark, enable organizations to process petabytes of information at rapid speed. Machine learning and AI systems make it easier than ever to spot patterns, detect anomalies and make predictions. Enterprises are using big data analytics technologies to optimize their business intelligence and analytics initiatives, moving past slow reporting tools dependent on data warehouse technology to more intelligent, responsive applications that enable greater visibility into customer behavior, business processes and overall operations.
No technology has been as revolutionary to big data analytics as machine learning and AI. AI is used by organizations of all sizes to optimize and improve their business processes. Machine learning harnesses the power of systems to identify patterns in data, delivering advanced predictive analytics and the ability to work with unstructured data. This includes recognition systems for image, video and text data; automated classification of information; natural language processing capabilities for chatbots and voice and text analysis; autonomous business process automation; pattern and anomaly detection; high degrees of personalization and recommendation; and systems that can find optimal solutions among the sea of data.
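As a concrete taste of the pattern and anomaly detection mentioned above, here is a deliberately minimal z-score detector in plain Python. Real platforms use far more sophisticated models (isolation forests, autoencoders and the like); the metric name and threshold below are illustrative assumptions.

```python
from statistics import mean, stdev

def zscore_anomalies(values, z_threshold=2.0):
    """Flag values whose z-score exceeds the threshold -- a minimal
    stand-in for the automated anomaly detection ML platforms provide."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

# Hypothetical daily login counts with one suspicious spike
daily_logins = [100, 98, 102, 101, 99, 100, 500]
suspicious = zscore_anomalies(daily_logins)
```

The point of even a sketch this small is automation: the same function runs unchanged over one metric or ten thousand, which is exactly what makes ML-driven monitoring practical where manual review of dashboards is not.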
Indeed, with the help of AI and machine learning, companies are using their big data to provide deeper customer support through intelligent chatbots and more personalized interactions without requiring significant increases in customer support staff. These AI-enabled systems are able to collect and analyze vast amounts of information about their customers and users, especially when paired with a data lake strategy that can aggregate a wide range of information across many sources.
Enterprises are also seeing innovations in the area of data visualization. People understand the meaning of data when it's represented in a visual form, such as charts, graphs and plots. Emerging forms of data visualization are putting the power of AI-enabled analytics and powerful insights into the hands of even casual business users. This helps organizations spot key insights that can improve decision-making. Companies are discovering the value of data-driven decision-making and the power of data in the organization. Advanced forms of visualization and analytics tools even let users ask questions in natural language, with the system automatically determining the right query and showing the results in a context-relevant manner.
4. DataOps, data stewardship move to the fore
Many aspects of big data processing, storage and management will see continued evolution for years to come. Much of this innovation is driven by technology needs, but also by changes in the way we think about and relate to data.
One area of innovation is the emergence of DataOps, a methodology and practice that focuses on agile, iterative approaches for dealing with the full lifecycle of data as it flows through the organization. Rather than thinking about data in piecemeal fashion with separate people dealing with data generation, storage, transportation, processing and management, DataOps addresses organizational needs across the data lifecycle from generation to archiving.
Likewise, organizations are dealing increasingly with data governance, privacy and security issues. In the past, enterprises were somewhat lax about data privacy and governance, but regulations and users are now holding organizations far more accountable for what happens to their information. Due to widespread security breaches, eroding customer trust in enterprise data-sharing practices, and challenges in managing data over its lifecycle, organizations are becoming much more involved in data stewardship, working harder to secure and manage data, especially as it crosses international boundaries. New tools are emerging to make sure that data stays where it needs to stay, is secured at rest and in motion, and is appropriately tracked over its lifecycle.
These big data trends make working in the big data space an exciting place to be in 2021 and no doubt through the foreseeable future.