Big data is proving its value to organizations of all types and sizes, and in a wide range of industries. Enterprises that make advanced use of big data are realizing tangible business benefits, from improved efficiency in operations and increased visibility into rapidly changing environments to the optimization of products and services for customers.
The result is that as organizations find uses for these large stores of data, big data technologies, practices and approaches are evolving. New techniques and architectures for collecting, processing, managing and analyzing the gamut of data across an organization continue to emerge.
Dealing with big data is more than just dealing with large volumes of stored information. Volume is just one of the many "V's" of big data that organizations need to address. There is usually also a significant variety of data -- from structured information sitting in databases distributed throughout the organization to vast quantities of unstructured and semistructured data residing in files, images, videos, sensors, system logs, text and documents, including paper ones that are waiting to be digitized. In addition, this information is often created and changed at a rapid rate (velocity) and has varying levels of data quality (veracity), creating further challenges for data management, processing and analysis.
Four major trends in big data are helping organizations meet those challenges.
1. More data, increased data diversity drive advances in processing and the rise of edge computing
It may come as little surprise that the pace of data generation continues to accelerate. In the financial services industry alone, the amount of data generated each second will grow by over 700% in 2021.
Much of this data is not generated from the transactions that happen in databases, but comes from other sources, including cloud systems, smart devices such as smartphones and voice assistants, and video streaming. This data is largely unstructured and in the past was left mostly unprocessed by organizations. In fact, upwards of 90% of an organization's unstructured data goes unprocessed, according to analyst firm IDC.
Which brings us to the biggest trend in big data: Non-database sources will continue to be the dominant generators of data, in turn forcing organizations to reexamine their needs for data processing. Voice assistants and IoT devices, in particular, are driving a rapid ramp-up in big data management needs across industries as diverse as retail, healthcare, finance, insurance, manufacturing and energy and in a wide range of public-sector markets. This explosion in data diversity is compelling organizations to think beyond the traditional data warehouse as a means for processing all this information.
In addition, the need to handle the data being generated is moving to the devices themselves, as industry breakthroughs in processing power have led to the development of increasingly advanced devices capable of collecting and storing data on their own without taxing network, storage and computing infrastructure. For example, mobile banking apps are able to handle many tasks for remote check deposit and processing without having to send images back and forth to central banking systems for processing.
The use of devices for distributed processing is embodied in the concept of edge computing, which shifts the processing load to the devices themselves before the data is sent to the servers. Edge computing optimizes performance and storage by reducing the need for data to flow through networks, which in turn lowers costs, especially cloud storage, bandwidth and processing expenses. It also helps to speed up data analysis and provides faster responses to the user.
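To make the idea concrete, here is a minimal Python sketch of edge-side processing, assuming a device that collects raw readings and sends only compact summaries upstream. The function name, the window size and the sample heart-rate values are all hypothetical and chosen purely for illustration; a real device would have its own data format and transport.

```python
from statistics import mean


def summarize_on_device(readings, window=5):
    """Collapse raw readings into one summary record per window.

    Hypothetical helper: it stands in for whatever processing an edge
    device does before anything is sent over the network.
    """
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summaries.append({
            "count": len(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "mean": round(mean(chunk), 2),
        })
    return summaries


# Made-up heart-rate samples, as a wearable might record them.
raw = [72, 74, 73, 75, 71, 90, 92, 91, 95, 93]

# Only the summaries would leave the device, not the raw samples.
payload = summarize_on_device(raw)
print(len(raw), "raw readings reduced to", len(payload), "summary records")
```

The trade-off in a design like this is that the server never sees individual samples, so the summary fields have to be chosen with the downstream analysis in mind.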
In the healthcare sector, for example, the rapidly expanding market of wearables -- such as Fitbit, Apple Watch and Google Android devices -- is driving growth in telemedicine and allowing healthcare providers to gather critical patient data in real time. The results are used for a wide range of big data processing and analytics applications designed to improve patient outcomes.
2. Big data storage needs spur innovations in cloud and hybrid cloud platforms, growth of data lakes
To deal with the inexorable increase in data generation, organizations are spending more of their resources storing this data in a range of cloud-based and hybrid cloud systems optimized for all the V's of big data. In previous decades, organizations handled their own storage infrastructure, resulting in massive data centers that enterprises had to manage, secure and operate. The move to cloud computing changed that dynamic. By shifting the responsibility to cloud infrastructure providers -- such as AWS, Google, Microsoft and IBM -- organizations can deal with almost limitless amounts of new data and pay for storage and compute capability on demand without having to maintain their own large and complex data centers.
Some industries are challenged in their use of cloud infrastructure due to regulatory or technical limitations. For example, heavily regulated industries -- such as healthcare, financial services and government -- often face restrictions that limit or prevent the use of public cloud infrastructure. As a result, in the past decade, cloud providers have developed ways to provide more regulatory-friendly infrastructure as well as hybrid approaches that combine aspects of third-party cloud systems with on-premises computing and storage to meet critical infrastructure needs. The evolution of both public cloud and hybrid cloud infrastructures will no doubt progress as organizations seek the economic and technical advantages of cloud computing.
In addition to innovations in cloud storage and processing, enterprises are shifting toward new data architecture approaches that allow them to handle the variety, veracity and volume challenges of big data. Rather than trying to centralize data storage in a data warehouse that requires complex and time-intensive data extraction, transformation and loading, enterprises are evolving the concept of the data lake. Data lakes store structured and unstructured data sets in their native format. This approach shifts the responsibility of transformation and processing to end points that have different data needs. The data lake can also provide shared services for data analysis and processing.
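The difference from a warehouse can be sketched in a few lines of Python. In this illustrative example, an in-memory list stands in for the data lake, raw payloads are kept in the format in which they arrived, and each consumer parses only what it needs at read time. The record layout, field names and helper functions are assumptions made for the sketch, not part of any particular data lake product.

```python
import csv
import io
import json

# Stand-in for a data lake: raw payloads kept exactly as they arrived.
lake = [
    {"format": "json", "raw": '{"customer": "A17", "amount": "19.99"}'},
    {"format": "csv", "raw": "customer,amount\nB42,5.00\n"},
]


def read_records(entry):
    """Parse one raw object into dicts at read time, not at load time."""
    if entry["format"] == "json":
        return [json.loads(entry["raw"])]
    if entry["format"] == "csv":
        return list(csv.DictReader(io.StringIO(entry["raw"])))
    raise ValueError(f"unsupported format: {entry['format']}")


def total_sales(lake_entries):
    """One consumer's view: it only needs 'amount' as a number."""
    return sum(
        float(record["amount"])
        for entry in lake_entries
        for record in read_records(entry)
    )


print(round(total_sales(lake), 2))
```

A different consumer, such as one that only counts distinct customers, could read the same raw entries with its own transformation, which is the shift of responsibility to the end points described above.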
3. Adoption of advanced analytics, machine learning and other AI technologies increases dramatically
With the vast amount of data being generated, traditional analytics approaches are challenged because they're not easily automated for data analysis at scale. Distributed processing technologies, especially those promoted by open source platforms such as Hadoop and Spark, enable organizations to process petabytes of information at rapid speed. Machine learning and AI systems allow them to more easily spot patterns, detect anomalies and make predictions than they could before. Enterprises are using big data analytics technologies to optimize their business intelligence and analytics initiatives, moving past slow reporting tools dependent on data warehouse technology to more intelligent, responsive applications that enable greater visibility into customer behavior, business processes and overall operations.
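The split-then-combine pattern behind this kind of distributed processing can be illustrated with a small Python sketch. It does not use the Hadoop or Spark APIs; it runs on a single machine and only mimics the shape of the computation, in which each partition is processed independently and the partial results are then merged. The event log lines and function names are invented for the example.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count event types within a single partition."""
    return Counter(line.split()[0] for line in lines if line.strip())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Made-up event logs, already split into partitions.
partitions = [
    ["login user1", "purchase user2", "login user3"],
    ["login user4", "logout user1"],
]

# On a cluster, each partition would be handled by a different worker.
partials = [map_partition(partition) for partition in partitions]
totals = reduce(merge_counts, partials, Counter())
print(totals.most_common())
```

Because each map call touches only its own partition, the work can be spread across as many machines as there are partitions, which is what lets the approach scale to very large data sets.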
No technology has been as revolutionary to big data analytics as machine learning and AI systems. AI is used by organizations of all sizes to optimize and improve their business processes. Machine learning enables them to more easily identify patterns and detect anomalies in large data sets to provide predictive analytics and other advanced data analysis capabilities. This includes recognition systems for image, video and text data; automated classification of information; natural language processing capabilities for chatbots and voice and text analysis; autonomous business process automation; high degrees of personalization and recommendation; and systems that can find optimal solutions among the sea of data.
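As a rough illustration of anomaly detection, the Python sketch below flags values that sit far from the average of a data set. It uses a simple z-score rule as a stand-in for a trained machine learning model, so it should be read as a statistical baseline rather than as how any particular AI system works. The threshold and the daily order counts are made up for the example.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations from the mean of `values`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mu) / sigma > threshold
    ]


# Made-up daily order counts with one obvious spike.
daily_orders = [102, 98, 101, 97, 103, 99, 100, 250, 101, 96]
print(find_anomalies(daily_orders))
```

Machine learning approaches earn their keep on data where a single global threshold like this fails, such as seasonal patterns or anomalies that only show up across several variables at once.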
Indeed, with the help of AI and machine learning, companies are using their big data environments to provide deeper customer support through intelligent chatbots and more personalized interactions without requiring significant increases in customer support staff. These AI-enabled systems are able to collect and analyze vast amounts of information about customers and users, especially when paired with a data lake strategy that can aggregate a wide range of information across many sources.
Enterprises are also seeing innovations in the area of data visualization. People grasp the meaning of data more readily when it's represented visually, such as in charts, graphs and plots. Emerging forms of data visualization are putting the power of AI-enabled analytics into the hands of even casual business users, which helps organizations spot key insights that can improve decision-making. As a result, companies are discovering the value of data-driven decision-making and the power of data across the organization. Advanced forms of visualization and analytics tools even let users ask questions in natural language, with the system automatically determining the right query and showing the results in a context-relevant manner.
4. DataOps and data stewardship move to the fore
Many aspects of big data processing, storage and management will see continued evolution for years to come. Much of this innovation is driven by technology needs, but some of it also stems from changes in the way we think about and relate to data.
One area of innovation is the emergence of DataOps, a methodology and practice that focuses on agile, iterative approaches for dealing with the full lifecycle of data as it flows through the organization. Rather than thinking about data in piecemeal fashion with separate people dealing with data generation, storage, transportation, processing and management, DataOps processes and frameworks address organizational needs across the data lifecycle from generation to archiving.
Likewise, organizations are increasingly dealing with data governance, privacy and security issues. In the past, enterprises often were somewhat lax about concerns around data privacy and governance, but new regulations make them much more liable for what happens to personal information in their systems. Due to widespread security breaches, eroding customer trust in enterprise data-sharing practices, and challenges in managing data over its lifecycle, organizations are becoming much more involved in data stewardship and working harder to properly secure and manage data, especially as it crosses international boundaries. New tools are emerging to make sure that data stays where it needs to stay, is secured at rest and in motion, and is appropriately tracked over its lifecycle.
Collectively, these big data trends make the big data space an exciting place to work in 2021 and, no doubt, for the foreseeable future.