Not so long ago I attended a session in which the speaker was very clear on what big data is and what it is not. In his opinion, big data is unstructured data and unstructured data is big data. By unstructured data he meant textual data, such as emails, social media messages, and contracts, but also video, images, and so on. According to him, structured data is not big data, because we have been processing structured data since the dawn of IT. So, nothing new there.
I don’t agree with this view at all. There are many different forms of big data. The amount of unstructured data can definitely be humongous and qualify as big data. But the same holds true for structured data. Many big data systems exist today that process, store, and analyze staggering amounts of structured data. For example, telecommunication companies monitor outages to detect drops in the service level; internet companies monitor every form of usage by website visitors to influence which ads to present or which products to recommend; power plants monitor overheating of components; banks process stock ticker data and detect fraud in real time; distribution centers monitor incoming and outgoing products with their RFID readers; online gaming companies detect malicious behavior and monitor quality of service; and the list goes on. In all these examples large amounts of structured data are processed. It’s the world of sensor data and machine-generated data.
People may not be aware of it, but in their daily lives they are responsible for generating numerous, massive data streams. For example, by driving their cars they generate several data streams, such as the one from the speed sensor that feeds the speedometer and the one from the outside-temperature sensor that feeds the temperature gauge. The expectation is that in the future an average car will have 200 sensors. The amount of data all these cars will generate is phenomenal.
The car is not the only device people use that generates data streams. Smartphones, tablets, and smart watches are continuously producing gigantic amounts of data. The apps on our phones also stream data continuously. And it doesn’t stop with the devices we carry around. When we watch TV, the provider monitors which programs we watch, which in turn influences the ads that are shown. Smart energy meters send data to utility companies.
Many of these sensor-driven systems generate massive amounts of machine-generated data. All this data is highly structured. Based on the sheer volume and the speed with which all this data has to be analyzed, it qualifies as big data.
And then we have the Internet of Things, in which countless devices talk to each other. Again, the amount of data that will flow between these devices will be staggering. Gartner forecasts that 4.9 billion connected things will be in use in 2015, and that this number will reach 25 billion by 2020. The manufacturing, utilities, and transportation industries in particular will be the top three verticals deploying the IoT. Can you imagine the amount of (structured) data being generated?
All this sensor data is highly structured data. If the amount of highly structured, machine-generated data hasn’t surpassed the amount of unstructured data being generated yet, it will do so in the near future. Almost all of this machine-generated and highly structured data is generated and stored for analytical purposes and nothing else. And that makes it big data.
So, big data is not only unstructured data; that’s a myth. Structured data can be big data as well. Let’s not decide what is or isn’t big data based on whether the data is structured.