In Part 1 of this series on big data myths I indicated that the goal of most big data projects is analytics. In other words, big data systems are almost always developed to improve the analytical capabilities of an organization; big data almost always means analytics. Some would have us believe that the opposite is true as well: analytics almost always means big data. They see big data and analytics as two sides of the same coin: big data exists to support analytics, and analytics requires big data. The latter is a myth. Analytical capabilities can definitely be improved and extended with just a little bit of data. Big data is not always a prerequisite.
Let me give an example. Some time ago I ordered an album by a band called Longbranch Pennywhistle from a website. This album was missing from my collection. While I was ordering it, the website informed me that an album by another band called Shiloh was in stock, and asked whether I was interested in that one as well. I was, so I bought both of them. Afterwards, I wondered how they did that, because they can’t apply logic such as: 400 customers have bought product A and 250 of them bought product B as well, you’re now ordering A, so you’re probably also interested in B. I can guarantee you, this website can’t apply this kind of logic, because nowhere near 400 copies of the album by Longbranch Pennywhistle are sold in a year, probably just one or two.
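To make that concrete, here is a minimal sketch of the "customers who bought A also bought B" rule, written under my own assumptions. The function name, the `min_support` threshold, and the order data are all hypothetical; only the 400 and 250 figures come from the example above. It shows why the rule works for a popular product and stays silent for an album that sells once or twice a year.

```python
def co_purchase_confidence(orders, product_a, product_b, min_support=100):
    """Return the share of buyers of product_a who also bought product_b.

    Returns None when fewer than min_support orders contain product_a,
    because a ratio based on one or two sales says nothing useful.
    min_support is an illustrative threshold, not a standard value.
    """
    bought_a = [order for order in orders if product_a in order]
    if len(bought_a) < min_support:
        return None
    bought_both = sum(1 for order in bought_a if product_b in order)
    return bought_both / len(bought_a)


# Popular product: 400 orders contain A, 250 of them also contain B.
popular_orders = [{"A", "B"}] * 250 + [{"A"}] * 150

# Rare album: only two orders in a year (made-up data).
rare_orders = [
    {"Longbranch Pennywhistle album", "Shiloh album"},
    {"Longbranch Pennywhistle album"},
]

popular_result = co_purchase_confidence(popular_orders, "A", "B")
rare_result = co_purchase_confidence(
    rare_orders, "Longbranch Pennywhistle album", "Shiloh album"
)
```

With these inputs, `popular_result` is 250 / 400 = 0.625, while `rare_result` is `None`: there is simply not enough purchase history to say anything.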
So, I kept wondering how they were able to discover a relationship between these two albums. I decided to give them a call. When I had them on the phone, I asked them to connect me with one of their IT specialists, which, to my surprise, they did. I asked how they were able to recommend that Shiloh album. The guy explained that it was simple. They store everything they know about bands, artists, and albums in a database. It’s like a network of knowledge on music. When someone buys a product, the network is navigated to find relationships. The relationship between these two bands is that one of the members of Longbranch Pennywhistle and one of the members of Shiloh started another band called The Eagles. My final question was whether this network database was a big database. The answer was a definite No. In fact, measured in bytes, it was a really small database. This website was using a small database to support some of its most important forms of analytics.
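I don't know how the website actually implemented this, but the idea of navigating a network of knowledge can be sketched in a few lines. In the sketch below, the graph layout, the function names, the `max_hops` limit, and the placeholder musician names are my own assumptions; the only facts taken from the story are the three band names and that one member of each of the two bands started The Eagles.

```python
from collections import deque

# Hypothetical knowledge network: each pair links a musician to a band.
# "musician_1" etc. are placeholders, not real names.
EDGES = [
    ("musician_1", "Longbranch Pennywhistle"),
    ("musician_1", "The Eagles"),
    ("musician_2", "Shiloh"),
    ("musician_2", "The Eagles"),
    ("musician_3", "Some Unrelated Band"),
]
BANDS = {"Longbranch Pennywhistle", "Shiloh", "The Eagles", "Some Unrelated Band"}


def build_graph(edges):
    """Turn a list of pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def related_bands(graph, start, bands, max_hops=4):
    """Breadth-first search from start; return {band: hops} within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return {n: d for n, d in hops.items() if n in bands and n != start}


GRAPH = build_graph(EDGES)
RELATED = related_bands(GRAPH, "Longbranch Pennywhistle", BANDS)
```

Starting from Longbranch Pennywhistle, the search reaches The Eagles in two hops and Shiloh in four, and never touches the unrelated band. No sales history is involved at all, and the whole network fits in a handful of rows.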
In conclusion: even for some really fancy forms of analytics, big data is not always needed. “Small” data can be sufficient. It’s not about the size, it’s about the quality of the data and about having the right data at the right time. Some forms of analytics really require BIG data, but … not always.