I've long said that all data will eventually become big data, and big data platforms will evolve into our next-generation data processing platform. We have reached a point in big data evolution where it is now mainstream, and if your organization is not neck-deep in figuring out how to implement big data technologies, you might be running out of time.
Indeed, the big data world continues to change rapidly, as I observed recently at the Strata Data Conference in New York. While there, I met with over a dozen key vendors in sessions and on the show floor.
Overall, the folks attending conferences like this one are less and less the slightly goofy, idealistic, open source research-focused geeks of years past, and more and more real-world big data and machine learning practitioners looking to solve actual business problems in enterprise production environments. Given that basic vibe, here are my top five takeaways from Strata on the trends driving big data evolution.
1. Structured data
Big data isn't just about unstructured or semi-structured data anymore. Many of the prominent vendors, led by key platform providers like Hortonworks, MapR and Cloudera, are now talking about big data implementations as full enterprise data warehouses (EDWs). The passive, often swampy data lake idea seems a bit passé, while a lot of energy is aimed at delivering practical, real-time business intelligence to a wider swath of corporate BI consumers.
I noted that a large number of competing big data acceleration vendors are applying on-demand analytics to tremendous volumes of structured data -- both historical and streaming IoT-style.
Clearly, there is a war going on for the corporate BI and EDW investment. Given what I've seen, my bet is that big data platforms will inevitably outpace and outperform monolithic, proprietary legacy EDWs.
2. Converged system of action
This leads to the observation that big data evolution includes implementations that host more and more of a company's entire data footprint -- structured and unstructured data together.
We've previously noted that many advanced analytical approaches can add tremendous value when they combine formerly disparate corporate data sets of all different types. Now, many of the aforementioned data warehouse and BI discussions also extend BI services to converge with unstructured and semi-structured data sources.
It's not fully clear if big data platforms are ready to be considered master systems of record for all transactional data in all circumstances. However, I'd say they are ready to be truly converged systems of action in big data evolution, replacing disjointed and disparate data processing silos with a single, much more powerful platform for operational analytics and real-time decision-making.
3. Pragmatic machine learning
As more and more companies deploy real production big data systems, the attendees at big data conferences seem more business-focused, serious and pragmatic. I was pleasantly surprised to see that there were hardly any over-reaching claims about AI at Strata, but, quite properly, a lot of practical material about effective machine learning.
In particular, there was a common theme across the event about how to deploy and maintain machine learning in production -- not just about the latest gee-whiz research. I might even say this resulted in a more humanistic sense of the good that machine learning can provide to society, especially compared to the unknown future surrounding more science fiction-ish AI -- e.g., machine sentience.
4. Watching the watchmen
There was plenty of discussion at Strata about using machine learning to recursively manage large-scale, in-production machine learning processes, or at least to better manage big data workloads, streaming data and IoT analytics.
I've also seen the internal application of machine learning in other IT-related segments that need to have robust, automated management at scale, like IT data center and hybrid cloud operations.
5. Desktop HPC
One of the last vendors I talked to was a small business called Ricker Lyman Robotic, which is starting to make small, stackable desktop compute cluster nodes called Hivecells, each about the size of an older external hard drive enclosure. These are designed to run distributed software, like big data platforms for developers, at a lower cost than running ongoing cloud instances. In a relatively small stack about a foot high, I can now easily create -- and hold -- my own personal HPC-like server cluster.
Even as we've now just about moved all our desktop and local computing infrastructure up into cloud services, the pendulum is swinging back with cheap, easily managed, low-power and incredibly powerful local compute resources. In a way, we've seen this coming with the power we each hold in our smartphones, not to mention watches and, soon, coffee pots, refrigerators and vacuum cleaners -- OK, Roomba, you're already there.
Taken together, I'd summarize these takeaways as a big mark of maturity for the big data market. The focus of big data evolution is now on practical production deployment and performance.
Infrastructure vendors have long toyed with -- and sometimes struggled with -- converging applications back into their infrastructure, but here we have a converged application/infrastructure stack that is distributable, scale-out, real-time, mainly open source, and capable of both mission-critical and mixed workload processing.