The year 2014 saw progress in big data architecture development and deployment, as users gained more experience with NoSQL alternatives to relational databases, and Hadoop 2 gained traction for operational analytics uses beyond the distributed processing framework's original batch processing role.
Those trends were detailed in a variety of stories on SearchDataManagement. Interest in big data technologies was often driven by the Hadoop 2 platform, which first arrived in late 2013. The new release separated the Hadoop Distributed File System from its earlier reliance on the batch-oriented MapReduce programming model and processing engine, opening up Hadoop to a far wider array of uses -- for example, interactive querying and stream processing applications. But going from proof of concept to production was sometimes a bridge too far for the Hadoop army, leaving further deployment battles to be waged in 2015.
Data architects and managers also had a lot of on-the-job learning to do in grappling with new in-memory processing schemes that became available for mainstream relational databases. Still, much of the big data discussion focused on nonrelational alternatives -- and there were plenty to talk about. "Once you decide you can do something that doesn't require a monolithic SQL database, there's no shortage of emerging technologies today," said Joe Caserta, founder and president of New York-based consultancy Caserta Concepts.
Despite the persistent din about Hadoop, even it was sometimes overshadowed by yet another comet-like open source phenomenon: Spark. The analytical processing engine is often paired with Hadoop 2 to run batch jobs faster than MapReduce can. But Spark also got increasing attention on its own for use in machine learning, another hot-button trend during the past 12 months.
Just say NoSQL
MongoDB, Couchbase, Aerospike and more -- the litany of NoSQL databases resounded. Barely a day seemed to pass without a new NoSQL technology to consider, as Michael Simone, global head of CitiData platform engineering at Citigroup Inc., wryly noted during a presentation at the 2014 MongoDB World conference last summer. But the humor didn't belie the reality that NoSQL software was increasingly deployed to deal with masses of data, often new forms of information coming off the Web that didn't fit well in rigid relational schemas.
For example, NoSQL databases were tapped as an in-memory store for real-time decision making on Web-based marketing data, to power a tech support system that helps call center operators track gamers' website activity to resolve technical problems, and to store data for analyzing social media trends and outreach efforts. In some other cases, though, data managers opted for so-called NewSQL technology, which seeks to pair the best traits of SQL and NoSQL platforms.
Building for the big data future
Underlying these developments was plenty of work aimed at incorporating the new big data tools into enterprise data architectures that often are moving targets themselves. "The greater issues with big data today are about the architecture -- how you build an environment where you have many new technologies working together," said Vince Dell'Anno, managing director of information management for the data supply chain at Accenture's analytics consulting group.
Dell'Anno said an upcoming challenge in many IT departments will be managing hybrid environments that enable tens of thousands of end users to access the newly available data. In fact, building scalable big data systems and integrating them with existing data warehousing, analytics and operational environments was a major theme in 2014. At times, the new tools required big data architecture implementers to forgo familiar ways of working with data schemas, turning some data management conventions on their heads.
As 2014 neared its end, Hadoop had a coming-out party of sorts when Hortonworks Inc., one of the three independent Hadoop distribution providers, launched an initial public offering (IPO). In the process, the company raised $100 million -- which, given the mounds of lucre venture capitalists have already heaped on Hadoop, is a relatively modest amount. But Ovum analyst Tony Baer wrote in a blog post that the IPO was more notable as a statement about Hadoop's business prospects. "This is very much a greenfield market, as almost all sales are new, with few being competitive replacements," Baer said. And, he added, "there's still a lot of virgin market out there."
Other Hadoop players, as well as some NoSQL market leaders, are expected to join the IPO parade in 2015. That, in turn, would help keep the pipeline of new data processing technologies primed and full -- likely giving data management teams even more to get their arms around next year.