
Initiative targets Hadoop data management, better data policies

Hortonworks forms a data management initiative with Merck and others. Meanwhile, SQL and NoSQL may blur a bit.

Those tracking Hadoop advocate Hortonworks in the wake of its recent IPO will note that the company is pushing for better Hadoop data management. The move takes the form of a data governance initiative that includes Aetna, Merck and others. The week also brought a Forrester Research survey showing that many data management professionals look toward a day when SQL and NoSQL run together on one platform.

Hortonworks taps Aetna, Merck to join Hadoop governance initiative

Fresh off raising $100 million in its December IPO, Hadoop platform distributor Hortonworks, Inc. said it was forming a data governance initiative with customers Aetna, Merck and Target, as well as technology partner SAS. The ultimate goal is to provide a higher level of Hadoop data management, and in turn to move Hadoop deeper into enterprise computing.

Data governance is one of the key elements that Hadoop lacks as it seeks to move out of the proof-of-concept stage and deeper into operations at mainstream companies. It is a critical issue for big data, particularly at financial companies, where compliance regulations are strict.

The company is working with end users to create a flexible rules engine that can enforce data workflows meeting compliance requirements, said Andrew Ahn, director of governance at Hortonworks. Software developed as part of the initiative will include Apache Falcon life-cycle management, the Apache Ranger security framework and a new policy rules engine with an audit data store that can hold pertinent metadata.

Ahn is familiar with the ins and outs of financial big data and knows its requirements based on his stints in application development at the New York Stock Exchange and the Pacific Exchange. A lot of the big data effort there is "custodial," he said.

"We had critical issues between governance and big data," he said, pointing to the need to comply with U.S. Securities and Exchange Commission (SEC) strictures. Large enterprises face similar challenges as they move advanced Hadoop implementations into production, he said.

The SEC and other regulators require firms to maintain auditable trails of transactional data. While Falcon can set some system policies, many enterprises need a finer level of policy processing -- hence the new initiative.
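To make the idea of a rules engine paired with an audit store more concrete, here is a minimal Python sketch. It is not the initiative's design and does not use Falcon or Ranger; every name in it (`Rule`, `PolicyEngine`, the dataset fields, the seven-year retention threshold) is a hypothetical chosen for illustration. It shows only the general pattern: each policy check on a dataset's metadata leaves a timestamped record that an auditor could review later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

# All names below are hypothetical; none come from Falcon, Ranger or Hortonworks.


@dataclass
class Rule:
    name: str
    check: Callable[[Dict[str, object]], bool]  # True means the dataset complies


@dataclass
class PolicyEngine:
    rules: List[Rule]
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    def evaluate(self, dataset: Dict[str, object]) -> bool:
        """Run every rule against the dataset metadata and log each outcome."""
        all_passed = True
        for rule in self.rules:
            passed = bool(rule.check(dataset))
            # The audit entry keeps who/what/when metadata for later review.
            self.audit_log.append({
                "dataset": dataset.get("name"),
                "rule": rule.name,
                "passed": passed,
                "checked_at": datetime.now(timezone.utc).isoformat(),
            })
            all_passed = all_passed and passed
        return all_passed


engine = PolicyEngine(rules=[
    # Made-up threshold, used only to have something to check.
    Rule("retention_at_least_7_years", lambda d: d.get("retention_years", 0) >= 7),
    Rule("owner_assigned", lambda d: bool(d.get("owner"))),
])

trades = {"name": "trades_2014", "retention_years": 7, "owner": "compliance"}
scratch = {"name": "scratch_tmp", "retention_years": 1, "owner": ""}

trades_ok = engine.evaluate(trades)
scratch_ok = engine.evaluate(scratch)

for entry in engine.audit_log:
    print(entry["dataset"], entry["rule"], entry["passed"])
```

In this sketch the audit log is an in-memory list; a production system would presumably write such records to durable storage so the trail survives restarts.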

Hortonworks anticipates a software release as early as this quarter, said Ahn. The plan is to follow up with a formal proposal to become an incubation project within the Apache Software Foundation.

SQL to NoSQL: 'Someday we'll be together?'

More than a few folks have dismissed the surge of NoSQL databases challenging incumbent SQL. After all, SQL has fought off incursions before -- object-oriented databases in the 1990s are a prime example -- by adding capabilities.

SQL databases continue to add capabilities today. One clear area is JavaScript Object Notation (JSON). Much of the NoSQL rise was on the back of JSON. But relational databases from IBM, Oracle, Microsoft, Teradata, EnterpriseDB Corp. and others have added JSON support in recent months.
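As a rough illustration of what JSON support inside a relational database looks like, the Python sketch below keeps a JSON document in a column next to ordinary relational columns and filters on a field inside the document with plain SQL. SQLite is used only because it ships with Python; it is not one of the products named above, and the `orders` table and its fields are invented for the example. The sketch also assumes the local SQLite build includes the JSON functions, which is the default in recent releases.

```python
import json
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, details TEXT NOT NULL)"
)

# Relational columns (id, customer) sit beside a schemaless JSON document.
rows = [
    (1, "acme", json.dumps({"channel": "web", "items": 3})),
    (2, "acme", json.dumps({"channel": "store", "items": 1, "gift": True})),
    (3, "globex", json.dumps({"channel": "web", "items": 2})),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# One SQL statement reaches into the JSON document and the regular columns.
web_orders = conn.execute(
    "SELECT id, customer, json_extract(details, '$.items') "
    "FROM orders "
    "WHERE json_extract(details, '$.channel') = 'web' "
    "ORDER BY id"
).fetchall()
conn.close()

print(web_orders)
```

Note that the second row carries an extra `gift` field the others lack; the table accepts it without a schema change, which is the document-style flexibility that relational vendors are trying to offer.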

One driver for consolidating SQL and NoSQL operations in one database is support. The idea of building out and supporting a whole new infrastructure for two distinct database types -- SQL and NoSQL -- garners mixed reactions. That notion is backed up by a survey Forrester Research conducted for PostgreSQL maker EnterpriseDB. The study combined material from Forrester’s Business Technographics and other research with data from a custom survey of 50 U.S.-based IT decision makers responsible for enterprise architecture or application development.

The Forrester data shows that 42% of all survey respondents want to integrate NoSQL databases with relational ones. Meanwhile, 36% of respondents want to store structured and unstructured data together in their standard database. It could be a long time before full-fledged offerings come about, but it is fair to say that vendors are working now to absorb the most popular traits of NoSQL databases.
