IBM pushes MPP engine to boost cloud database services line

IBM has added massively parallel processing and R language support to its dashDB software -- important steps for a data warehouse database tailored for distributed cloud computing.

IBM continues to update its dashDB relational data warehouse system as part of an effort to further its cloud database services portfolio. And the company showed off dashDB and its latest addition to the technology -- support for massively parallel processing (MPP) across clustered systems -- at a self-styled "boot camp" event in Boston last week.

Released last fall, dashDB is based on the BLU Acceleration in-memory computing technology from IBM's flagship DB2 relational database, but it also brings IBM Netezza columnar-style processing to cloud implementations. As such, the software is designed to be a competitor to Amazon Redshift, the cloud data warehouse service that Amazon Web Services (AWS) brought to market at the end of 2012.

Like other relational database vendors, IBM has been pressed to respond to new cloud offerings -- especially those from cloud computing leader AWS. Last month, IBM announced an MPP version of dashDB aimed at faster query processing and better scalability. Support for the increasingly popular R analytical programming language is another of IBM's recent additions to dashDB.

The level of R's integration with the cloud database was a plus in the eyes of one dashDB user from the big data stronghold of digital advertising and marketing who spoke at the IBM Cloud Data Boot Camp.

"From our point of view, dashDB is a columnar database with DB2 features that has R built into it," said Shiv Sehgal, a solutions architect at RSG Media, a company in New York that develops software for use by television networks, publishers and other types of media organizations. In an interview, Sehgal said the ability to give easy access to R to internal users and customers versed in it will be an important step in enabling business users to ask and answer analytical questions without having to turn to IT for help.

Big data, big kahuna

The rollout of dashDB has been gradual, but the technology may hold much of IBM's hopes for managing data in the cloud. Now, with MPP support, dashDB could be the path to the cloud for users of DB2 and the Netezza data warehouse appliance.

"MPP really is our big kahuna -- with it, you can add nodes as your warehousing needs grow," said John J. Park, dashDB product manager at IBM. "Strategically, this is the cloud offering to support our Netezza clients and DB2 clients."

One potential holdup for some users: Park said dashDB's compatibility with Netezza's implementation of the SQL programming language is still evolving. He estimated that dashDB currently covers about 84% of Netezza SQL capabilities, adding that IBM is looking to reach "90-plus-percent compatibility" later this year.

The dashDB advances accompany IBM's acquisition last month of Compose Inc., a company in San Mateo, Calif., formerly known as MongoHQ. Compose specializes in database-as-a-service software that automates the process of setting up, administering and scaling databases in the cloud. It has built up a sizable list of supported databases, starting with MongoDB and moving on to include Elasticsearch, PostgreSQL, Redis and others.

One of IBM's objectives in buying Compose is to enable developers to quickly spin up instances of databases in the cloud. Compose's technology and dashDB are available to developers through Bluemix, IBM's cloud platform as a service (PaaS) offering. And they're just part of IBM's growing cloud database services portfolio. IBM's cloud data management campaign arguably kicked off in earnest early last year with its purchase of Cloudant Inc., the maker of software based on the Apache CouchDB project. Cloudant's NoSQL database is designed specifically to address scalability and deployment issues with relational databases used in cloud settings.

Lots of users, lots of data

In addition to using dashDB, RSG Media employs the Cloudant software as something of a data lake to stage data for analytics, according to Sehgal. "Most important is its scalability. It can handle a massive scale of users," he said.

Sehgal and the users he supports also have a lot of different types of data to chew on. RSG's various systems take in Web log and social media data, as well as ratings, viewership and advertising data from information services and ad platform providers, such as Nielsen, Rentrak and Operative Media.

One planned use of that data is enabling cable TV networks to estimate the profitability of programming decisions -- for example, when to run Billy Madison on demand or Breaking Bad reruns. Such programming decisions historically have been made by gut instinct, but Sehgal said that's changing, as decision makers look to use analytics tools to gauge the cost of running a program versus the amount of money it's likely to reap through cable fees or ad revenues.

The mix of a NoSQL database and a SQL data warehouse provided by the IBM cloud services portfolio brings everything together, Sehgal said. "When it comes to some of the applications we're doing, we're taking linear ad sales data, Twitter data, [and] all kinds of data. And to ultimately relate it back to how a specific show is performing, we do need a SQL approach." With the cloud, he added, "we can have everything in one place."

According to a report released by forecast firm Research and Markets in January, the global market for cloud-based data management services is expected to grow at a compound annual rate of 30.5% -- from $3.51 billion in 2014 to $13.28 billion by 2019.
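The three figures in that forecast can be checked against one another with the standard compound annual growth rate formula. The short Python sketch below is illustrative only: the function names are hypothetical, and it assumes the growth compounds over the five years from 2014 to 2019, which the report summary implies but does not spell out.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, an end value and a span in years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding start_value at a fixed annual rate for the given years."""
    return start_value * (1 + rate) ** years


# Figures quoted from the forecast, in billions of US dollars.
START_2014 = 3.51
END_2019 = 13.28
YEARS = 5  # assumption: 2014 through 2019 counts as five compounding periods

implied_rate = cagr(START_2014, END_2019, YEARS)
projected_2019 = project(START_2014, 0.305, YEARS)

print(f"Implied annual growth rate: {implied_rate:.1%}")
print(f"2019 value at 30.5% per year: ${projected_2019:.2f} billion")
```

Run either way, the numbers line up to within rounding: growing from $3.51 billion to $13.28 billion over five years implies an annual rate of roughly 30.5%.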

Thus far, much of the big data cloud fanfare has accrued to startups working outside the relational model. But like IBM, other established database players aren't standing still. Oracle is making a big push to build up its cloud database services platform, including a cloud version of its NoSQL Database software; in addition, the latest release of Oracle Database 12c supports document storage and SQL-based querying of JSON, an often-used format for mobile, Web and cloud apps. And while working on its own columnar response to Amazon Redshift, Microsoft has also begun to promote DocumentDB, its NoSQL software for JSON that runs in its Azure cloud.
