Data managers have lately directed a lot of attention at advances in specialized data warehouse engines and NoSQL databases, but flagship relational databases are not standing still, as IBM's DB2 BLU Acceleration software shows.
IBM's stalwart DB2 relational database management system (RDBMS), for example, has added numerous capabilities, including enhanced in-memory data handling, data skipping, improved compression, support for columnar analytical processing and more. Some of these traits are just the kind of thing that has given new-generation relational analytic engines and NoSQL upstarts their allure.
Columnar processing, often coupled with compression, has become associated with the new breed of analytical engines that arose from the likes of Aster Data (now part of Teradata), Vertica (now part of HP), ParAccel (now part of Actian) and others. But several mainstay relational databases have come out with columnar enhancements.
Columnar processing focuses processing efforts more narrowly on data sets specifically needed for common queries. It has multiple advantages, including reduced I/O and improved use of cache.
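The reduced-I/O point can be illustrated with a toy sketch. It is not how DB2 or any other product stores data; the table, column names and values are hypothetical, and "fields read" is only a rough stand-in for I/O.

```python
# Illustrative only: a toy comparison of row-wise and column-wise layouts.
# The table, its column names and its values are hypothetical.

rows = [
    {"shipment_id": 1, "commodity": "coal", "carloads": 120},
    {"shipment_id": 2, "commodity": "grain", "carloads": 80},
    {"shipment_id": 3, "commodity": "coal", "carloads": 95},
]

# Column-wise layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_carloads_row_store(table):
    """Sum one column from row-wise data; every field of every row is touched."""
    fields_read = 0
    total = 0
    for row in table:
        fields_read += len(row)  # the whole row is fetched to reach one field
        total += row["carloads"]
    return total, fields_read


def total_carloads_column_store(cols):
    """Sum one column from column-wise data; only that column is touched."""
    values = cols["carloads"]
    return sum(values), len(values)


row_total, row_fields = total_carloads_row_store(rows)
col_total, col_fields = total_carloads_column_store(columns)
print(row_total, row_fields)
print(col_total, col_fields)
```

Both functions return the same total, but the column-wise version reads only the values of the one column the query needs.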
Recent updates that arrived in DB2 10.5, known collectively as "BLU Acceleration," can speed up "I/O bound" operations while still capitalizing on available in-house RDBMS skills, according to Kent Collins, a database engineer and architect with Burlington Northern Santa Fe Railway (BNSF) Corp., based in Fort Worth, Texas.
Improved data compression has had an immediate helpful effect in cutting memory requirements, he said.
"It's been very positive for us. We just moved a 400 GB database, and when we finished it was 80 GB," he said. BNSF has also seen speed increases of as much as a hundredfold for some queries with BLU.
Stepping down big data and turbocharging queries is important to BNSF, a railroad that is collecting more and more types of data on its far-flung operations. In 2012, it hauled more than 1 million carloads of agricultural commodities, 2.2 million coal shipments, 4.7 million trailer or container shipments, and 1.7 million carloads of industrial products.
Said Collins, whose data feeds include text messages, radio messages and video, "I am up to my elbows in unstructured data." He then quickly recalibrated the estimate. "I am up to my eyeballs." He said column-level data processing that can be programmed using established SQL methods has been a big step toward taming the unstructured data deluge.
Take me out to the new RDBMS game
In a way, additions to relational databases are mirroring larger changes in data architecture, said Bernie Spang, director for strategy and marketing for IBM Database Software and Systems.
"We've moved from the world where you defined your data problem and then decided which relational database to use. Now the question is, 'What data technology should I use?' And even in the RDBMSs, there is a difference between the old generation and the new generation. It's a new ball game."
IBM has applied some state-of-the-art data technology with DB2 BLU, said IBM Distinguished Engineer Sam Lightstone. The compression is "actionable," he said, meaning that the mode of compression adapts to the kind of data being processed. It allows analytics to run on compressed data directly -- without decompression steps that add processing overhead, according to Lightstone.
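One common way to query data without decompressing it is dictionary encoding, sketched below. This is an assumption chosen for illustration; the article does not say which encoding BLU uses, and the function names and sample values are hypothetical.

```python
# Illustrative only: dictionary encoding with a predicate evaluated on the
# encoded values. This is not a description of DB2 BLU internals.


def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def count_equal(dictionary, codes, target):
    """Count matches by comparing codes; the column is never decoded."""
    if target not in dictionary:
        return 0
    target_code = dictionary.index(target)  # translate the predicate once
    return sum(1 for code in codes if code == target_code)


commodities = ["coal", "grain", "coal", "industrial", "coal"]
dictionary, codes = dictionary_encode(commodities)
coal_count = count_equal(dictionary, codes, "coal")
print(dictionary, codes, coal_count)
```

The predicate is translated into a code once, and the scan then compares small integers rather than the original strings.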
"BLU is compression-optimized, in-memory-optimized and it's columnar," Lightstone said. It also supports data skipping (in which irrelevant data is ignored), parallelism and vector-processing scans. "It is the combination of these things that gives DB2 huge speedups," he said.
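Data skipping can be sketched with a minimum and maximum value kept for each block of a column, so that blocks that cannot contain a match are never scanned. This per-block summary is a general technique assumed here for illustration, not a confirmed detail of BLU; the block size, names and values are hypothetical.

```python
# Illustrative only: data skipping with a per-block min/max summary.
# This is not a description of DB2 BLU internals.


def build_blocks(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        blocks.append((min(block), max(block), block))
    return blocks


def count_greater_than(blocks, threshold):
    """Count values above threshold, scanning only blocks that might match."""
    matches = 0
    blocks_scanned = 0
    for low, high, block in blocks:
        if high <= threshold:
            continue  # no value in this block can match, so skip it
        blocks_scanned += 1
        matches += sum(1 for value in block if value > threshold)
    return matches, blocks_scanned


carloads = [10, 12, 15, 18, 40, 42, 45, 47, 90, 95, 97, 99]
blocks = build_blocks(carloads, block_size=4)
matches, blocks_scanned = count_greater_than(blocks, threshold=50)
print(matches, blocks_scanned, len(blocks))
```

In this sample, only one of the three blocks has a maximum above the threshold, so the other two are skipped without reading their values.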
Narrowing the analytics gap
Many advances in data technology in recent years have been in the realm of specialized analytical relational database management systems, according to industry observer Curt Monash, president of Monash Research and editor and publisher of DBMS2 and other blogs. But in general, flagship relational databases are "narrowing the gap," he said.
Monash said that DB2 BLU could be seen as a first step. "In its first iteration, it is a single-server product, and 'in-memory single server' is definitely a limitation." He also pointed out that the first version of BLU is optimized for 10 TB databases, although it is capable of ramping up to 20 TB.
Monash noted that IBM has other specialized analytical RDBMS approaches beyond DB2, one of which is its Netezza data warehouse appliance.
IBM is far from alone in the race to enhance the major RDBMSs. As data-related challenges grow, resurgent RDBMS technology could well be welcomed by many.