This is a two-part article:
Columnar database software, other technologies aim to speed analytics
Analytical database tips: deployment advice and potential challenges
From a data management perspective, the last decade has primarily been about gathering data from the various enterprise applications used to run a business and consolidating the information, often in a data warehouse. Enterprises have made significant investments in extracting and cleansing their data in order to make it consistent. Business intelligence (BI) tools were installed and are being used to automate reporting and data analysis that previously was done by the enterprise applications themselves or in spreadsheets and other "data shadow systems."
With the basics of data warehousing and BI well in hand, many organizations are seeking to use more advanced analytics technology to help increase their sales and profits. Advanced analytics programs can include tools such as predictive analytics, statistical analysis, data mining and data visualization software. But no matter what technology is used, the underlying foundation for advanced analytics has to be comprehensive, clean and consistent data. In addition, and no less critically, performance must be fast for business users to really leverage analytical applications.
A typical data warehouse environment is built on a relational database tuned for extract, transform and load (ETL) processes and multiple data marts designed to support specific types of business analysis. That architecture works well for the basic reporting and analysis capabilities that enterprises implemented in their first-generation BI systems. However, it does present a few roadblocks to more advanced analytics.
For example, it’s costly and time-consuming to build and tune data marts. And even after the tuning work is done, query performance against a data mart may not be as fast as needed – a fact that often results in developers not putting enough data in a data mart to enable heavy-duty analytics applications. In addition, even though relational databases have been the workhorses of data warehousing and BI systems, they aren’t built solely for analytics and can require a great deal of skill to support such uses.
As a result, one of the keys to enabling advanced analytics is the development of analytical databases that are specifically designed for that purpose. The first such tools were online analytical processing (OLAP) databases, which went beyond relational data marts by building in well-understood best practices and functionality tailored to how business users actually analyzed data. Nowadays, analytical databases leverage technologies such as columnar orientation, massively parallel processing, shared-nothing architectures, partitioning and query optimization features geared toward analytic workloads.
Making the vertical leap with columnar database technology
Currently, the most common form of analytical database is columnar software. Columnar databases turn the row-based orientation of relational databases on its side, instead storing data vertically by column. The columnar approach is based on analytical usage patterns and essentially indexes all of the data automatically. That reduces the laborious efforts needed to tune relational databases to run specific queries in data marts and can also lower the number of skilled database administrators who are needed to do the tuning work. In fact, many enterprises have found that they can combine their various relational data marts into just one columnar database.
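The difference between the two orientations can be sketched in a few lines of Python. This is an illustrative toy, not a real storage engine; the table and column names are invented for the example. The point is that an aggregate over one column in a row store still touches every field of every record, while a column store reads only the column it needs.

```python
# Hypothetical "sales" table, stored two ways.

# Row orientation: each record's values are kept together.
rows = [
    {"order_id": 1, "region": "East", "amount": 250.0},
    {"order_id": 2, "region": "West", "amount": 100.0},
    {"order_id": 3, "region": "East", "amount": 75.0},
]
# Summing one column still scans every full record.
total_row_store = sum(r["amount"] for r in rows)

# Column orientation: each column is a contiguous array.
columns = {
    "order_id": [1, 2, 3],
    "region": ["East", "West", "East"],
    "amount": [250.0, 100.0, 75.0],
}
# The same aggregate touches only the "amount" array --
# on a wide table, that is far less data to read.
total_column_store = sum(columns["amount"])

assert total_row_store == total_column_store == 425.0
```

On a real table with dozens of columns and millions of rows, that per-column layout is also what makes every column act like its own index and compress well, which is where the tuning savings described above come from.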
It’s important to note that column-oriented databases can be accessed by BI and advanced analytics tools in the same manner that they access relational databases. That means a shift to columnar database technology or indeed other types of analytical database platforms typically doesn’t require an organization to change the reporting, query and analytics software it’s using.
There are also other alternatives to consider for supporting analytics applications. With in-database analytics, you move the processing of compute-intensive BI functions directly into a data warehouse built on top of an analytical database instead of having to transfer data to separate systems for analysis. The in-database approach can reduce both setup and data retrieval times, ultimately resulting in much faster analytics performance.
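The contrast between shipping data out for analysis and computing it in place can be sketched with SQLite standing in for the analytical database (the table and figures are invented for the example). Both paths produce the same answer; the difference is how much data crosses the database boundary.

```python
import sqlite3

# Stand-in for an analytical database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 250.0), ("West", 100.0), ("East", 75.0)])

# Shipping data out: every matching row is transferred to the
# application before the average is computed there.
fetched = conn.execute(
    "SELECT amount FROM sales WHERE region = 'East'").fetchall()
avg_outside = sum(a for (a,) in fetched) / len(fetched)

# In-database: the computation runs where the data lives and only
# the one-row result is transferred back.
(avg_inside,) = conn.execute(
    "SELECT AVG(amount) FROM sales WHERE region = 'East'").fetchone()

assert avg_outside == avg_inside == 162.5
```

With three rows the difference is invisible; with billions of rows feeding a predictive model, avoiding the round trip is where the setup and retrieval savings come from.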
Disk-free: Supporting analytics tools from memory
Another option is in-memory analytics, in which system memory serves as the primary data store for analytical data. Loading the data into memory and accessing it from there eliminates ongoing I/O calls to disk drives and can greatly speed up analytical performance as a result, without requiring costly database tuning or the creation of aggregate tables used to accelerate the processing of summary queries in a database. In-memory analytics is an old idea whose time might have arrived as 64-bit architectures become more prevalent both on servers and PCs. In addition, various tools are now available that let users keep databases in memory for long periods of time and back them up to disks.
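The keep-it-in-memory, snapshot-to-disk pattern can be sketched with Python's sqlite3 module, which supports both in-memory databases and online backup. This is a minimal illustration of the pattern, not any particular vendor's product; the table and values are invented for the example.

```python
import sqlite3
import os
import tempfile

# An in-memory database: all reads and writes go to RAM, with no disk I/O.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE metrics (name TEXT, value REAL)")
mem.execute("INSERT INTO metrics VALUES ('daily_revenue', 1234.5)")
mem.commit()

# Because memory is volatile, the pattern pairs it with periodic
# snapshots to disk.
path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
disk = sqlite3.connect(path)
mem.backup(disk)  # copy the in-memory contents to the disk file
disk.close()

# After a restart, the snapshot restores the data.
restored = sqlite3.connect(path)
(value,) = restored.execute(
    "SELECT value FROM metrics WHERE name = 'daily_revenue'").fetchone()
assert value == 1234.5
```

Queries against the in-memory copy never pay for disk seeks; the disk file exists only as a recovery point, which is the trade-off the tools mentioned above are built around.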
A final alternative for building data marts and running analytics applications is BI and data warehouse appliances, which bundle hardware and software together in a single device that is configured and tuned for analytical processing. There are as many different architectures supporting the appliance concept as there are vendors offering BI appliances, which is a lot. The various architectures generally use commodity hardware to reduce overall costs, but they often include proprietary technology designed to help improve analytical performance.
Not only do appliances address the performance issue, they also offer faster deployment times for IT staffers and the potential for a lower total cost of ownership compared with conventional unbundled data warehouse platforms. However, when evaluating appliances, prospective buyers should ask vendors to explain any proprietary hardware or software components used to speed up analytical processing and what is required to implement them.
(Continue to the second part of this article to get Rick's analytical database deployment tips.)
About the author: Rick Sherman is the founder of Athena IT Solutions, a Stow, Mass.-based firm that provides data warehouse and business intelligence consulting, training and vendor services. In addition to having more than 20 years of experience in the IT business, Sherman is a published author of more than 50 articles, a frequent industry speaker, an Information Management Innovative Solution Awards judge and an expert contributor to both SearchBusinessAnalytics.com and SearchDataManagement.com. He blogs at The Data Doghouse and can be reached at firstname.lastname@example.org.