In his previous article, expert contributor Mark Whitehorn looked at what we shouldn't consider when selecting and buying a database. In this follow-up, as promised, he offers his expert advice on what we should consider.
In my last article, I made the unlikely assertion that, for average applications, we shouldn't make purchasing decisions about database engines on the basis of cost, robustness, scalability and/or speed. You read that right. These are simply not the issues that they once were. That was all for the "not to do" list, so now here comes the "to do" list.
Having disregarded the aforementioned four criteria, the decider for your choice of database engine should be whatever you specifically want from it -- or want it to do for you. Sounds obvious, so let's expand that idea with a few examples. In all probability, you are not buying an application and database engine solely to store data: there is usually something specific that you want to achieve. There are many technologies that can add functionality to a mere data store, including:
- Data warehousing: Gathering data from disparate systems into a single repository.
- OLAP: On-Line Analytical Processing is a way of structuring data in a multi-dimensional form, often called a cube, to facilitate rapid and intuitive analysis.
- Data mining: Algorithms for discerning patterns in data, some of which might be interesting.
- Spatial data: The ability to store and manipulate co-ordinates that describe geographical locations.
- Complex/user-defined data types: For instance, handling XML data.
- File streaming: The ability to store Binary Large Objects (BLOBs) in the file system and yet have the object appear in the database as if held in a table.
- Grid computing: A means of pooling the resources of many computers.
In today's information-hungry world, it is quite possible that you'll have an interest in business intelligence (BI), which is all about finding information in a mass of data (and why would we gather that mass of data if not to unlock the secrets it holds?). The term BI encompasses the first three areas on the above list -- data warehousing, OLAP, data mining -- and more.
This is not an exhaustive list of everything that might interest your organization, but these are all areas where the major database engines can be distinctly different. So once you have identified a feature (or more than one) that is vitally important to achieving your particular goal, you have identified a distinguisher that should heavily influence your choice of database engine. The analogy to buying a car works here, just as it does in the 'how not to…' article: If you really love the idea of driving down the California coast's Big Sur highway with the top down, then you'll scope out all the convertibles, ignoring SUVs, trucks and the rest.
The same holds true in database selection.
For example, if you were interested in BI functions, you would probably look closely at Microsoft because, in recent years, it's put a huge amount of effort into its BI software technologies. With SQL Server, Microsoft includes in the box an ETL (extract, transform and load) tool called Integration Services, essential for populating and managing a data warehouse. Also in the box, you'll find Analysis Services, arguably the world's best multi-dimension database engine.
So I would argue that Microsoft has a good, rounded BI offering. On the other hand, I would also argue that, for example, IBM has a better range of data mining algorithms, so when it comes to BI, the two companies have quite different strengths.
But perhaps you are interested in spatial data? IBM and Oracle have supported spatial data types for years, while SQL Server hasn't quite got them yet: They'll appear in SQL Server 2008 due later this year, and what we'll get will, of course, be version 1.0 of spatial data type support.
Grid computing? Oracle rules here … and the list goes on. Each database vendor has concentrated on the feature set it deems most important, and saleable, and this leads to different levels of support for the various features.
You may, for example, be gearing up to handle XML data, in which case you'll need to carefully evaluate how well each engine supports XQuery -- the language for querying XML data -- and also the XML shredding functionality that each offers.
The point here is not to waste much (if any) of your precious time looking at robustness, scalability, speed or cost -- which, as I said at the outset, are really not the issues they once were. Instead, you should prioritize and focus on the specific areas of database technology and features that will deliver what you want.
Of course, knowing what your organization wants is more than half the battle (and may require some long meetings), but that should ultimately help you to determine which engine is truly right for your company.
About the author: Dr. Mark Whitehorn specializes in the areas of data analysis, data modeling, data warehousing and business intelligence (BI). Based in the U.K., he works as a consultant for a number of national and international companies, designing databases and BI systems. In addition to his consultancy practice, he is a well-recognized commentator on the computer world, publishing about 150,000 words a year, which appear in the form of articles, in publications such as PCW and Server Management Magazine, white papers and books.
He has written nine books on database and BI technology. The first one "Inside Relational Databases" (1997) is now in its third edition and has been translated into three other languages. The most recent is about MDX (a language for manipulating multi-dimensional data structures) and was co-written with the original architect of the language – Mosha Pasumansky. Mark has also worked as an associate with QA-IQ since 2000. He developed the company's database analysis and design course as well as its data warehousing course.