In this two-part series, Mark Whitehorn takes a look at the mechanics of choosing a database engine and starts by outlining what not to do during database selection. Part 2 offers more advice on how to choose a database.
If you need to buy a database engine (for example, Oracle, SQL Server, DB2), it's usually because your company is buying a new application, which needs an engine to host it. In most cases there are two primary drivers for choosing the engine to underpin the application.
One, of course, is the requirement of the new application itself. If it can run only on database engine X, then that's an easy decision.
The second driver is usually existing experience. Most organizations will already have existing databases, will have acquired some in-house expertise in driving the engines, and will want to stick with what they know.
These two drivers are perfectly fine, and I have no argument with either. Use them to make your decision easy. But these two don't always apply.
For example, the application may run on a range of engines, and your organization may already use several different engines. Or maybe you're a new field site with no existing database systems. So there are also scenarios where we aren't limited by the application or expertise/policy. In these cases, the question is: "Which other factors do you use to decide?" In my experience, this is where it all goes horribly wrong. The fearsome foursome
immediately raise their heads and proceed to bog down any sensible discussions for days.
"Whaaaat?!" you might say. "Are you mad? These factors are vital in any database engine! Everyone knows that!"
Yes, I agree. No question. But the point here – and one that many people seem to miss – is that all the mainstream engines (IBM's DB2, Oracle Corp.'s Oracle, Microsoft's SQL Server) are largely comparable in pricing and all extremely competent in the other three areas. So focusing on these factors to choose a database engine is rather like choosing your new car on the basis of:
- Reverse gear
- Side windows
Just like the four factors above, these are essential requirements for most people but very poor deal-breakers, because every car already has them. (OK, so the early Dodge Viper didn't have side windows, but no one would call the Viper mainstream.)
One of the problems with many database people is that they have very long, positively elephantine, memories. So I have grown old in some meetings listening to people opining that DB2 may be fast on mainframes but it crawls under Windows, that Oracle is unbelievably expensive, and that SQL Server falls over every time you wink at it. Or that SQL Server won't scale up and DB2 won't scale down. Sigh. These aren't facts, they are history. But history, when choosing a database engine today, is bunk.
Why are they history? Well, all the current mainstream engines started life being slow and feature-poor. It's absolutely true that SQL Server used to be a dog: Version 6.5 was particularly terrible. But that was 12 years ago. DB2 used be an exclusively mainframe product, but it has for many years (13, in fact) run under Windows – and on the down-scaling issue, since 1999 (that is, last millennium!) DB2 has been available as a version called DB2 Everyplace that runs on handheld and mobile devices. So it now scales from the mainframe to the palm of your hand. Oracle used to be eye-wateringly expensive, and although it remains unlikely that anybody would call it cheap, it has moved its pricing model much closer to those used by its competitors.
And please note that I'm not saying that the four factors are always to be ignored. A huge corporation that is genuinely pushing the envelope in terms of database size or the number of concurrent users needs to think about all of these very carefully. But, in truth, most of us don't work for such corporations. By definition, most of us work for companies with average requirements. And all of the mainstream database engines can handle average databases without any problem.
To put that another way, if you're presently running Oracle, then switching to SQL Server is highly unlikely to give you a tenfold improvement (or degradation) in speed. Essentially, the Big Three are now running neck and neck. If any of the engines was seriously behind the others in any of these factors, then the vendor concerned would put huge resources into fixing it. It wouldn't have a choice.
So using cost, robustness, scalability and/or speed as a decision-maker for most applications is essentially pointless. Does that mean you should flip a coin for your choice? Of course not. There are still a host of deal-breakers around. We'll take a look at them next time, when I discuss how to choose a database.
About the author: Dr. Mark Whitehorn specializes in the areas of data analysis, data modeling, data warehousing and business intelligence (BI). Based in the U.K., he works as a consultant for a number of national and international companies, designing databases and BI systems. In addition to his consultancy practice, he is a well-recognized commentator on the computer world, publishing about 150,000 words a year, which appear in the form of articles, in publications such as PCW and Server Management Magazine, white papers and books.
He has written nine books on database and BI technology. The first one "Inside Relational Databases" (1997) is now in its third edition and has been translated into three other languages. The most recent is about MDX (a language for manipulating multi-dimensional data structures) and was co-written with the original architect of the language – Mosha Pasumansky. Mark has also worked as an associate with QA-IQ since 2000. He developed the company's database analysis and design course as well as its data warehousing course.