Evaluate Weigh the pros and cons of technologies, products and projects you are considering.

Look at workloads when doing data planning, says expert

Analyst Curt Monash discusses how close data planning of workloads can clear the way in the data architecture field.

With all the NoSQL, NewSQL, Hadoop and SQL-on-Hadoop databases, along with the columnar relational database management system (RDBMS) extensions, it is an exciting time for data architecture – maybe too much so. Long-time database management system (DBMS) industry observer Curt Monash helps sort through the maze of data planning today. This is part two in a series. Read part one, on data parallelism.

Sometimes I wonder what guide the past can provide. When they were new, it took a while for the first RDBMSs to settle into three or four well-known examples, and for a fairly common way to work with SQL to emerge.

Monash: Well, back in the 1980s we had several different kinds of DBMSs. Before the advent of the relational DBMS, every DBMS product had its own data manipulation language. Within a few years, SQL won. At that point, we had the new DBMSs having similar languages and similar programming models.

Curt MonashCurt Monash

And there probably were five to 10 that mattered – most of which have survived in some form or another to this day – such as Oracle; DB2; Microsoft SQL Server – which itself is a fork of Sybase's main DBMS; Progress OpenEdge; and Teradata and Sybase IQ. And that's where things were for a while.

What created the opportunity for new RDBMSs were cost shifts in various aspects of the technology and, in connection with that, the importance of the scale-out clusters.

In a way, the Oracle DBMS today has taken on the role that the IBM mainframe operating system had a few decades ago. It is used in a large fraction of large enterprises. It is reliable. It is old. It is cumbersome. It is expensive. Oracle has high market share so it provides a price umbrella for others.

It is very hard to (decide on a DBMS) without having a particular workload in mind. You can't ask someone to pick … without knowing the workload.
Curt Monash

A lot of enterprises have invested in a lot of people around that DBMS; not everybody involved thinks having high administrative costs is a bad thing. People who are well paid to be in jobs that are a part of those cost structures think it is pretty good. All those things that used to apply to the IBM mainframe system apply to Oracle today.

Now, with the new players in NoSQL, Hadoop and NewSQL, they tend to excel on certain types of workloads and to be inappropriate for others. But, the better combinations of best-product and small-but-strong companies are likely to prosper for quite a while.

Yet, the technology manager is confronted with so many options. One team member might promote Spark, and another Impala, and yet another columnar extension for PostgreSQL, and so on. How do they logically proceed?

Monash: It is very hard to answer that without having a particular workload in mind. You can't ask someone to pick between an 18-wheel truck or a sports car or a bicycle without knowing the workload.

It really depends on a lot of things. There are some workloads that tightly integrate SQL with very non-SQL processing - that go even beyond the analytical extensions to SQL RDBMSs. For some workloads, SQL-on-Hadoop is overwhelmingly the way to go. For that matter, there are companies that are adamantly opposed to paying for software at all. They like open source alternatives and many of those open source scale-out SQL alternatives will work on Hadoop.

There are [many] considerations. One of them is the absolute size of the database. Another is the absolute magnitude of throughput of work that you need. And there is your concurrency -- how many queries are likely to be hitting the database at the same time -- or how many people are actually going to be using the system at the same time.

Another consideration is what kind of work you are doing besides SQL queries. Yet another is the complexity of the SQL queries is, are there likely to be 14-way joins and lots of tables or are they a lot simpler than that? That is an incomplete list, but you can probably start the discussion with those.

Jack Vaughan is SearchDataManagement's news and site editor. Email him at jvaughan@techtarget.com, and follow us on Twitter: @sDataManagement.

Dig Deeper on Database management system (DBMS) software and technology

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.