Choosing sides on the scale up vs. scale out debate used to be easy. Choices were few, and for most organizations it was symmetric multiprocessing (SMP) all the way – the classic scale up approach to computing.
But with the rise of commodity hardware – and more organizations looking to capitalize on the Internet-driven "big data" explosion – scaling out has become a more viable option. As a result, massively parallel processing (MPP) and distributed computing approaches are growing more popular all the time, according to Tony Iams, a senior vice president with IT analyst firm Ideas International in Port Chester, N.Y.
SearchDataManagement.com got on the phone with Iams to learn more about the longstanding scale up vs. scale out debate. Iams explained the most common uses of SMP, MPP and distributed computing and had some advice for those seeking to match database workloads with the correct architecture approach.
You always have to be careful about matching the right workload with the right scalability approach.
Tony Iams, senior vice president and senior analyst, Ideas International
Could you describe the prevailing approaches to hardware architecture and how they fit into the scale up vs. scale out debate?
Tony Iams: The first is symmetric multiprocessing, or SMP, which is the classic form of scaling up. That's where you have lots of processing units inside of a single computer. And I say "processing units" because that line is blurring. It used to be "processors," but now processors have multiple cores inside of them and those cores might have a lot of threads. But the point is that all of it is inside of one enclosure, one computer system. That is the traditional approach to scaling up.
What are the other two major approaches?
Iams: Then you have scaling out, and the most extreme form of that is what you might call distributed computing. That is where you have lots and lots of computers that are cooperating to solve a problem. The third approach is massively parallel processing (MPP), and that kind of sits in between [SMP and distributed computing] in the sense that you still have lots of machines that are collaborating on solving a problem. The difference is that with massively parallel processing, there is usually some assumption that something is shared. At a minimum, you have shared management in that there may be many separate computers but you manage them as if they are a single computer. With MPP, there is also usually some form of shared memory.
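One way to picture the "many separate computers managed as a single computer" idea is a scatter-gather query. The sketch below is illustrative only and does not come from the interview: the `Node` and `Cluster` classes are hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for one machine that owns a slice of the data."""

    def __init__(self, rows):
        self.rows = list(rows)

    def partial_sum(self):
        # Each node scans only its own partition.
        return sum(self.rows)


class Cluster:
    """Presents many nodes to the caller as if they were one system."""

    def __init__(self, rows, node_count):
        # Round-robin partitioning: row i goes to node i % node_count.
        self.nodes = [Node(rows[i::node_count]) for i in range(node_count)]

    def total(self):
        # Scatter the work to every node, then gather and combine the results.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(Node.partial_sum, self.nodes))
        return sum(partials)


cluster = Cluster(list(range(1, 101)), node_count=4)
cluster_total = cluster.total()
```

The caller only ever talks to `Cluster.total()`; how many nodes sit behind it, and how the rows are split among them, stays hidden. A real MPP system would also have to handle networking, failures and data placement, all of which this sketch leaves out.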
How does MPP differ from SMP in terms of sharing memory?
Iams: With SMP, by definition, all reading and writing can be done by any thread, core or processor. They can all get to the memory equally easily for reading or for writing purposes. In MPP, again there is usually some form of sharing, but depending on the implementation – and there are many different implementations of MPP – you have different compromises that you have to make in terms of who can get to what memory; whether there is a penalty for reading the memory; and whether there is a penalty for writing the memory. With MPP, users have to consider how that memory sharing works: Who can read? Who can write? The answers to those questions are going to vary significantly [depending on] the implementation.
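The contrast Iams draws here can be made concrete with a toy cost model. This is a sketch under stated assumptions, not a description of any real system: the `MemoryModel` class is hypothetical, and every penalty value is a made-up relative unit chosen only to show how the answers to "who can read?" and "who can write?" might differ between implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryModel:
    """Toy cost model; every number here is a made-up relative unit."""

    remote_read_penalty: int   # extra cost to read memory owned by another node
    remote_write_penalty: int  # extra cost to write memory owned by another node

    def cost(self, requester, owner, op):
        if op not in ("read", "write"):
            raise ValueError(f"unknown operation: {op}")
        base = 1
        if requester == owner:
            return base
        if op == "read":
            return base + self.remote_read_penalty
        return base + self.remote_write_penalty


# SMP as described above: every processing unit reaches all memory equally.
smp = MemoryModel(remote_read_penalty=0, remote_write_penalty=0)

# One hypothetical MPP implementation: remote reads are cheap, remote writes are not.
mpp = MemoryModel(remote_read_penalty=2, remote_write_penalty=10)

smp_costs = [smp.cost(0, 1, "read"), smp.cost(0, 1, "write")]
mpp_costs = [mpp.cost(0, 1, "read"), mpp.cost(0, 1, "write")]
```

In this model the SMP costs are the same no matter who asks, while the MPP costs depend on which node owns the memory and on whether the access is a read or a write. A different MPP implementation would simply plug in different penalties.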
Why do I feel like I'm hearing vendor references to MPP more often than in the past?
Iams: I think because the industry in general is trending towards scaling out. What I just explained to you is purely an architectural view. But the more important aspect of this is now matching [the computer hardware architecture] with workloads. Depending on what kind of workload you're trying to host, each of these approaches is going to make more sense or less sense. You always have to be careful about matching the right workload with the right scalability approach.
How has the process of choosing the right scalability approach changed over the years?
Iams: The rules 10 to 15 or 20 years ago were pretty clear. If you wanted to do heavy-duty database processing, you wanted it to scale up and you would use SMP. That was it. End of story. The number of workloads that you would want to use with distributed computing or even MPP was extremely limited. But nowadays, with the Internet and Web-based computing and all of these services that people are using on the Web, like Facebook and Google, a lot of those work really well with a scale out approach, and distributed computing and MPP are starting to be applied more widely.
How should an IT organization go about matching database workloads with the right scalability approach?
Iams: If you're just talking about the classic transaction-driven database workloads that drive the day-to-day operations of a typical business, those still work best on SMP-type systems. That is because with transactions, you're going to be writing a lot of data by definition. If you're going to write something, you have to have very efficient access to that memory, and that is why you need SMP. There is no more efficient way to access memory than with SMP.
What if the database workload is created for business intelligence or analytics purposes?
Iams: More and more database work is based on analysis, which is not necessarily writing data. In this case, you're more interested in reading the data because you're going to go through there and analyze it looking for patterns, business opportunities and trends. That is increasingly where the value is and that is not a new thing. People have been doing data warehouses and stuff like that for a long time. But now there has been an uptick in the volume of data. Internet usage is generating data, mobile phone usage is generating data and all that stuff is tracked now. That has produced an explosion in data. [Data warehouses and associated analytics projects] have to scale more than ever before, and MPP is the right answer for that.
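The rule of thumb from the last two answers (write-heavy transactional work leans toward SMP, read-heavy analytical work leans toward MPP) can be written down as a small helper. This is a minimal sketch, not a sizing tool: the `suggest_architecture` function and its 0.5 threshold are assumptions made for illustration and do not come from the interview.

```python
from enum import Enum


class Architecture(Enum):
    SMP = "symmetric multiprocessing (scale up)"
    MPP = "massively parallel processing (scale out)"


def suggest_architecture(write_fraction, write_heavy_threshold=0.5):
    """Suggest a starting point from the share of operations that are writes.

    write_fraction: writes / (reads + writes), between 0 and 1.
    write_heavy_threshold: hypothetical cutoff, not a figure from the interview.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    if write_fraction >= write_heavy_threshold:
        # Transaction-style workloads write a lot and need efficient memory access.
        return Architecture.SMP
    # Analysis-style workloads mostly read and can be spread across machines.
    return Architecture.MPP


order_entry = suggest_architecture(0.7)      # mostly writes
data_warehouse = suggest_architecture(0.05)  # mostly reads
```

A real decision would weigh far more than the read/write mix, including data volume, budget and the specific MPP implementation, so the output is best treated as a first question to ask rather than an answer.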