This article originally appeared on the BeyeNETWORK.
So exactly how do humans cope with large and complex problems? It is a natural human reaction to take the problem and break it into smaller pieces. Once the problem is subdivided enough, it is solvable. First, the problem solver addresses one subset of the problem, then another, and so forth until the larger problem is solved.
Mathematicians use this technique all the time. Astronomers use this approach. Programmers use this technique. It is just human nature to approach large problems this way.
For example, programmers carve out the boundaries of the problem, sometimes called the “scope.” They solve the problem within the scope. If they created too large a scope, they further narrow the scope until they reach a point of solvability.
In the context of breaking problems down to a point of solvability, consider the challenge faced by the data architect when looking at and managing corporate data. In many corporations, the task of getting a handle on many complex types of data, coming from many places and interacting in an almost random pattern, is daunting. So how do data architects address this large and complex problem? In the time-honored approach of divide and conquer, the data architect breaks the problem down into subsets.
A natural subset break is in creating data marts. By setting aside one group of data for finance and their analytics, for example, the data architect has narrowed the problem of managing data across the enterprise. There are data marts for sales, marketing, engineering, management reporting, and so forth.
So there is a logical and natural progression leading to the creation of data marts. In the context of this discussion, data marts are merely a device for breaking a large problem down into manageable pieces.
But the motivation for data marts and their acceptance does not stop there. End users like their data marts. There are lots of reasons why end users like data marts, but probably the most basic reason has nothing to do with technology or problem solving. The issue is that end users like the feeling of ownership and control. The data mart becomes their data, and they can customize, control, or manage their own data however they like.
So there are political and ownership reasons why organizations like their own data marts. And there are some technology reasons as well. When an organization has its own data mart it can:
- Design the data mart however it likes,
- Operate the data mart on its own calendar with no consideration given to any other department,
- Control how data is accessed and used in its own data mart, and
- Choose whatever analytical technology best suits the needs of the department with no consideration for other departments or users of other data marts, and so forth.
But there is one other really powerful reason why end users like their own data marts. That reason is that by moving data into a data mart, the cost of machine cycles goes down. The most expensive machine cycles the organization has are those that drive the large central data warehouse. By moving the data out to a data mart, the cost of processing drops dramatically.
It is true that there are some architectural rules of the road that need to be followed when building and using a data mart. Some of those rules of the road are:
- Data cannot be shared between data marts. If there is a need for sharing, the data must be placed in the data warehouse. Once there, anyone can access and use the data, and
- The source of all detailed data is the data warehouse. Data marts do not create and update details of data on an autonomous basis.
These rules of the road are simple, and at the same time they give the data mart analyst the needed functionality. They do not say that data cannot be shared between different analysts. They merely say that if data is to be shared, there is an architecturally approved way to do it.
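The two rules of the road can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataWarehouse` and `DataMart` classes, their method names, and the sample rows are all hypothetical and do not come from any particular product. The point is that a mart is loaded only from the warehouse and that there is no mart-to-mart path, so anything to be shared must first be placed in the warehouse.

```python
from dataclasses import dataclass, field


@dataclass
class DataWarehouse:
    """Hypothetical central store: the only source of detailed data."""
    rows: list = field(default_factory=list)

    def publish(self, row: dict) -> None:
        # Data that is to be shared is placed here, where anyone can use it.
        self.rows.append(dict(row))

    def extract(self, subject: str) -> list:
        # Hand out copies so a mart cannot alter warehouse detail in place.
        return [dict(r) for r in self.rows if r["subject"] == subject]


@dataclass
class DataMart:
    """Hypothetical departmental mart, loaded only from the warehouse."""
    department: str
    rows: list = field(default_factory=list)

    def refresh(self, warehouse: DataWarehouse) -> None:
        # Detail comes from the warehouse; the mart does not create it.
        # Note there is deliberately no method for reading another mart.
        self.rows = warehouse.extract(self.department)

    def total(self, column: str) -> float:
        # Each department is free to analyze its own copy as it sees fit.
        return sum(r[column] for r in self.rows)


# Made-up sample data, for illustration only.
warehouse = DataWarehouse()
warehouse.publish({"subject": "finance", "amount": 120.0})
warehouse.publish({"subject": "sales", "amount": 75.0})
warehouse.publish({"subject": "finance", "amount": 30.0})

finance = DataMart("finance")
sales = DataMart("sales")
finance.refresh(warehouse)
sales.refresh(warehouse)

finance_total = finance.total("amount")
sales_total = sales.total("amount")
```

In this sketch each department refreshes on its own schedule and works on its own physical copy, while the warehouse remains the single place where shared detail lives.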
Into this fray come vendors who say that data marts should be “logical,” not physical. In fact, these vendors say that there shouldn’t be a need for a data mart at all. In light of all of these very real and very positive reasons for having a data mart, one has to ask: who is the vendor thinking about when recommending either that there be no data mart at all or that any data mart be a virtual subset of data on a large centralized data warehouse?
Who benefits from such a philosophy? The answer is that it is the vendor who benefits, not the consumer. And how is it that the vendor benefits? The vendor benefits by selling more hardware and more software. The machine cycles the vendor sells are the most expensive cycles that the corporation pays for. The more the vendor can convince the consumer to place on the vendor’s large centralized data warehouse, the more money the vendor receives. So when a vendor tells the consumer that data marts, if they are to be built at all, should be built in a “virtualized” manner, the vendor is thinking about the vendor, not the customer.
The truth is that physically separate data marts are as natural and as useful as the sun rising in the east each day. It simply is the natural order of things.