News Stay informed about the latest enterprise technology news and product updates.

Do Your Rollups Really Roll Up?

The success of the data warehouse - and sometimes the business itself - depends upon the appropriate defining of KPIs and key business concepts.

This article originally appeared on the BeyeNETWORK.

With contributions from Malcolm Chisholm,

Every enterprise has lines of business and/or geographic delineations that individually report figures that must be “rolled up” or summarized at the enterprise level. It is assumed that all division or region reports are using the same assumptions, units of measure and/or formulas. If, on the other hand, each division derives its statistics in a different way, trying to summarize them is like comparing apples to oranges – it can’t be done accurately. As you will see, such assumptions can be very costly to businesses and may go undetected for a long time. The business may have a vague sense that something is wrong, but will continue to make business decisions based on incorrect data.

In addition to line of business delineations, rollups are also performed on many different categories of revenue and expenses, such as product categories. Improper classification of categories and individual line items (such as costs) can also cause faulty drilldowns and rollups, and wreak havoc on the business.

The Meaning of Data is Based On Its Context 
If you were presented with the number “42,” you might chuckle – and it would be apparent that you are familiar with Douglas Adams’ story, The Hitchhiker’s Guide to the Galaxy, or you have seen the movie. The story provides the context for the data, and therefore supplies meaning. Otherwise, “42” is just a number. (For those unfamiliar with the context, “42” is the answer to life, the universe and everything. Confused? Read the book!)

Different lines of business may have different contexts behind their statistics. For example, marketing may use a different formula for revenue than the finance department does. Finance executives could easily misinterpret the revenue numbers reported by marketing because they are essentially speaking two different languages. Just because two divisions use the same word does not mean that they are referring to the same concept. Trying to roll up the finance revenue and the marketing revenue into an enterprise-level revenue figure will result in summarizing apples and oranges. The result will be incorrect.

Each Business System Has Its Own Unique Context and Language
Just as each line of business may have its own language, each system also has its own language and context. The system in the credit department defines customer as someone who has had their credit approved. The system in the marketing division defines a customer as anyone who can buy the company’s products. Sales defines a customer as someone who has been called upon by a company sales rep.

How Definition of Customer Affects KPIs
Everyone would agree that how a customer is determined would affect many metrics that influence KPIs (key performance indicators). For example:

  • Top ten percent of customers based on revenue
  • Percent of retained customers
  • Percent of repeat customers
  • Percent of customers that have purchased products from more than one line of business

Let’s suppose that the company determines the number of customers by counting the number of rows in the sales customer database. Let’s also understand that this database contains duplicate entries as well as prospects (potential customers who haven’t bought yet). All percentage metrics would be incorrect and lower than they really are.

This situation is further compounded when integration issues are considered. What happens when all the databases containing anything about customers get merged into one? The problem is much, much worse.

What is a Customer?
Because how customer is defined can affect so many KPIs, it is important to determine exactly what a customer is. For example, consider the following questions:

  • Is a customer an individual or an organization? Could an individual be a customer himself and also represent a larger organization? Would this situation involve two separate customers?
  • Is anyone who has purchased anything a customer? Is there a minimum order amount?
  • If someone has placed an order but the transaction is not yet complete, is the party placing the order considered a customer?
  • What if the party purchased something more than ten years ago? Are they still considered a customer?

More Metric Headaches
There once was an advertising company that wanted to display the location of their advertisers on an Internet map along with the advertisement so that if you wanted to visit a business, you could find its location easily. The company needed to know how many customers had incomplete addresses, which would result in maps not being able to be displayed for these customers. In other words: How complete was their data?

It turned out to be extremely difficult to identify their paying customers. They had just migrated to a new customer relationship management (CRM) package, and each legacy system had stored data on customers differently. As expected, prospects were mingled in with customers. There was a prospect code in the system that looked promising, but it was used inconsistently and was therefore suspect. One way to count a customer might be to count all parties who had an advertisement attached. However, sometimes the company would have a promotion and give away free ads, so counting ads tied to the customer was not adequate; ads with a dollar amount of zero would have to be eliminated from the count. There were duplication issues to contend with as well; customers often appeared in the database multiple times. After you successfully de-dupe the customer list, there are additional semantic issues surrounding what defines a unique customer. For example: What about companies and their subsidiaries? Are subsidiaries separate customers?

Therefore, the baseline metric used to determine how many customers they had was elusive and difficult to determine. These types of problems have an impact on rollups in several ways:

  • First, they impact totals which determine percentages, as previously described.
  • Secondly, they distinguish the categories that determine how the data is grouped, and these groups are then rolled up for totals.

Problematic rollups, therefore, can be explained as follows: Faulty rollups are semantically diverse items (apples and oranges) improperly classified into the same larger category and summarized in error. This can be visualized in a classic business intelligence tool in the drill-down/rollup fashion of a cube or cross-tab report, or it may be incorporated into percentages (as seen in the previous examples involving customers).

Faulty rollups can be very insidious, and they usually aren’t identified until they have done major damage. They are often caused by the “semantics that are buried in each system” problem that we discussed earlier.

Large enterprises are particularly prone to problems with accounting rollups. The definition of general revenue can vary from one division to another, especially where divisions have been acquired from other companies. Such problems are known to exist, even if they are rather embarrassing (or worse) in the current climate of Sarbanes-Oxley compliance requirements. However, the problems may not always strictly be the fault of an enterprise. After the Enron debacle, the government made decisions about what kind of assets could be on a company's balance sheet. For instance, certain loan obligations owned by a financial company had to be on the balance sheet, while others did not. Such accounting rules vary across jurisdictions, so foreign subsidiaries did not have to follow U.S. accounting rules. Consolidating balance sheets across a multinational financial enterprise is therefore tricky. How do you roll up figures that different regulatory authorities force you to account for in different ways?

What Does Revenue Mean?
“Revenue” without any modifier might be very confusing and the cause of faulty rollups. For example, does it refer to net revenue or gross revenue? Obviously, if different organizations within the company report on revenue differently, a rollup of total revenue will not be correct. If the underlying formula that was used is different in each division, then you don’t have a rollup, you have a mess. It is the apples and oranges problem again.

Units of Measure Differences
Another way that formulas can appear to be all apples but have oranges hiding in their midst is when you have different units of measure. This is a very insidious problem because all looks normal on the surface; and indeed, the problem may go undetected for a long period of time.

The classic example of this is the Mars orbiter problem back in 1999. Here are two quotes from CNN concerning this data screwup:

NASA lost a $125 million Mars orbiter because a Lockheed Martin engineering team used English units of measurement while the agency's team used the more conventional metric system for a key spacecraft operation.

The navigation mishap pushed the spacecraft dangerously close to the planet’s atmosphere where it presumably burned and broke into pieces, killing the mission on a day when engineers had expected to celebrate the craft’s entry into Mars’ orbit.

In systems integration, care must be taken to harmonize units of measure. A common problem is that each system will use the unit of measure with which the users are most comfortable, and the system will do so without an awareness of other systems or divisions within the company who might also need the data. Sometimes the unit of measure is assumed and is not even explicitly stated in the system. This is where an understanding of each system’s unique semantics is so important to the creation of data warehouses.

I have encountered this problem in several international companies that stored data in many different currencies. The raw data could not be rolled up because it was not converted into one universal currency. The first attempt at cubes contained these faulty rollups. We had to correct this right away. The obvious cure for this is to standardize on one currency and convert everything to the standard. Then you can roll up easily, because you are comparing apples to apples.

Currency Exchange Rate Subtleties
A fascinating but subtle problem exists in exchange rates for currencies. Currency exchange rates vary; they are time-sensitive and also source-driven. Which exchange rate is to be used? When is the conversion made?

For example, a customer purchases office equipment with a credit card from a country other than the company of purchase. The customer then wishes to return the item a few days later. Which exchange rate is used? Is the entire purchase price, including the exchange rate in effect when the purchase was made, refunded – or is the exchange rate used for the refund transaction the one in effect the day of the transaction? How does the purchasing company account for this extra cost? Is it carried along with the transaction, or does it go in some “general” cost bucket? When these two transactions are rolled up, they undoubtedly will not entirely cancel each other out.

In a data warehouse that contains different source systems with different currencies, when will you perform the conversion? Will you do so when the transaction takes place or when the data is moved into the data warehouse?

Problems With Return on Investment (ROI)
Return on investment (ROI) presents a difficult problem in rollups, because often one division will conduct an ROI calculation when it wants to make a purchase or implement a new project and will likely only consider the project from its own perspective, not considering the impacts in other divisions or enterprise-wide. The division initiating the ROI calculation may fail to see costs or benefits in other areas of the company. For example, if Division A produces an ROI calculation for a new system, they may fail to see that other divisions further down the supply chain may require service level agreements (SLAs) and will require interfaces or services to be written to share the data stored in the new system. These are additional costs that need to be factored in. Therefore, a division-level ROI calculation is not necessarily applicable to the enterprise level.

Classification of Costs
Systems are very good at capturing costs, but they are not very good at classifying them; the classification rules must be specified by a human. Faulty or sloppy classification can affect the business comparison of apples to apples and can easily turn an apple into an orange.

For example, consider the problem of returned goods at an electronic components store. The company buys the merchandise at wholesale price $X; the store buys it at 2X and sells it to the consumer at 4X. What happens when the consumer returns the merchandise? There may be fees associated with the return, such as shipping or restocking. Where are these fees allocated or classified? They may be applied to a general account and not associated with the merchandise like they should be. Consequently, these costs may be hidden when the company is comparing the profitability of Widget A to Widget B. Suppose Widget A has a high defect rate and is frequently returned. If the return costs are not associated with Widget A, then the company has no way of knowing that Widget A is actually less profitable than Widget B. Accurate rollup within the category of Widget A will not be possible because the return costs were misclassified.

The Semantic Problem: Apples Disguised as Oranges
Suppose your company sells software and services. Both can be considered “products;” however, there may be different business rules applied to each that mandate their separation.

Suppose the company runs a special promotion that bundles software with services. How is this revenue accounted for? Which bucket does the revenue belong in: software or services? What often happens is different regions may account for the sales differently: one region will group both together in only one category and another may separate them out. Alternatively, another division may subtract the original cost of the software from the total and allocate the rest to services, and yet another division might just split it 50/50. Then, when drilldowns occur, they will be incorrect. The total revenue may be correct, but the category drilldowns will not be comparable.

Regardless of which way it happens, the lack of business rules guiding allocation of revenue will result in the categories being incorrectly determined. There are often accounting rules dictating that software revenue should be treated differently than services revenue. The lack of stringent business rules on category distinctions could cause compliance issues for the company.

This same kind of problem can occur in very subtle ways. For example, suppose a new software application is purchased. The technician comes out to install the software and discovers multiple hardware issues that need to be fixed before the software can be installed, such as a memory upgrade, a patch to the operating system, etc. Because the technician is on site, he or she handles the hardware issues. Is the time of the technician charged to the hardware or software classification?

This is an interesting situation because it is the faulty rollup problem in reverse: the rollup number may be correct, but the drilldowns are faulty.

The Moral of the Story
This article can be summarized as follows: definitions of key business concepts and KPIs are a critical success factor in any data warehouse. We have seen that classifications are especially tricky, and if rules concerning how to classify things are not specified well, all sorts of business problems can result. Care must be taken to ensure that data unification and harmonization (I have nicknamed this DUH) is conducted and verified with the business. If there are fuzzy concepts and areas where category distinctions or definitions are not clear, the business must work with the data analyst to achieve clarity. This may be a painful process because humans are often very sloppy; we don’t like to define things precisely because it takes work. Most of us don’t think in precise terms because we usually can get by without doing so. However, the success of the data warehouse – and sometimes the business itself – depends upon it.

  • Bonnie O'Neil 

    Bonnie is President of Westridge Consulting, and is an internationally recognized expert on data warehousing and business rules. She is a regular speaker at Meta Data/DAMA Conference, Software Development, Database World, Guide, and the Business Rules Forum; she was the keynote speaker at an international conference on Data Quality in South Africa.  She is a founding member of the Guide Business Rules Group (a standards group for business rules) and also the ODTUG Business Rules Summit. She has been involved in data warehousing projects in both Fortune 500 companies and government agencies, and her expertise includes specialized skills such as data quality, profiling, data integration and migration.  She is the author of two database books includingOracle Data Warehousing Unleashed, as well as over 40 articles and technical white papers. She is a Certified GIF Architect by Bill Inmon, the father of data warehousing.


Dig Deeper on Data warehouse software

Join the conversation

1 comment

Send me notifications when other members comment.

Please create a username to comment.

Bonnie, this is Ron Phillips 863 594-4410 I am writting another book. Behavioral Data Encapsulation using Sprocs, constraints, and triggers. Much deeper than that. Give me a call if you are interested. RP