|
|
||||||||||||||||||||
| Home > Data Management All-in-One Guides > Business intelligence > Business intelligence and related technologies > Business intelligence and text mining > What is a data rollup? | |
| All-in-One Guides: Business intelligence: |
|
|||||||
|
||||||||
Business intelligence and related technologies
![]() Business intelligence and text mining
|
||
Table of contents:
2.4 Data Rollup The simplest definition of data rollup is that we convert categories to variables. Let us consider an illustrative example. Table 2.1 shows some records from the transaction table of a bank where deposits are denoted by positive amounts and withdrawals are shown as negative amounts. We further assume that we are building the mining view as a customer view. Because the first requirement is to have one, and only one, row per customer, we create a new view such that each unique customer ID appears in one and only one row. To roll up the multiple records on the customer level, we create a set of new variables to represent the combination of the account type and the month of the transaction. This is illustrated in Figure 2.1. The result of the rollup is shown in Table 2.2.
Table 2.1 A Sample of Banking Transactions
Table 2.1 shows that we managed to aggregate the values of the transactions in the different accounts and months into new variables. The only issue is what to do when we have more than one transaction per account per month. In this case, which is the more realistic one, we have to summarize the data in some form. For example, we can calculate the sum of the transactions values, or their average, or even create a new set of variables giving the count of such transactions for each month – account type combination.
Figure 2.1 Data rollup
Table 2.2 Result of Rolling up the Data of Table 2.1
In the end, even with careful selection of the categories and resolution of combining the different categories to form new variables, we usually end up with a relatively large number of variables, which most implementations of data mining algorithms cannot handle adequately. However, we should not worry too much about this problem for the moment because data reduction is a basic step in our planned approach. In later chapters, we will investigate techniques to reduce the number of variables. In the last example demonstrating the rollup process, we performed the rollup on the level of two variables: the account type and the transaction month. This is usually called multilevel rollup. On the other hand, if we had had only one type of account, say only savings, then we could have performed a simpler rollup using only the transaction month as the summation variable. This type of rollup is called simple rollup. In fact, multilevel rollup is only an aggregation of several simple rollups on the row level, which is the customer ID in our example. Therefore, data preparation procedures, in either SAS or SQL, can use this property to simplify the implementation by performing several simple rollups for each combination of the summarization variables and combining them. This is the approach we will adopt in developing our macro to demonstrate the rollup of our sample dataset. Now let us describe how to perform the rollup operation using SAS. We will do this using our simple example first and then generalize the code using macros to facilitate its use with other datasets. We stress again that in writing the code we preferred to keep the code simple and readable at the occasional price of efficiency of execution, and the use of memory resources. You are welcome to modify the code to make it more efficient or general as required. We use Table 2.1 to create the dataset as follows:
Data Transaction;
; run; The next step is to create the month field using the SAS Month function:
data Trans; Then we accumulate the transactions into a new field to represent the balance in each account:
proc sort data = Trans; Finally, we use PROC TRANSPOSE to roll up the data in each account type and merge the two resulting datasets into the final file:
/ * Prepare for the transpose * /
proc transpose data = Trans2 out = rolled C prefi x = C ;
proc transpose data = Trans2 out = rolled S prefi x = S ;
data Rollup; To pack this procedure in a general macro using the combination of two variables, one for transaction categories and one for time, we simply replace the Month variable with a TimeVar, the customer ID with IDVar, and the AccountType with TypeVar. We also specify the number of characters to be used from the category variable to prefix the time values. Finally, we replace the two repeated TRANSPOSE code segments with a %do loop that iterates over the categories of the TypeVar (which requires extracting these categories and counting them). The following steps detail the resulting macro.
Figure 2.3 Parameters of TBRollup() Macro
Step 1 proc sort data = & TDS; by & IDVar & TimeVar & TypeVar; run;
Step 2
Step 3
Step 4
Step 5
data & RDS;
Step 6
proc datasets library = work nodetails; We can now call this macro to roll up the previous example Transaction dataset using the following code:
data Trans;
The result of this call is shown in Table 2.4.
Table 2.4 Result of Rollup Macro
More on data mining:
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| About Us | Contact Us | For Advertisers | For Business Partners | Site Index | RSS |
|
|
|
|||||||