|
|
||||||||||||||||||||
| Home > Data Management All-in-One Guides > Business intelligence > Business intelligence and related technologies > Business intelligence and text mining > Calculating mode in data mining projects | |
| All-in-One Guides: Business intelligence: |
|
|||||||
|
||||||||
Business intelligence and related technologies
![]() Business intelligence and text mining
|
||
Table of contents:
2.5 Rollup with Sums, Averages, and Counts In addition to finding the sum of a value variable during the rollup, it may also be more meaningful sometimes to calculate average value or the number of records that represent certain events — for example, number of deposits, number of withdrawals, or number of mailings a customer received responding to an offer. In our rollup macro, these requirements would alter only the middle part of our code, where we calculated the cumulative value of the Value variable. The following code segment would modify the macro to calculate the average value and the number of transactions for each account type instead of the total:
Step 2 Furthermore, the code inside the %do loop should also reflect our interest in transposing the values of the average variable, _AVG. Therefore, the code will be as follows:
Step 4 The complete code for the modified code to roll up the average value is included in the macro ABRollup() .
Another useful summary statistic is the mode, which is used in both the rollup stage and the event-driven architecture (EDA). The mode is the most common category of transaction. The mode for nominal variables is equivalent to the use of the average or the sum for the continuous case. For example, when customers use different payment methods, it may be beneficial to identify the payment method most frequently used by each customer. The computation of the mode on the mining view entity level from a transaction dataset is a demanding task because we need to search for the frequencies of the different categories for each unique value of the entity variable. The macro shown in Table 2.5 is based on a classic SQL query for finding the mode on the entity level from a transaction table. The variable being searched is XVar, and the entity level is identified through the unique value of the variable IDVar:
%macro VarMode(TransDS, XVar, IDVar, OutDS);
Table 2.5 Parameters of VarMode ( ) Macro
SELECT & IDVar , MIN( & XVar ) AS mode The query works by calculating a list holding the frequency of the XVar categories, identified as CNT, then using the maximum of these counts as the mode. The query then creates a new table containing IDVar and XVar where the XVar category frequency is equal to the maximum count, that is, the mode. The preceding compound SELECT statement is computationally demanding because of the use of several layers of GROUP BY and HAVING clauses. Indexing should always be considered when dealing with large datasets. Sometimes it is even necessary to partition the transaction dataset into smaller datasets before applying such a query to overcome memory limitations. More on data mining:
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| About Us | Contact Us | For Advertisers | For Business Partners | Site Index | RSS |
|
|
|
|||||||