OLAP Miner User's Guide


The OLAP Miner algorithm

OLAP Miner uses a sophisticated statistics-based multidimensional algorithm that scans all the cell values in a specified area (subcube) of an OLAP cube for deviations. A value is considered a deviation if it is significantly different from the expected value. The OLAP Miner algorithm calculates the expected value of a cell based on its position in the cube, and the algorithm combines contributions from all dimensions to which the cell belongs. OLAP Miner lists deviations, presents various views of each deviation, and quantifies the magnitude of each deviation with respect to other deviations in the same cube.

The following simplified example shows how OLAP Miner finds deviations in OLAP cubes.

Member Only or Member and Descendants selections

This example uses a four-dimensional OLAP cube as shown in Figure 15. The goal is to find deviations in the sales data from that cube. The cube contains the following dimensions: Accounts, Year, Market, and Product. The Sales data (from the Accounts dimension) will be searched for deviations in the context of the other three dimensions.

To find deviations, you first create a deviation detection definition. To create a deviation detection definition, select a member from each dimension and specify it as Member Only or Member and Descendants. These selections specify how far down in the hierarchy of the dimension OLAP Miner will search for deviations.

Figure 15. Specifying each member in the cube as Member Only or Member and Descendants to create a deviation detection definition

Specifying each member in the cube as Member Only or Member and Descendants to create a deviation detection definition

For this analysis, you select only the Sales member as Member Only from the Accounts dimension. From the Year, Market, and Product dimensions, you select all first-generation members as Member and Descendants. The target member, in this example Sales, must be selected as Member Only so that OLAP Miner does not calculate different units of measure. Also, the sales figures will be separated from cost and profit figures. See Before you create deviation detection definitions for information about selecting members for deviation detection definitions.

If you change a subcube even slightly, you might see more or fewer deviations and possibly very different magnitude values. The OLAP Miner algorithm results depend on which dimensions you select and how many levels or generations those dimensions include. You might see very different results (deviations and magnitude) simply by selecting more or fewer members of a dimension.

The subcube

After you specify all members in the cube as either Member Only or Member and Descendants, OLAP Miner collects the data to form the subcube. The subcube contains sales data referenced by the members from Year, Market, and Product. Specifying members as either Member Only or Member and Descendants creates the deviation detection definition. Collecting the data together creates a subcube. A deviation detection definition is similar to an OLAP outline, and a subcube is similar to a small database. Although the subcube is similar to a small database, it is not created as a DB2 OLAP Server cube (database). Rather, the data is processed internally by the OLAP Miner server.

By specifying a member from the Accounts dimension as Member Only, you target that member to be mined for deviations. Remember that a value is deviant only in comparison to the values that surround it in the subcube. As shown in Figure 16, OLAP Miner specifically analyzes Sales in the context of the other dimensions (Year, Market, and Product).

Figure 16. Creating a subcube from four-dimensional source cube

Creating a subcube from a four-dimensional cube

The expected value of a cell in the subcube

To determine if a cell value is a deviation, OLAP Miner must compute the expected value for every cell in the subcube. The expected value for each cell is compared to the actual value of each cell to determine if the cell value is a deviation and to what extent it is deviant. Figure 17 shows how one expected value for cell y is calculated. Each view represents the intersection of cell y with the other dimensions in the subcube, and each view contributes to the expected value for cell y.

Figure 17. OLAP Miner computing the expected value for a single cell y

OLAP Miner computes the expected value for a single cell y

The magnitude of the deviation

In simple terms, OLAP Miner scales the absolute difference between the actual cell value and the expected cell value to show the magnitude of the deviation. Higher magnitude values show more significant, or important, deviations. OLAP Miner uses a scaled system so that it can find deviations more accurately in extreme ranges of values, including negative and positive value ranges.

More specifically, after the expected value is calculated, the absolute difference between the actual and the expected value is calculated. This absolute difference is called the residual. OLAP Miner considers any cell in which the residual is relatively large to be a deviation. However, a statistically valid definition of relatively large is required, so the residual value is divided by the associated standard deviation for that cell.

For example:

Residual = |Expected cell value - Actual cell value|
Associated Standard Deviation is
	R12 + R22 + R32... (R = residuals for all selected cells in the subcube) = X
Associated Standard Deviation = &sqrt.X/number of cells in the subcube
Standardized Residual = Residual/Associated Standard Deviation 

Because OLAP cubes can contain extreme ranges of values, including negative and positive values, the OLAP Miner algorithm standardizes the residual value. A standardized residual is required because mere residual values can be misleading and might not be useful for analysis. Also, a standardized residual can accommodate extreme values without allowing those values to skew the other values in the subcube. This process of standardization acts as a normalizer. If this calculated standardized residual is higher than a defined threshold for that cube, the value of the cell is considered a deviation.

OLAP Miner does not use a single absolute scale to measure magnitude. Magnitude is scaled for each subcube individually. Therefore, you cannot compare the magnitude values of one subcube to another subcube.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]