Tag: KPI

Analyzing the number of visits per customer

Hilmar Buchta

SQL Server 2008 | SQL Server 2008 R2

OLAP is perfect for analyzing fact records that are mapped to dimensions. However, what can be done, if the reference to the dimension changes depending on the selected period? I’m not talking about slowly changing dimensions here, but really about the period of selection.

Think of the following example: A manager wants to analyze the number of visited customers for his sales force. He wants to see, how many customers have been visited once, twice and so on during the last three months (or any other time period).

In order to do so, we need a dimension “Visits” (with the number of visits, for example 0, 1, 2, …) but we cannot map the visits against this dimension because this mapping depends on the selected period. For example, one single customer was visited once in January and once in February. If the selected period is January, the we need to see one customer being visited once. If the selected period is January and February together, we would need to see the customer being visited twice.

In SQL Server 2008 we can solve this by not linking our Visits dimension to the measure groups. The calculations are then performed by a cube script. For our example, we’re using a very simple data model:

image

The fact table FactVisits contains one row per visit (linked to date and customer). Note that the Visits dimension is not linked to the fact table. The source table for the visit dimension looks like this.

image

For the cube, we need to calculate the entries for the Visits dimension. To do so, I created an additional measure (Customer Count) based on some other value I found in the source fact table (the value will be overwritten using the cube script later). Usually you would create an additional column in the data source view (value 0) to source this measure from. Here is the definition of my Customer Count measure:

image

In order to calculate the number of customers for each visit count, I created the following cube script:

scope ([Visits].[Visit].[Visit],[Customer].[Customer].[Customer],[Measures].[Customer Count]);
this=iif(([Measures].[Visit Count])=CInt([Visits].[Visit].currentmember.properties("KEY")),1,NULL);
end scope;

The script is computed on leaf level of the visits (visit count dimension). In the cube script, the reference to [Visits].[Visit].currentmember.properties("KEY") results in the VisitID (number of visits). If the number of visits (measure [Visit Count]) is equal to the number of visits found in the visit dimension, this is counted as one. Note, that I used NULL instead of 0. This makes it easier to analyze, which customers have been visited for a given number of times, as null values can be suppressed (see last screenshot of this post).

So, let’s check the results up to this point.

image

The calculation now works for every single line but currently the totals are still wrong. The correct value for the totals can be computed using a dynamic set (to allow Excel multi selects) like this:

CREATE HIDDEN DYNAMIC SET CURRENTCUBE.[DynaVisits] AS [Visits].[Visit].[Visit];
([Visits].[Visit].[All])=Sum(existing [DynaVisits]);

I also want to count more than ten visits on a special dimension element (ID 999 in the table above). This can be achieved with the following cube script:

scope ([Visits].[Visit].&[999],[Customer].[Customer].[Customer],[Measures].[Customer Count]);
this=iif(([Measures].[Visit Count])>10,1,NULL);
end scope;

Testing the results (here using Microsoft Excel) shows the correct calculation for the grand total as well as for other levels of the visit count hierarchy:

image

The way our calculations works can be best observed when looking at a single customer.

image

In this case we had two visits of our customer in 2008 and one visit in 2009 giving a total of three visits for 2008 and 2009. Note that the actual fact is shown as a dimensional property here.

We can also apply multi select filters in Excel. Here is another sample screenshot filtering only customers with less than 4 visits during the last three months in 2009 (this is the ‘(multiple items)’ filter for the calendar). The data is now also analyzed by the sales representative (attribute of the customer). Note, that the grand totals are still correct:

image

As mentioned before it is also possible to see which customers are listed here by including the customers in the pivot table. This is shown in the following screenshot (filter to visit count = Zero).

image

However, the drill through action would still return all customers (due to the definition of my measure) as the value is only based on a computation. So the drill through should be disabled here.

In this case, I simply checked the measure in order to enable the drill through action:

image

Of course, the example is still a very simplified one. Usually you would need to know the customer base, so you’re not counting new customers as not being visited during the last years.

  • Share/Bookmark

Simple Banding function for KPI Status or KPI Trend

Hilmar Buchta

SQL Server 2008

Although the way of defining the KPI status and trend is very flexible it is also somehow circumstantial if you simply want to rate a KPI by its target value. In this case you would have to calculate the status of the KPI as being between –1 (worse) and +1 (best).

In many cases we find simple case statements here:

case
when KPIVALUE("TestKPI") >= 0.95 * KPIGOAL("TestKPI") then 1
when KPIVALUE("TestKPI") < 0.7 * KPIGOAL("TestKPI") then -1
else 0
end

This results in the KPI status being –1, 0 or +1. But if you want to blend smoothly between these values you have to use a formula that is a little bit more complex. Especially  if you need to this computation for many KPIs it’s not really nice.

I had some examples for AS stored procedures (sprocs) in this blog before. So here is a very simple one to calculate a linear approximation between the boundaries:

public static double KPIStatusBanding(double bound1, double bound2, double actual)
{
    double factor=0;

    if (bound1 < bound2) factor = 1;
    else if (bound1 > bound2) factor = -1;
    else return 0;

    if (actual < bound1) return -factor;
    else if (actual > bound2) return factor;
    else return factor*2 * (actual – bound1) / (bound2 – bound1) – 1;
}

The parameters are as follows:

bound1 worst value
bound2 best value
actual actual value

 

image

If bound1 < bound2 bigger values are better, if bound1 > bound2, lower values are better. With this function the calculation is quite easy. First let’s test the function itself with simple MDX queries:

with
member test as ASStatistics!KPIStatusBanding(100,200,175)
select test on 0
from [Adventure Works]

This results in a status value of 0.5.

In order to use the same boundaries as in my first example, I would use this KPI status expression:

ASStatistics!KPIStatusBanding(
0.7 * KPIGOAL("TestKPI"),
0.95 * KPIGOAL("TestKPI"),
KPIVALUE("TestKPI"))

image

As we can see from the KPI browser the values now gets approximated smoothly:

image

  • Share/Bookmark

More about context sensitive formatting

Hilmar Buchta

SQL Server 2005 | SQL Server 2008

In my last post I wrote about context sensitive formatting. The goal was to format values differently depending on their digits, so that for example. 532 formats to 532 but 143,134 formats to 143K.

When you try the formatting from my post with Excel you’ll find that Excel has some problems interpreting the format string. The reason is that Excel simply gets confused by the characters (e.g. M). So you might better want to escape those characters in the format-string.

Furthermore you could also think of changing the color dynamically too. Maybe you want to format negative values in red.

Here is the completed cube script statement to create the measure in an Excel compliant way:

Create Member CurrentCube.[Measures].[Sales Amount Fmt]
AS  [Measures].[Sales Amount],
  FORMAT_STRING =
    iif(vba!abs([Measures].[Sales Amount])<1000,"\$0",
    iif(vba!abs([Measures].[Sales Amount])<1000000,"\$0,\K",
     "\$0,,\M"
    )),
  FORE_COLOR=iif([Measures].[Sales Amount]<0,RGB(255,0,0),RGB(0,0,0)),
NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount] }
;

In Excel this may result in the following visualization (since all values are positive the FORE_COLOR has no effect in this example):

image

  • Share/Bookmark

Context sensitive dynamic formatting for measures

Hilmar Buchta

SQL Server 2005 | SQL Server 2008

It’s commonly known that you can provide a format string for measures like in the following example:

with MEMBER [Measures].[Sales Amount Fmt] AS  [Measures].[Sales Amount],
FORMAT_STRING = "$0,000",
NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount] }

select  [Measures].[Sales Amount Fmt] on 0,
[Product].[Subcategory].[Subcategory] on 1
from [Adventure Works]

The important part of the example above is the FORMAT_STRING property. Here, it formats our measure sales amount as a dollar value without positions after the decimal point and with a separator for thousands.

The result looks like this:

image

It should be noted that the locale (i.e. using the comma as separator character) is derived from certain system settings (see http://support.microsoft.com/kb/950598 for details). However, you may overwrite the local using the language property as shown below:

with MEMBER [Measures].[Sales Amount Fmt] AS  [Measures].[Sales Amount],
FORMAT_STRING = "$0,000",
LANGUAGE=1031,
NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount] }

select  [Measures].[Sales Amount Fmt] on 0,
[Product].[Subcategory].[Subcategory] on 1
from [Adventure Works]

The only difference compared to the query above is the LANGUAGE property. Being set to 1031 for Germany the result displays a point as separator character:

image

To get back to the topic of this post, it’s interesting that the FORMAT_STRING property doesn’t have to be a constant. You might play around with simple expressions like "$0,0"+"00" but you have full support for MDX expressions here.

Let’s assume you want to format your values differently depending on their size, you can easily do so with an expression. For example, let’s assume you want to display dollar values like this

573

as $573

12,319

as $12K

9,312,131

as $9M

This can be done with the following FORMAT_STRING property which can be used in your cube script or in your MDX query:

FORMAT_STRING =
iif([Measures].[Sales Amount]<1000,"$0",
iif([Measures].[Sales Amount]<1000000,"$0,K",
"$0,,M"
))

Using this expression for the FORMAT_STRING property, the result of the query above looks like this (I also kept the originally formatted value of the example above to make it easier to compare the results):

image

As you can see, the values in the example above show as Thousand or Million Dollars. I think that this looks a little bit confusing and I would prefer showing all values at the same scale but at least you can use this feature to set different formatting options.

Furthermore you could think of displaying the correct unit (like meters, square meters, kilogram) with the values depending on a dimension attribute or you could display the right currency code depending on the currency of the transaction. In the Adventure Works cube for example, you can find the following cube script, which sets the language property for the destination currency dimension:

/*– Set language property for the Destination Currency cube dimension –*/
Scope
(
   [Destination Currency].[Destination Currency Code].Members,
   [Destination Currency].[Destination Currency].[Destination Currency].Members

) ;

   Language( This ) =
   [Destination Currency].[Destination Currency].Properties( "Locale" ) ;

End Scope ;

In this case, the locale is stored as a dimension attribute (for example 1033 for US, see http://msdn.microsoft.com/en-us/library/0h88fahh(VS.85).aspx for an overview of the locale codes). The script results in the correct visualization on the currency dimension like shown in the following query:

select  [Measures].[Sales Amount] on 0,
[Destination Currency].[Destination Currency].[Destination Currency] on 1
from [Adventure Works]

image

  • Share/Bookmark

Formatting actual and goal for a KPI

Hilmar Buchta

SQL Server 2005 | SQL Server 2008

Being able to specify the preferred format for measures and calculated measures in SSAS is a great feature since you just do the definition once and clients can leverage this central definition.

But for a KPI expression (value, goal, status, trend) it’s not directly possible to specify the format. But there is still a simple trick to do so: You just have to define the calculation itself as a calculated member within your cube script like this:

create
member currentcube.Measures._BacktestingTotal as 100, visible=0

member currentcube.Measures._BacktestingValue as 5, visible=0 , format_string = "0.00%"

member currentcube.Measures._BacktestingTarget as 5, visible=0 , format_string = "0.00%"

I removed the real computation formulas for the three values and replaced them by constants because they would simply be confusing. But you can see how the format string is applied here. These members shouldn’t be visible, so I set visible=0.

In our KPI definition you can simply point the expression to the calculated member like this:

 

image

By doing so, the format of the calculated member defines the format for the KPI  value, target etc.

  • Share/Bookmark

Binomial distribution for a KPI status (part 1)

Hilmar Buchta

SQL Server 2005

This entry is not much about OLAP or MDX but gives some background on statistical tests that may be very useful in designing meaningful KPIs based on statistical hypothesis tests. In KPIs we usually compare actuals with goals and visualize the results as a traffic light. In most cases both the actual value and the target can be read from the cube. It is very common to have a dimension "Scenario" with members Actuals and Forecast or something like that.

But when dealing with statistical results, things are getting a little bit more difficult. For example, we’re testing the quality of products in a manufacturing line. If the quality test is expensive we have to rely on a spot sample. Let’s assume we want a quality of 99%. This means that only 1 of 100 produced items should fail the quality test. So for our spot test let’s say we pick 500 out of 10000 produced items of a charge and perform the quality check on them. Depending on the result of the test we decide if we are going to keep the remaining 9500 items or not.

In test theory we say our hypothesis (aka null hypothesis, H0) for a statistical hypothesis test is that the quality of our charge is 99%. The alternative hypothesis (H1) would be that our hypothesis is wrong, meaning the quality is below 99%. So the question is, based on the result of our spot sample, do we keep the hypothesis or not.  More precisely, our null hypothesis is, that the probability for a damaged items follows a binomial distribution with a single probability of p=1%. The binomial distribution function B(n,k,p) gives the probability that exactly k out of n events happen when the single probability for an event is p. In our example, B(500,5,1%) ~ 17.6% gives the probability that we observe exactly 5 damaged items in our 500 item spot sample, assuming that the probability for a damaged item is 1%.

Without going too deep into statistics, it’s important to know that there are two kinds of error we have to face:

  • type I error (aka alpha error or false positive)
  • type II error (aka beta error or false negative)

A type I error means that we discard the null hypothesis (the whole charge would be thrown away) although it actually meets our quality criteria while a type II error means that we don’t discard our hypotheses (we deliver the whole charge although it doesn’t meet the criteria). It’s impossible to minimize both kind of errors (see http://en.wikipedia.org/wiki/Type_I_and_type_II_errors for details) at the same time.

For our test let’s take a look at the probabilities for the binomial distribution function (we’ll cover the calculation of these probabilities in the next post but in the meanwhile you can use the Excel function BINOMDIST to compute these values):

 

number of items failing the test single probability cumulated probability
0 0.7% 0.7%
1 3.3% 4.0%
2 8.4% 12.3%
3 14.0% 26.4%
4 17.6% 44.0%
5 17.6% 61.6%
6 14.7% 76.3%
7 10.5% 86.8%
8 6.5% 93.3%
9 3.6% 96.9%
10 1.8% 98.7%
11 0.8% 99.5%
12 0.3% 99.8%
13 0.1% 99.9%
14 0.0% 100.0%

 

When plotting the probability we clearly see the peak at about 5 items which corresponds to our hypothesis that the probability of a damaged item is 1% as 1% of 500 items gives 5 items.

image

 

If we find no damaged items in our spot-test we can be pretty sure that our charge meets the quality criteria. And if we find 14 damaged items? Well, obviously the charge shouldn’t be delivered. But what do we do with maybe 8 damaged items? We have to define a threshold c. When X, the number of damaged items is less or equal than c we keep the charge, if X is greater than c we don’t keep the charge. This definition makes our test-procedure: "take 500 out of 10000 items for the test. If more than c fail the test, discard the whole charge". But how to find c?

Usually we would set a confidence level alpha for the type I error. Then we could compute the proper value of c that matches this confidence level (i.e. the smallest value for c for which the type I error is below our confidence level alpha). In mathematical terms this would read

image

If c is zero we only keep charges that have no damaged items in the spot test. Let’s compute the type I error in this case. We have to compute the probability that our hypothesis is abandoned although it is true. In mathematical terms this would read

image

The value can simply be taken from the table above as P(X>0|p=1%) = 1-P(X=0|p=1%) = 1-0.007=0.993

If we decide for an alpha of 5% we would choose c=9, meaning that we discard charges having more than 9 damaged items in the spot tests as this has a probability of about 3.1% (the sum of the single probability values from the table above for c=10, 11, 12, …) which is below 5%.

On the opposite, let’s compute the type II error for this case. Here we have to compute the probability that our hypothesis is kept although it isn’t true. Our hypothesis is wrong if the probability for a damage is not 1%. For the calculation we assume a higher probability, for example 2%. So in mathematical terms this would read

image

The complete table for the type I and type II error probabilities looks like this:

c alpha beta (2%)
0 99.3% 0.0%
1 96.0% 0.0%
2 87.7% 0.3%
3 73.6% 1.0%
4 56.0% 2.8%
5 38.4% 6.5%
6 23.7% 12.8%
7 13.2% 21.7%
8 6.7% 33.1%
9 3.1% 45.7%
10 1.3% 58.3%
11 0.5% 69.8%
12 0.2% 79.3%
13 0.1% 86.7%
14 0.0% 91.9%

 

So, if we set c=0 (very strict) it is very likely that we discard charges that are ok and very unlikely that we don’t discard a charge although it is damaged. On the other hand, if we set c=14 it’s very unlikely that we discard charges that are ok and very likely that we don’t discard charges that are damaged.

Of course, test theory is much more complex. Besides alpha and beta you can also consider the so called power (1-beta) in order to decide if your test setup is significant. The power gives the probability for a statistical test, to decide for the alternative hypothesis in case the alternative hypothesis is correct. In general terms keeping the hypothesis while the power is low (i.e. the type II error probability is high) doesn’t make much sense. Alpha may also be considered as the risk of the manufacturer while beta may be considered as the risk of the buyer (in case there is no warranty). The proper choose of alpha and beta determines much of the quality of the test. For example, Cohan suggests beta being less or equal to four times alpha (α≤4β, see Cohan, J. 1969, "Statistical Power Analysis for the Behavior Science") for medicial hypothesis tests. So if alpha is 5%, beta should be less or equal to 20% (power greater or equal to 80%). Of course you can increase the significance of your test (and reduce the error probability) by taking a bigger spot sample but on the other hand this is usually more expensive, so you have to find an optimal compromise between costs and significance of the test.

Now, finally back to our well known KPIs. Instead of using a test setup with only one threshold we could use a KPI with three different areas, for example like this:

green probability of accepting a damaged charge is low (type II error)
red probability of not accepting a correct charge is low (type I error)
yellow hard to decide area in between

 

This might lead to the following KPI color definition:

color condition values
green if beta is below 20% 0 … 6
red if alpha is below 5% 9 … 500
yellow everything else 7 .. 8

Using this approach resembles the conflict of minimizing both errors in a good way.

Although the probabilities can be computed using the Excel function BINOMDIST, I didn’t manage to get this called from MDX (you can call many, but not all Excel functions from within MDX using the Excel!functionname-syntax). So the next post will be more about how to compute this distribution function in MDX using an SSAS stored procedure.

  • Share/Bookmark

  • Kategorien

  • Copyright © ORAYLIS Blog. All rights reserved.