Hilmar Buchta

SQL Server 2005

In the last post we discussed some background on statistics. Now let’s turn this into a KPI target. If the parameters of the test are well known in advance the limits for our KPI can be computed for example using Excel formulas.

But if the parameters are based on user input, the computation of the thresholds has to be done in our OLAP engine (MDX). For example, in risk management we compute the Value-at-Risk (VaR) for each day. The VaR is a measure giving us the maximum loss with a certain confidence level. If our confidence level is e.g. 5% the VaR gives our maximum loss for 95 out of 100 days. In the remaining 5 days we expect our loss to be higher (so the VaR is actually a quantile of the probability function of our win/loss). The computation of the VaR requires some kind of model and during the backtesting process we make sure that the model is working properly. So our hypothesis is that the model is correct while the alternative hypthesis is that the model is wrong. Our model is correct if the VaR computed based on the model meets the really observed losses. We count the days in which the actual loss was higher than predicted by the VaR. If it’s 5% we can be say, that our model is correct. If not, it may be that the model is incorrect or the sample isn’t representative.

If the number of days (maybe the period) is a parameter (maybe a report parameter or a value taken from another dimension), we have to do the computation in the OLAP model.

For this, we have to start with the binomial distribution given by

image

where n is the total number of cases (days for which we’re doing the backtesting), k is the number of observed losses higher than the predicted loss, p is the probability for such a higher loss (in our example 5%), q=1-p (the confidence level of our VaR calculation) and

image

 

For example for k=0 we get

image

As we’ve seen in the last post, we need the aggregated probability which looks like this:

image

Now we can define our KPI status for example as

image

 

The calculation of the above function is a little tricky and cannot easily be done in MDX. We could call the Excel-Function BINOMDIST, but I didn’t get it to work (not all Excel functions are also available for MDX). So here is an approach using an SSAS stored procedure.

Instead of computing the sum over the binomials we actually compute the incomplete beta integral which works much better than dealing with the large numbers resulting from the factorials. There is an excellent set of statistic function available at http://www.alglib.net for free and I used the code from there (see http://www.alglib.net/specialfunctions/distributions/binomial.php to download the code).

An SSAS stored procedure doesn’t have much requirements. In its minimal form, it consists of a single static class with one or more public static functions. The code has to be compiled to a class library (Assembly DLL-File) which can then be referenced from SSAS (by simply adding it to the assemblies using Management Studio). There are many code samples available at Codeplex: http://www.codeplex.com/ASStoredProcedures

In our case, the code for the class (C#) simply looks like this:

 

using System;

namespace ASStatistics
{
    public class Statistic
    {
        public static double BinomialDistribution(int k, int n, double p) {
            return binomialdistr.binomialdistribution(k, n, p);
            }

    }
}

 

The function call is taken from the AlgLib library above. The MDX code for calling the library function (once the Assembly is registered in SSAS) may look like this

WITH
  MEMBER bindist AS
    ASStatistics.BinomialDistribution(5, 500, 0.01)
SELECT
  bindist ON 0
FROM [Adventure Works]

 

This gives the following result:

image

 

So, if the number of cases if fixed, say 500, the KPI status expression for our Backtesting KPI may look like this:

case
when ASStatistics.BinomialDistribution(KPIVALUE("Backtesting"),500,0.01) <= 0.95 then 1
when ASStatistics.BinomialDistribution(KPIVALUE("Backtesting"),500,0.01) <= 0.9999 then 0
else -1
end
 

The above confidence levels of 95% and 99.99% correspond to the Basel II recommendation for backtesting of the VaR.