Hilmar Buchta

SQL Server 2008 | SQL Server 2008 R2

This post is about reproducing some of the mining charts that are built into Business Intelligence Development Studio using Reporting Services. A common use case might be that you periodically train your model and the results of the training should be published as html reports.

For my example I use the Adventure Works Targeted Mailing mining structure. We want to plot the results from the “TM Decision Tree Model”. First let’s start with the lift chart. In BIDS, the chart for “Bike Buyer=1” looks like this:

image

In order to reproduce the chart, we’re going to use the SystemGetLiftTable stored function in SSAS. Since we need to do some computations, I prefer to use SQL, so we have to define a linked server first:

EXEC sp_addlinkedserver
@server=’SSAS_AW‘,
@srvproduct=“,
@provider=’MSOLAP‘,
@datasrc=’Adventure Works DW 2008R2′

The query for loading the results from the lift chart looks like this:

SELECT percentile,
       VALUE,
       CASE
         WHEN percentile >= 100 * CONVERT(FLOAT, totalattributevaluecases) /
                            totalcases
       THEN 100
         ELSE percentile / (( CONVERT(FLOAT, totalattributevaluecases) /
                              totalcases ))
       END idealmodel
FROM   Openquery(ssas_aw,
‚CALL SystemGetLiftTable([TM Decision Tree], 2, “Bike Buyer“, 1)‘) AS
derivedtbl_1
 

For the chart, we use the percentile of the lift table as category group (x-axis) and choose these entries for the values (y-axis):

  • Sum(Value)
    This is the actual lift curve for our mining model
  • Sum(Percentile)
    As this is the same value as on the x-axis, this gives the random guess model line (linear)
  • Sum(IdealModel)
    The case-when-end statement above reflects the curve for the ideal model (linear function from 0 to 1 between 0 and totalattribute/totalcases)

This is how the chart looks like in the SSRS designer:

image

There are different names for this kind of a chart. For this post I’m using the naming conventions as shown here: http://www2.cs.uregina.ca/~hamilton/courses/831/notes/lift_chart/lift_chart.html, so the chart is labeled “Cumulated Gains Chart” instead of lift chart (BIDS).

So, here is the resulting Cumulated Gains Chart from my SSRS report:

image

As you can see, the chart looks pretty much the same as the chart that was displayed in BIDS.

Another useful chart is the actual lift factor of the model, often referred to as the ‘lift chart’. For this chart, we use the same SSAS function but we need to compute the lift in our query: 

SELECT percentile,
       VALUE / percentile AS lift,
       1                  AS randomguess
FROM   Openquery(ssas_aw,
‚CALL SystemGetLiftTable([TM Decision Tree], 2, “Bike Buyer“, 1)‘) AS
derivedtbl_1
 

This is how our chart looks like:

 image

Finally, we also want to reproduce the classification matrix (often referred to as confusion matrix). This is how the matrix looks like in BIDS:

image

To get these results, we use the function SystemGetClassificationMatrix as shown below:

SELECT predictedvalue,
       SUM(CASE actualvalue
             WHEN 0 THEN [COUNT]
             ELSE 0
           END) actual_0,
       SUM(CASE actualvalue
             WHEN 1 THEN [COUNT]
             ELSE 0
           END) actual_1
FROM   Openquery(ssas_aw,
‚CALL SystemGetClassificationMatrix( [TM Decision Tree], 2, “Bike Buyer“)‘
) AS derivedtbl_1
GROUP  BY predictedvalue
 

The second parameter of the function SystemGetClassificationMatrix means
1 – training data
2 – test data
3 – both (training and test data)

I used a simple matrix on my report to display the result from the query above and applied some coloring in order to distinguish error cases from correct cases:

image

To complete this post, I’d like to point out that you can also query your model cases directly, using a query like

SELECT *, IsTestCase() As TestCase FROM [TM Decision Tree].CASES

This query returns all the cases from the TM Decision Tree model with an additional column “TestCase” that contains true, of the case belongs to the test data set. In order to run this query you need to enable the drill-through option for the mining model. You can find out more about the SSAS mining functions in this post: http://www.bogdancrivat.net/dm/archives/14