SQL Server 2005  SQL Server 2008  SQL Server 2008R2
Data Mining is frequently used to optimize costs in marketing scenarios. Most marketing campaigns result in costs and we want these campaigns to be as efficient as possible.
To illustrate this process let’s assume we’re operating a web shop. Customers need to register before they can order goods and therefore we know how many customers are returning. While we also spend some afford to win new customers, for this example we will focus on keeping the existing ones and making them buy more in our shop.
From our web shop system we know, that 50% of our customers place only one single order and do not return (within a given period of time, for example within a year). Marketing suggests to offer our existing customers a benefit in order to make them return, let’s say a $10 voucher.
The Simple Approach
While we will analyze more sophisticated models later, for now we will take a look at a very simple approach, that does not take differences between individual customers into account. Therefore, we are looking at average values. For example, the average profit for each order is $25.
Having just the information from above, we cannot decide if the idea of the marketing department is a good one. But we could run a test sample. Let’s say we give the voucher to a random sample of 500 customers. Now we can compare customers who got the voucher with customers who didn’t. After this test we find out that
Group 1 (Customers with voucher):  15% do not return 
Group 2 (Customers without the voucher):  50% do not return (as we already knew) 
So, based on our test sample, the voucher is pretty effective to keep customers returning. We can also calculate the success rate using this formula:
The success rate is impressive, but is it also sensible from a cost perspective? To answer this, let’s look at two options that we have:
 We give vouchers to none of our customers
 We give vouchers to all of our customers
For both options, we want to calculate the total costs, including costs from loosing existing customers. For all of the following examples we assume that we only have 1,000 customers and that we are looking at a specific time frame (for example a year). Let’s start with the first option. If we don’t give vouchers to any of our customers, the only costs come from loosing customers and therefore loosing profit. Since 50% of our customers are not returning and the average profit from each customer was $25, we loose 1,000 x 50% x $25 = $12,500 in this scenario. Now let’s look at the second option: From our test sample above we know that only 15% of the customers do not return after receiving a voucher. In the same way as for the first option we can calculate the loss value as 1,000 x 15% x $25 = $3,750. But for this option we also need to add the costs for the vouchers, which is 1,000 x $10 = $10,000. So the total costs for option 2 are $3,750 + $10,000 = $13,750. To sum things up:
Option 
Costs 

1  vouchers to none of our customers 
$ 12,500 
2  vouchers to all of our customers 
$ 13,750 
Of course there would also be the option to offer half of our customers a voucher, but this would just mean that we take the average of the two cost values above as shown in the following chart:
In this chart, option 1 corresponds to the left endpoint of the line while option 2 corresponds to the right endpoint of the line. So, although our vouchers are very successful, they don’t make sense here as they result in higher costs. Of course we could think of changing the value for the voucher but we would have to run another test to see how our customer base responds to the new voucher level (I’ll get back to this point in part 3 of this series). In most cases, the random sample method from above would have been done using different groups with different values for the vouchers. The effect on our success rate is usually not linear, so it’s worth finding the best value for a voucher. But we’ll keep things simple here and assume that the $10 voucher is already the best we could do.
Data Mining
Did I mention data mining in the title? So far we didn’t do any data mining with on our data. But maybe data mining gives us an option to optimize the costs in our scenario?
The main difference between our analysis from above and the data mining approach (and some other options that I will come to) is that now we are looking at the individual customer. In the first approach we just had an average customer dropoff rate of 50%. With data mining, we’re trying to predict the individual rate/score per customer. To do so, we have to set up a mining model that is trained with the knowledge we have about our customers (for example from the web registration form) and the individual buying behavior in the past. This model can then be used to predict returning customers. As this post is about cost optimization, we’re not actually building the model here but we just look at the output. So let’s assume that every customer now has a churn score associated with the customer record and that the model is validated and tested and performs well (see my other posts about back testing mining models).
We can now calculate the individual costs per customer for our two options from above:
Option 
Costs 
Option 1: Customers gets no voucher  churnscore x [average profit] 
Option 2: Customer gets voucher  churnscore x (1 [prevention success rate]) x [average profit] + [cost of a single voucher] 
For the first option, the calculation is simple. We only take the expectancy value of the profit loss here. If the individual churn score is for example 30% (0.3) then the costs in this calculation are 30% x $25 = $7.5. Please note that although we are looking at a data mining approach here, I’m still using the average profit of the customer. We will change this later to see the additional effect.
For the second option, we’re using our prevention success rate (70% from above). Even with the voucher, some customers will still not be returning. And as in the example from above, we have to add the costs of the voucher itself. If the individual churn score is 30% (0.3 as in the example above), then this case would result in the following costs: 30% x (100%70%) x $25 + $10 = $12,25.
For my example I’m using randomly generated test data (with a modified random function). As long as our average rates from above match with the score we’re using here, it is obvious that both approaches give the same result for the two decisions from above: no customers gets a voucher and every customer gets a voucher. So how do we use our churn score now? The answer is simple: Instead of addressing for example 50% of the customers we could now address 50% of the customers with the highest churn scores as we assume that it is more likely for them to go away.
The chart from above may now look like this:
You can clearly see the effect of the optimization here (green line). As we’re talking about costs, the lower the better. The green line showing our data mining approach has a minimum at appox. 22%. The value at that point is $11,818. So, using the data mining approach we could save $682 based on our sample set of 1,000 customers, compared to the total costs of $12,500 for the trivial approach. This is an improvement of about 5.45%. Remember though, that we’re only looking at sample data here, so for your scenario the output may be totally different. Also, customers with a high churn rate might not respond to our vouchers in the same way as customers with a lower churn rate.
In the next part of this post we’ll focus more on the profit value of this scenario.