Preventing Customer Churn with Continual, Snowflake, and dbt

Use Case

February 8, 2022

In this article, we’ll take a deep dive into the customer churn/retention use case. It contains everything needed to get started, and enterprising readers can also try it out for themselves in a free trial of Continual, following the customer churn example in the linked GitHub repository.

Overview 

It’s safe to say that nearly every company out there has a concept of a “customer” and a clear desire to understand them and better meet their needs. As a company ages, it’s inevitable that some customers will leave; it’s impossible to make everyone happy, and some churn is unavoidable. How one defines churn varies from business to business: B2B companies may transact via contracts, B2C companies via subscriptions or entirely via point of sale, and so on. Regardless of the definition, churn is something every company should strive to minimize.

Not only is customer churn one of the most broadly applicable ML use cases, it’s also one of the most important. For a growing company, every exiting customer is a significant loss, and even established companies need to monitor churn closely to make sure it doesn’t spiral out of control. A good handle on churn and how to prevent it can reduce churn rates, optimize marketing and advertising spend, and increase customer satisfaction. At scale, every fraction of a percentage point of retention can mean millions of dollars of revenue a quarter. This is a use case that can pay for itself almost instantly.

Aside from the fact that customer churn is broadly applicable and mission-critical to most businesses, another reason I wanted to cover it is that I don’t often see it presented in the right light. Although this use case appears in many machine learning tutorials, most sources oversimplify it. For example, the ‘canonical’ example is the Telco Customer Churn dataset. It’s more useful than predicting survivors on the Titanic in terms of real-world applicability, but the data doesn’t do a good job of representing the problem a real company would face. That doesn’t stop data scientists from modeling churn in a similar way, and the results leave something to be desired.

Here, we’ll take a different approach. In this article, we’ll articulate a methodology for solving the churn problem and use a public dataset much closer to what real businesses work with. The dataset is temporal in nature, which should align with what exists in any company’s data platform. Accompanying this article is also a worked example in our documentation. After reading, we invite you to sign up for a free trial of Continual and try building a customer churn model. We’ve opted to use Snowflake as our backing data warehouse, but it should be easy enough to follow along with whichever cloud data warehouse your company uses (get in touch if you have any issues!). The example also includes a backing git repository with all the artifacts needed to train and deploy a model in Continual. We hope you’ll use it as inspiration and model your own use cases similarly. A note for dbt users: the repository contains a dbt project that can quickly jumpstart this use case.

Now, let’s talk about the most important thing in this use case: how do we define churn?

Defining Churn 

In an abstract form, an ML use case is simple to explain: select input signals (aka features) and a value to predict (aka a target), pass them into an algorithm, and voilà, a machine learning model is born. The details are a bit more complex. 

The first issue that normally arises is that the target doesn’t exist. Or, to put it more precisely, there’s no current data set with a well-defined target. One of the hardest tasks in machine learning is asking the right question, i.e. defining a target for a model. This sounds simple on the surface, but as we start digging into a use case, the complexities reveal themselves. Some refer to this practice as prediction engineering, and it can be one of the more time-consuming parts of nailing down a new use case, as the data team iterates on target definitions and readjusts data pipelines to capture the correct view of the data.

Many churn tutorials solve churn on a single non-temporal table or dataset, i.e. they construct a training set with one row per customer and a churn label specifying whether that customer churned. In practice, we find this poorly representative of real-world churn problems. First, real customer use cases generally involve many different data sets that are not naturally stored in a nice CSV; the data acrobatics involved in boiling a problem down to a single dataframe are cumbersome, error-prone, and, frankly, unnecessary. Second, reducing each customer’s history to a single churn event discards a lot of data that could help build better models. We typically want at least tens of thousands of observations to feed into an algorithm, and that requirement may be a tall task if it translates directly to the number of customers.

Instead of treating each customer’s churn as a single event, we should consider every transaction that occurs with a customer. The details vary by business model: we may run a monthly or weekly subscription service, in which a customer opts to churn (unsubscribe) or renew many times a year; customers may sign larger multi-year contracts; or we may only interact with a customer when they walk into the store to buy something. The net effect is that every customer now generates many churn/retention events, making it possible to build a good model with a smaller number of total customers.


Think carefully about whether or not your features are temporal.


It should be possible to get all this transaction or contract data about customers, which will form the backbone of our model definition. To do this, we need a few pieces of information: 

  1. Prediction Period: The frequency with which we wish to make churn predictions, e.g. every week, month, or quarter. Typically we’d make predictions on a set day in the period, such as the start of the week/month/quarter. 
  2. Churn Threshold: The number of days past account expiration after which we consider a customer to have churned. This is highly dependent on the business, but it may be 0 days, 30 days, 60 days, etc. 
  3. Time Range: The start and end dates for every customer transaction, i.e. when does each account need to be renewed? 

With this information, we can build a column to store our well-defined churn outcomes. This includes building the column for all historic data, for training purposes, as well as any current data to make predictions. The process is as follows:

  1. For every prediction period, gather any customers whose accounts are expiring within the period. 
  2. For each customer, take the expiration date of the transaction ending in this period and compare it with the start date of their next transaction. 
  3. If the gap is greater than the churn threshold (or there is no next transaction and the account has been expired for longer than the threshold), the customer churned; otherwise, no churn occurred. 
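The steps above can be sketched in plain Python. The field names (`customer_id`, `start_dt`, `expiration_dt`) and the 30-day threshold are hypothetical; a real pipeline would implement this in SQL in the warehouse, but the logic is the same:

```python
from datetime import date

# Hypothetical threshold -- set this per business (see "Churn Threshold" above)
CHURN_THRESHOLD_DAYS = 30

def label_churn(transactions, as_of):
    """Label every transaction with a churn outcome.

    `transactions`: list of dicts with hypothetical keys
    'customer_id', 'start_dt', 'expiration_dt'.
    `as_of`: the date labels are computed, so open-ended accounts are
    only labeled churned once they're past the threshold.
    """
    by_customer = {}
    for t in transactions:
        by_customer.setdefault(t["customer_id"], []).append(t)

    labels = []
    for cust, txns in by_customer.items():
        txns.sort(key=lambda t: t["start_dt"])
        for i, t in enumerate(txns):
            nxt = txns[i + 1] if i + 1 < len(txns) else None
            if nxt is None:
                # No next transaction: churned only if the account has
                # been expired for longer than the threshold.
                is_churn = (as_of - t["expiration_dt"]).days > CHURN_THRESHOLD_DAYS
            else:
                # Gap between this expiration and the next renewal.
                gap = (nxt["start_dt"] - t["expiration_dt"]).days
                is_churn = gap > CHURN_THRESHOLD_DAYS
            labels.append({"customer_id": cust,
                           "expiration_dt": t["expiration_dt"],
                           "is_churn": is_churn})
    return labels

sample = [
    {"customer_id": 1, "start_dt": date(2021, 1, 1), "expiration_dt": date(2021, 2, 1)},
    {"customer_id": 1, "start_dt": date(2021, 2, 3), "expiration_dt": date(2021, 3, 3)},
]
labels = label_churn(sample, as_of=date(2021, 6, 1))
# First contract renewed 2 days after expiration -> retained;
# second expired ~90 days before as_of with no renewal -> churned.
```

Note that every transaction produces a labeled row, so a single customer contributes many observations, which is exactly why this framing works with fewer total customers.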

This approach is summarized in the following graphic: 


This logic can be applied pretty simply in SQL to define a churn target, as shown below:


case
    -- no next transaction and expired beyond the churn threshold
    when next_transaction is NULL and days_expired > churn_threshold then True
    -- there is a next transaction, but it starts beyond the churn threshold
    when next_transaction is not NULL
        and datediff(day, expiration_dt, next_transaction) > churn_threshold then True
    else False
end as is_churn

Feature Engineering 

The other key part to setting up an ML problem is feature engineering, i.e. controlling the inputs to the model. At Continual, we believe better models are the result of better features, so it often behooves us to spend some time constructing good inputs. 

A churn problem is likely to bring in many different feature sets: customer data, sales data, product usage data, web traffic data, customer support data, etc. Insufficient data will mean the model is unable to find any relationships between the features and the target. In these cases, it’s not uncommon to end up with a model that predicts no customer will ever churn. Churning is typically a low-occurrence event (<10%), so, in the absence of any meaningful inputs, always predicting that a customer will not churn is a “safe” strategy. But that, of course, is a useless model.

Some of these feature sets will be non-temporal. For example, we will surely need a feature set that captures a customer’s demographic information – date of birth, address, gender, etc. We may also have product or account-related information that is fairly static and doesn’t change over time. However, we’ll also have some temporal data that will be of interest as well. Product usage data is likely collected frequently and should give great insight into how customers are engaging with our products. We may have data around how users are engaging with ads or email campaigns. We will also have historical sales data that may provide some signals. In general, the more signals we can bring into the use case, the better our models will become. 

Temporal datasets may benefit from some additional feature engineering. By default, Continual pulls in the latest record for each index when using temporal feature sets. For some feature sets, this is the desired behavior: when determining churn, we likely want to bring in information about the last transaction, so we can simply register the data set and let Continual pull in the latest data. Other times, however, we may want to perform a window operation on a temporal feature set. For example, suppose we use a monthly prediction period and collect product usage data every day. Instead of bringing in only the most recent usage data, we probably want to compute something like a 30-day average of key metrics, which gives better insight into how customers use the product over the course of a month.


Try to aggregate important metrics up to the prediction period.


For a churn use case, it's often helpful to perform these simple window functions to aggregate data up to the beginning of the prediction period. It’s important to perform these up to the boundary of the prediction period so future information is not leaked into the model. This can be very harmful and degrade model performance quickly on real data. A good strategy is to compute a rolling window operation (average, sum, etc.) on any feature of interest and register this as a feature set in Continual. When this feature set is connected to the model definition in Continual, the system will automatically pull in the latest value, which will give us the latest rolling values for any feature, as of the beginning of the prediction period. 

These operations can be performed easily in SQL, as shown below: 


SELECT
  user_id, 
  timestamp, 
  -- trailing 30-row average (~30 days, assuming one row per user per day),
  -- excluding the current row so same-day information isn't leaked
  avg(daily_usage) over (partition by user_id order by timestamp
      rows between 30 preceding and 1 preceding) as avg_usage_30d
FROM
  user_logs
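As a sanity check on the window logic, here’s a pure-Python sketch of the same trailing average: for each row it averages up to the previous 30 values, excluding the current one, and returns None where there is no prior history (mirroring SQL’s NULL for an empty frame). The function name is illustrative:

```python
def trailing_avg(values, window=30):
    """Average of the previous `window` values, excluding the current one.

    For row i, averages values[i - window : i]; the frame shrinks at the
    start of the series, and the first row has no history at all.
    """
    out = []
    for i in range(len(values)):
        prior = values[max(0, i - window):i]
        out.append(sum(prior) / len(prior) if prior else None)
    return out

# With window=2: first row has no history, second sees only [10], etc.
trailing_avg([10, 20, 30, 40], window=2)  # -> [None, 10.0, 15.0, 25.0]
```

Excluding the current row is a deliberate choice here: it keeps the aggregate strictly "as of" the period boundary, which is the leakage-avoidance point made above.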

Machine Learning Considerations 

Every ML use case has its own peculiarities and obstacles. Below is a list of items to be aware of for customer churn: 

  1. Imbalanced data: The nature of the churn problem is (hopefully!) one of imbalanced data: only a small percentage of customers actually churn. While this is good for business, it means the training set will have many examples of retention and relatively few of churn (<5-10% is pretty common). If we’re not careful, we can build a model that does nothing other than predict retention for every customer (and is correct 90+% of the time!). The simplest remedy is to select a model performance metric that handles imbalanced data well. In particular, ROC AUC and F1 are generally good choices, whereas accuracy should be avoided as the performance metric. 
  2. Insufficient data: Insufficient data can be caused by not having enough observations to train on (i.e. hundreds or low thousands) or by not having enough features. By focusing on the temporal nature of a customer’s interactions with the company, we can construct a more robust model definition. Finding sufficient features is an ongoing task: talk to business users and learn what data exists that may be insightful to the problem. It’s also okay to conclude that there isn’t enough data to build a good model yet. This exercise should, at the very least, set up the system with a good foundation for the churn problem and create some baseline models to compare future performance against. With Continual, new models are built rapidly as new data comes in to see if any uplift has occurred. 
  3. Sneaky features leaking data: It’s easy to fall into a trap where a feature leaks information about the target in a way that isn’t obvious at first glance. For the churn use case, I routinely see people use a feature corresponding to the age of the customer account. While this looks innocent, in most businesses older customers are also more loyal customers; there are reasons they’ve been around for 5, 10, or 20 years. At the same time, a brand-new customer could never have a high account age, so a model may place a lot of weight on this feature and fail to pick up on other subtleties in the data. Check whether account age and the target are highly correlated. If so, remove it from the feature set, and inspect predictions across different populations to check whether model performance differs drastically between them. 
  4. Don’t forget the big picture: Although this use case is commonly referred to as “customer churn,” it’s important not to forget the larger business context we all operate in. Predicting churn is fun, but if we’re not actively preventing churn, we’re not impacting the business. With Continual, users get a closed-loop workflow to build and maintain models and predictions on top of their cloud data warehouse. We can tweak our data, build great models, and get predictions speedily into the desired database. From there, we recommend using a great partner tool, such as Hightouch, to sync predictions from the data warehouse back into the tools the business uses. For churn, it might be awesome to sync those predictions into Salesforce or HubSpot so the sales team has a better idea of their accounts’ flight risk! 
  5. What’s good model performance?: There’s no right answer here, as a model’s performance largely depends on how good our data is and whether it contains signals for the target. Set realistic expectations: it’s unreasonable to expect a model that perfectly classifies every churn and retention customer. We cover this in some detail in the worked example, but we can tweak our model to generate more churn predictions at the expense of more false positives (i.e. predicting churn for customers who won’t churn). Work with the business to understand the cost of each outcome and optimize accordingly. Also remember that some churn is inevitable and (remember the big picture!) an ML model can only provide predictions; it can’t cure the reasons customers are unhappy. 
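The imbalanced-data point is easy to demonstrate. Below is a self-contained Python sketch on synthetic labels with a hypothetical 5% churn rate: a model that always predicts retention scores very well on accuracy, while F1 exposes it as useless. (Hand-rolled metrics here for illustration; in practice you’d use library implementations such as scikit-learn’s `f1_score` and `roc_auc_score`.)

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (churn) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Synthetic 5% churn rate: 95 retained (0), 5 churned (1)
y_true = [0] * 95 + [1] * 5
always_retain = [0] * 100                    # trivial "no one churns" model
useful_model = [0] * 95 + [1, 1, 1, 0, 0]    # catches 3 of the 5 churners

accuracy(y_true, always_retain)  # -> 0.95: looks great, but the model is useless
f1(y_true, always_retain)        # -> 0.0: F1 catches the problem immediately
```

The trivial model is "correct" 95% of the time yet identifies zero churners, which is exactly why accuracy is the wrong yardstick for this use case.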

Worked Example 

In the interest of brevity, we’ve put a full customer churn example in our documentation, which walks readers through a temporal customer churn use case step by step. Check it out here.

Taking it Further

This is just the beginning of the customer churn journey! Once we have our foundation set, there are many ways to expand the use case. Here are some quick ideas: 

  1. Try out different churn thresholds: Different businesses have different requirements, and there may be interest in predictions for multiple churn windows: 30 days, 60 days, 90 days, etc. Once we have one model up and running, it’s a very quick edit to start building others. 
  2. Do churned customers ever return?: Indeed, this is quite an interesting question, and with the right data and inputs, we can figure this one out as well! In many ways, churned customers may represent ideal prospects – they already know the company brand and there was obviously something they liked about the product/service at some point. This begins to bleed into a lead scoring use case (to be covered in a future article!), but it’s not crazy to start thinking about what it takes to turn a churned customer back into a paying customer. 
  3. Voluntary vs. involuntary churn: Some companies distinguish between customers who actively cancel their relationship (voluntary) and those who simply let it lapse (involuntary). These could be handled as separate models or as one multiclass classification problem. Better understanding the forces behind voluntary and involuntary churn will likely give a leg up in identifying likely churn candidates to target for winning back. 
  4. Revenue churn: We’ve been focusing on customers this entire time, but we could view the problem through the lens of a salesperson and argue that not all customers are created equal. Some are large spenders and some are not, and it’s debatable whether we should focus on customer churn or revenue churn. In the latter case, we’re predicting a continuous variable rather than a discrete one, which turns the problem from classification into regression, but it’s largely the same data under consideration in both cases. 

Try Customer Churn with Continual Today

Continual is a great platform for solving your customer churn use cases, but you shouldn’t stop there. Once you have this data registered in your Continual feature store, it’s a short hop to other use cases like customer lifetime value, lead scoring, sales forecasting, and more! You can test out your customer churn use case today with a free trial of Continual, request a demo, or learn more about the product in our documentation.
