February 8, 2022
In this article, we’ll take a deep dive into the customer churn/retention use case. It should contain everything needed to get started, and enterprising readers can also try it out for themselves in a free trial of Continual, following the customer churn example in the linked GitHub repository.
It’s safe to say that nearly every company out there has a concept of a “customer” and a clear desire to understand them and better meet their needs. As a company ages, it’s inevitable that some customers will leave; it’s impossible to make everyone happy and some churn is unavoidable. How one defines churn may vary from business to business; B2B businesses may transact via contracts, B2C via subscriptions or entirely via point of sale, etc. However, churn is something that every company should strive to minimize as much as possible.
Not only is customer churn one of the most broadly applicable ML use cases, but it’s also one of the most important. For a growing company, every exiting customer is a significant loss and can be seen as a bump in the road, and even established companies need to closely monitor churn to make sure it doesn’t spiral out of control. Having a good handle on churn and how to prevent it can help reduce churn rates, optimize marketing and advertisements, and increase customer satisfaction. At scale, every fraction of a percentage increase in retention can mean millions of dollars of revenue a quarter. This is a use case that can pay for itself almost instantly.
Aside from the fact that customer churn is broadly applicable and mission-critical to most businesses, another reason I wanted to touch on it is that I don’t often see it presented in the right light. Although it is often covered in machine learning tutorials, I’ve found that many sources oversimplify it. For example, the ‘canonical’ example is the Telco Customer Churn dataset. This is more useful than predicting survivors on the Titanic in terms of real-world use cases, but I find that the data doesn’t do a good job of representing the problem that a real company would have. However, this doesn’t stop data scientists from trying to model a churn problem in a similar way, and the results leave something to be desired.
Here, we’ll take a different approach. In this article, we’ll articulate the methodology of solving the churn problem and use a public dataset much closer to what real businesses would use. This is a dataset that is temporal in nature, which should also align with what exists in any company’s data platform. Accompanying this article is also a worked example in our documentation. After reading this, we invite all to sign up for a free trial of Continual and try building a customer churn model. We’ve opted to use Snowflake as our backing data warehouse, but it should be easy enough to follow along regardless of the cloud data warehouse your company uses (Get in touch if you have any issues!). This example also contains a backing git repository that has all the artifacts needed to train and deploy a model in Continual. We hope you’ll be able to use it as inspiration and model your own use cases similarly. Also, note for dbt users: there’s a dbt project in that repository that can be used to quickly jumpstart this use case.
Now, let’s talk about the most important thing in this use case: how do we define churn?
In an abstract form, an ML use case is simple to explain: select input signals (aka features) and a value to predict (aka a target), pass them into an algorithm, and voilà, a machine learning model is born. The details are a bit more complex.
The first issue that normally arises is that the target doesn’t exist. Or, to put it more precisely, there’s no existing dataset with a well-defined target. One of the hardest tasks in machine learning is asking the right question, i.e. defining a target for a model. This sounds simple on the surface, but as we start digging into a use case, the complexities reveal themselves. Some refer to this practice as prediction engineering, and it can be one of the more time-consuming parts of nailing down a new use case as the data team iterates on target definitions and readjusts data pipelines to capture the correct view of the data.
Many churn tutorials focus on solving churn on a single non-temporal table or dataset, i.e. they construct a training set that has one row for each customer and a churn label that specifies whether that customer churned. In practice, we find this to be poorly representative of real-world churn problems. First, real customer use cases generally involve many different datasets that are not naturally stored in a nice CSV; the data acrobatics involved in boiling a problem down to a single dataframe are cumbersome, error-prone, and, frankly, unnecessary. Second, reducing each customer’s history to a single churn event removes a lot of data that can be used to help build better models. We typically want at least tens of thousands of observations to feed into an algorithm to build a predictive model, and that requirement may be a tall order if it translates directly to the number of customers.
Instead of considering each customer churning as a single event, we should consider every transaction that occurs with a customer. What counts as a transaction depends on our business model. For example, we may have a monthly or weekly subscription service, wherein a customer opts to churn (unsubscribe) or renew many times a year, or the customer may sign larger multi-year contracts, or we may only interact with customers when they walk into the store to buy something. The net effect is that every customer now generates many different churn/retention events, and it’s possible to build a good model with fewer customers.
It should be possible to get all this transaction or contract data about customers, which will form the backbone of our model definition. To do this, we need a few pieces of information:
With this information, we can build a column to store our well-defined churn outcomes. This includes building the column for all historic data, for training purposes, as well as any current data to make predictions. The process is as follows:
This approach is summarized in the following graphic:
This logic can be applied pretty simply in SQL to define a churn target, as shown below:
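As a rough illustration of this kind of labeling, here is a minimal sketch in plain Python rather than SQL. The churn definition it uses is an assumption made for the example, not necessarily the one used in the Continual documentation: a transaction is labeled as churn when the same customer makes no further transaction within a fixed window. The transaction log, the 60-day window, the `AS_OF` extraction date, and the `label_churn` helper are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical transaction log: (customer_id, transaction_date).
transactions = [
    ("a", date(2021, 1, 5)),
    ("a", date(2021, 2, 3)),
    ("a", date(2021, 5, 20)),
    ("b", date(2021, 1, 10)),
    ("b", date(2021, 1, 25)),
]

CHURN_WINDOW = timedelta(days=60)  # assumed churn definition: 60 days of inactivity
AS_OF = date(2021, 6, 1)           # assumed date on which the data was extracted


def label_churn(transactions, window, as_of):
    """Return one (customer_id, date, churned) row per transaction.

    churned is False if the customer transacts again within `window`,
    True if the window has fully elapsed without another transaction,
    and None if the window is still open (a row to predict, not train on).
    """
    by_customer = {}
    for customer_id, ts in transactions:
        by_customer.setdefault(customer_id, []).append(ts)

    rows = []
    for customer_id, dates in by_customer.items():
        dates.sort()
        for i, ts in enumerate(dates):
            next_ts = dates[i + 1] if i + 1 < len(dates) else None
            if next_ts is not None and next_ts - ts <= window:
                churned = False
            elif ts + window <= as_of:
                churned = True
            else:
                churned = None  # outcome not yet known
            rows.append((customer_id, ts, churned))
    return rows


labeled = label_churn(transactions, CHURN_WINDOW, AS_OF)
for row in labeled:
    print(row)
```

Each customer contributes several labeled rows instead of one, and rows whose outcome is still unknown are kept apart as candidates for prediction.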
The other key part to setting up an ML problem is feature engineering, i.e. controlling the inputs to the model. At Continual, we believe better models are the result of better features, so it often behooves us to spend some time constructing good inputs.
A churn problem is likely to bring in many different feature sets: customer data, sales data, product usage data, web traffic data, customer support data, etc. With insufficient data, the model will be unable to find any relationships between the features and the target. In these cases, it’s not uncommon to end up with a model that predicts that no customer will churn. Churning is typically a low-occurrence event (<10%), so, in the absence of any meaningful inputs, always saying a customer will not churn is a safe strategy. Of course, such a model is useless.
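A small sketch makes the point concrete. The labels below are made up for illustration (a 5% churn rate); the "model" simply never predicts churn. Its accuracy looks high, but its recall on churners is zero, which is why accuracy alone is a poor yardstick for this use case.

```python
# Hypothetical labels: 1 = churned, 0 = retained (5% churn rate).
labels = [1] * 5 + [0] * 95

# A degenerate "model" that always predicts "will not churn".
predictions = [0] * len(labels)

# Accuracy: share of all predictions that match the label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: share of actual churners that the model caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
actual_churners = sum(labels)
recall = true_positives / actual_churners

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
# prints: accuracy=0.95 recall=0.00
```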
Some of these feature sets will be non-temporal. For example, we will surely need a feature set that captures a customer’s demographic information – date of birth, address, gender, etc. We may also have product or account-related information that is fairly static and doesn’t change over time. However, we’ll also have some temporal data that will be of interest as well. Product usage data is likely collected frequently and should give great insight into how customers are engaging with our products. We may have data around how users are engaging with ads or email campaigns. We will also have historical sales data that may provide some signals. In general, the more signals we can bring into the use case, the better our models will become.
Temporal datasets may benefit from some additional feature engineering. By default, Continual pulls in the latest record for each index when using temporal feature sets. In some feature sets, this may be the desired behavior. For example, when determining churn we will likely want to bring in information about the last transaction; in this case, we can simply register the dataset and let Continual pull in the latest data. Other times, however, we may want to perform a window operation on a temporal feature set. For example, let’s say that we use a monthly prediction period and collect product usage data every day. Instead of bringing in only the most recent usage data, we probably want to compute something like a 30-day average of key metrics. This would give us better insight into how customers use the product over the course of a month.
For a churn use case, it’s often helpful to perform these simple window functions to aggregate data up to the beginning of the prediction period. It’s important to stop at the boundary of the prediction period so that future information is not leaked into the model. Such leakage can be very harmful: a model trained on leaked data will degrade quickly on real data. A good strategy is to compute a rolling window operation (average, sum, etc.) on any feature of interest and register this as a feature set in Continual. When this feature set is connected to the model definition in Continual, the system will automatically pull in the latest value, which will give us the latest rolling values for any feature, as of the beginning of the prediction period.
These operations can be performed easily in SQL, as shown below:
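The same idea can be sketched in plain Python. This is only an illustration under stated assumptions: the usage records, the `minutes_used` metric, and the `trailing_average` helper are hypothetical, and the window is taken to be the 30 days strictly before the start of the prediction period so that nothing from inside the period leaks into the feature.

```python
from datetime import date, timedelta

# Hypothetical daily usage records: (customer_id, day, minutes_used).
usage = [
    ("a", date(2021, 2, 1), 10),    # older than the 30-day window: ignored
    ("a", date(2021, 3, 5), 30),
    ("a", date(2021, 3, 15), 60),
    ("a", date(2021, 3, 31), 90),
    ("a", date(2021, 4, 2), 500),   # inside the prediction period: must be ignored
]


def trailing_average(usage, customer_id, period_start, window_days=30):
    """Average a metric over [period_start - window_days, period_start).

    The upper bound is exclusive, so no record from the prediction
    period itself can contribute to the feature. The average is taken
    over the days that have records, not over all calendar days.
    """
    window_start = period_start - timedelta(days=window_days)
    values = [
        value
        for cid, day, value in usage
        if cid == customer_id and window_start <= day < period_start
    ]
    return sum(values) / len(values) if values else None


avg = trailing_average(usage, "a", date(2021, 4, 1))
print(avg)  # prints: 60.0
```

The large value recorded on April 2 is excluded because it falls after the start of the prediction period; including it would be exactly the kind of leakage described above.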
Every ML use case has its own peculiarities and obstacles. Below is a list of items to be aware of for customer churn:
To keep this article brief, we’ve put a step-by-step customer churn example in our documentation. It walks readers through a temporal customer churn use case from start to finish. Check it out here.
This is just the beginning of the customer churn journey! Once we have our foundation set, there are many ways to expand the use case. Here are some quick ideas:
Continual is a great platform for solving your customer churn use cases, but you shouldn’t stop there. Once you have this data registered in your Continual feature store, it’s a short hop to other use cases like customer lifetime value, lead scoring, sales forecasting, and more! You can test out your customer churn use case today with a free trial of Continual, request a demo, or learn more about the product in our documentation.