Building a Churn Insights Dashboard with Continual and Streamlit on Snowflake

Modern Data Stack

June 15, 2022

In this tutorial, we’re going to build an interactive customer Churn Insights Dashboard using the open-source Python framework, Streamlit, and the Continual predictions generated in Part 1: Snowflake and Continual Quickstart Guide. In Part 1, we connected Continual to Snowflake and used a simple dataset of customer information, activity, and churn status to build and operationalize a machine learning model in Continual to predict the likelihood of a customer churning. If you haven’t completed the Quickstart Guide, I encourage you to bookmark this page and follow the guide to create the predictions. 

Customer churn is a vital business metric and often one of the first to be tracked at early-stage startups. It doesn’t take long for a business to build enough history to begin accurately predicting which customers are most likely to churn. But making predictions isn’t enough. Data teams must expose this information with context to business stakeholders. Armed with this intel, go-to-market teams can spring into action and earn their customers’ loyalty and retention. As an AI layer for the modern data stack, Continual makes it easy to operationalize predictions in multiple ways, for example through reverse ETL solutions or via any data warehouse connected application. Today’s focus will be on using Streamlit to build a custom data app, in this case a customer churn insights dashboard, to inform business decisions that can prevent churn before it happens. 

This step-by-step tutorial will provide all the code snippets necessary to create a simple Customer Churn Dashboard using Continual’s predictions. You can either create the app by following along step-by-step or you can fork or clone the Github repo, modify the Snowflake queries to include your database and table, and add your own Snowflake connection credentials.

What is Streamlit? 

Streamlit is a free, open-source Python app framework that enables data scientists to quickly build interactive dashboards and machine learning web apps. Streamlit understands better than any other app framework that data people aren’t front-end engineers. Rather than offering an overwhelming breadth of functionality, features, and extensions, Streamlit lets users write a few lines of Python and use its API to create the essential elements data people want to present. Users can style data, draw charts and maps, add interactive widgets, cache computation, define themes, and much more. Streamlit data apps are built like a Python script: line-by-line and top to bottom. 
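To give a feel for that top-to-bottom model, here’s a minimal, standalone sketch (not part of the dashboard we’ll build below) that renders a title, a widget, and a chart in a handful of lines:

# hello_streamlit.py - a tiny standalone script; Streamlit re-executes it top to bottom on every interaction
import streamlit as st
import pandas as pd
import numpy as np

st.title("Hello, Streamlit")

# An interactive widget whose current value is just a Python variable
points = st.slider("Number of points", 10, 500, 100)

# Draw a line chart from a DataFrame of random values
chart_data = pd.DataFrame(np.random.randn(points, 2), columns=["x", "y"])
st.line_chart(chart_data)

Running streamlit run hello_streamlit.py serves the app locally, and every adjustment to the slider re-runs the script from the top.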

Building a data app is one thing, but deploying to production can be a long, cumbersome process. Streamlit makes it easy to deploy straight from Github, in a single click. And it’s getting even better. 

Since its acquisition of Streamlit, Snowflake has been working on making it even easier to build and deploy Streamlit apps. Native integration between Snowflake and Streamlit was announced at the 2022 Snowflake Summit and it won’t be long before users will be able to write Python code in Snowflake Snowsight’s Worksheets and deploy and host the app directly on a Snowflake Virtual Warehouse. 

This is only an initial taste of how Streamlit is quickly evolving to work seamlessly with Snowflake datasets, Virtual Warehouses, and collaboration features. These advancements are driving towards a future where machine learning use cases can be implemented, end-to-end, in a few hours. 

Let’s get started.

Install Streamlit and Python packages

Streamlit can be installed with the Python package manager pip. It’s recommended to install Streamlit in a virtual environment like virtualenv or Anaconda to manage dependencies and keep things tidy. I’m going to use virtualenv in this example. 

Fire up a terminal

Create a virtual environment and make sure you have Python 3.7 - Python 3.9 installed: 

virtualenv streamlit-project; cd streamlit-project; . bin/activate
python --version

The final step in this tutorial will be to deploy our app on Streamlit Cloud. Streamlit Cloud launches apps directly from GitHub. 

Create a sub-directory to manage your project’s files: 

mkdir customer_churn_dashboard ; cd customer_churn_dashboard

Initialize a local Git repository:

git init

Install streamlit using pip: 

pip3 install streamlit

Test to make sure it works. It should automatically open a landing page in your browser with demos and links. 

streamlit hello

Back in your terminal, press CTRL+C to stop the server. 

We’ll need the following packages for our app. In the same activated virtual environment where Streamlit was installed, create a requirements.txt file of the packages we’ll use: 

# requirements.txt
snowflake-connector-python==2.7.6
pandas==1.4.0
numpy==1.22.2
matplotlib==3.5.1

Install the requirements: 

pip3 install -r requirements.txt

Great! We’re in business.

Create a Python file and begin scripting

Fire up your favorite text editor and create a Python file (we’ll call it customer_churn_dashboard.py) in the sub-directory in your virtual environment where you installed Streamlit.

First, import the required packages: 

import streamlit as st
import snowflake.connector
import pandas as pd
import matplotlib.pyplot as plt
import csv

Next, set up the page with a wide layout and give it a title:

st.set_page_config(page_title="Customer Churn Dashboard",layout="wide")

Add the title to the page

# Page header 
st.title("Customer Churn Dashboard")

Let’s see how it looks in the browser. Use the streamlit run command in the same terminal where you created your Python file to start the server: 

streamlit run customer_churn_dashboard.py 

As we move forward, refresh the browser whenever you care to see your progress.

Connect Streamlit to Snowflake

Building data apps is not very fun unless we have access to… well, yeah nachos… annnnd data! The Streamlit docs provide examples of connecting to many different data sources, so when we’re done here, feel free to go nuts. The telco customer data we’re using for this tutorial resides in Snowflake along with our Continual predictions. 

While we’re developing our app locally, it will read in connection credentials from .streamlit/secrets.toml in the app’s root directory. Create the file if it doesn’t exist and add your Snowflake username, password, account identifier, and the name of your warehouse, database, and schema:

# .streamlit/secrets.toml

[snowflake]
user = "xxx"
password = "xxx"
account = "xxx"
warehouse = "xxx"
database = "xxx"
schema = "xxx"
IMPORTANT: Add .streamlit/secrets.toml to .gitignore and don’t commit it to your Github repo!
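If you don’t already have one, a minimal .gitignore for this project could look like the following (the Python cache entries are just common hygiene, not something Streamlit requires):

# .gitignore
.streamlit/secrets.toml
__pycache__/
*.pyc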

Back in the Python file, let’s connect to Snowflake:

# Uses st.experimental_singleton to create the Snowflake connection once and reuse it across reruns
@st.experimental_singleton
def init_connection():
  return snowflake.connector.connect(**st.secrets["snowflake"])
  
conn = init_connection()

# Uses st.experimental_memo to only rerun when the query changes or after 10 min.
@st.experimental_memo(ttl=600)
def run_query(query):
  with conn.cursor() as cur:
    cur.execute(query)
    return cur.fetchall()

Now, let’s read in our features and predictions from Snowflake. Our features are neatly stored in a Snowflake table, telco_info. In real life, you are responsible for defining where your features reside in your Cloud Data Warehouse. 

For our predictions, Continual creates a table in your feature store for every model you create in the system. This table tracks all predictions made over time by that model’s versions and lives under <feature_store>.<project_id>.model_<model_id>_predictions_history, giving you a full historical view of the model’s predictions and a complete audit trail. 

Continual additionally builds a view under <feature_store>.<project_id>.model_<model_id>_predictions, which contains the latest prediction made for each record in your model. This is the view we’ll use to gather our predictions for this tutorial. 
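If you’d like to sanity-check the predictions view before joining it with the features, a quick preview using the run_query helper we defined above could look like this. The database, schema, and view names mirror the placeholders used in the query below, so swap in your own:

# Optional sanity check: preview the latest predictions view (remove once verified)
preview = run_query("SELECT * FROM sf_database.customerchurn.MODEL_CUSTOMER_CHURN_30DAYS2_PREDICTIONS LIMIT 10")
st.write(preview)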

IMPORTANT: Update the following query with your database and table. 
# Query Snowflake for features and predictions
data = run_query("SELECT ds.*, p.churn_prediction, p.churn_true_prediction_score from sf_database.customerchurn.telco_info ds left join sf_database.customerchurn.MODEL_CUSTOMER_CHURN_30DAYS2_PREDICTIONS p using (id)")

# Create dataframe of features and predictions
df = pd.DataFrame(data, columns=[
  "CustomerID", "State", "CustomerAge", "Area_Code", "International_Plan",
  "Voicemail_Plan", "Voicemail_Messages", "Total_Day_Minutes", "Total_Day_calls", "Total_Day_Charge",
  "Total_Eve_Minutes", "Total_Eve_Calls", "Total_Eve_Charge", "Total_Night_Minutes", "Total_Night_Calls",
  "Total_Night_Charge", "Total_Intl_Minutes", "Total_Intl_Calls", "Total_Intl_Charge", "Number_Customer_Service_Calls",
  "Churn", "Churn_Prediction", "Churn_True_Prediction_Score"
])

We may want to do some minor data processing to spruce things up and make it presentable. 

# Edit data: drop the first 10 characters of Area_Code, keeping just the code itself
df['Area_Code'] = df.Area_Code.str[10:]

Our example dataset doesn’t include financial information, but when actually implementing this use case you should obtain financial data like customer spend. Let’s add a fictitious customer monthly spend to our dataframe. 

# Add a fictitious customer monthly spend to the dataframe
df['Monthly_Spend'] = 75
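
If you do have spend data in your warehouse, you could swap that constant for real numbers. Here’s a minimal sketch assuming a hypothetical billing_summary table keyed by customer id; the table and column names are illustrative only and not part of the Quickstart dataset:

# Hypothetical alternative: join real monthly spend from a billing table instead of a constant.
# "billing_summary" and its columns are placeholders - adapt them to your own schema.
spend_rows = run_query("SELECT id, monthly_spend FROM sf_database.customerchurn.billing_summary")
spend_df = pd.DataFrame(spend_rows, columns=["CustomerID", "Monthly_Spend"])
df = df.merge(spend_df, on="CustomerID", how="left")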

Great! We have a dataframe with our model’s features, target, prediction, and prediction score. We’ll use this dataframe for data processing, analysis, and display.

Create a Settings sidebar

We want our dashboard users to be able to view the churn predictions in a way that’s relevant to them while also seeing the big picture. For example, a regional vice president might want to see which customers (in their region) are likely to churn so they can prepare and execute an engagement plan to retain them. But they also want to discover patterns in the data that will lead them to a root cause. In addition to selecting a geographic location, a dashboard user might want to look at the churn risk of long-time customers, who may be more likely to be retained. Or maybe they want to look at demographics to consider whether it’s worthwhile to create a promotion personalized to an age bracket, income level, or gender. The data should be tailored to the user and their situation. 

Streamlit has several interactive widgets for users to play with. We’ll use widgets to enable dashboard users to toggle between a cut of data and the full dataset to get a zoomed in perspective as well as a big picture perspective. 

Where should the settings be laid out on the screen? Streamlit has a few options for organizing elements on the page. A sidebar is useful because it is clearly separate from the rest of the application and never accidentally hidden while scrolling. We’ll create a sidebar object and pass our setting elements using the “with” notation. 

# Settings section
with st.sidebar:
  st.title("Data Settings")
NOTE: all elements we add to the settings sidebar must be indented under with st.sidebar

The first setting we’ll add is a slider widget for users to adjust the churn probability threshold. It’s not usually worthwhile to invest in customers that are 99% likely to churn because it’s a long walk back to retention. Sales and Marketing dollars are better spent on customers who have a moderate likelihood of churning. 

# Filter data by the customer's probability of churn 
threshold = st.slider('Set Churn Probability Threshold', 0.0, 1.0,(0.4, 0.8))

Next, let’s add a setting to segment our data based on account plan. Customers who have an international plan are likely to have behavioral patterns distinct from customers with domestic plans. They may travel regularly on business trips or have family members in other parts of the world. There are many new variables to consider when a customer is using the service internationally that could influence their loyalty. 

Add this line underneath the slider, inside the settings sidebar. 

# Filter data by account plan
plan = st.radio('Choose plan', ['Both','Domestic','International'])

Let’s create a multiselect for users to select the states relevant to their role. By default, we should have all 50 states selected but offer the ability to deselect certain states. The multiselect widget will work well for this: 

 # Filter by State
states = st.multiselect("Select states", df['State'].unique(), default=df['State'].unique())

Based on our settings, we will create an adjusted_df that adheres to our user’s settings. 

# create an adjusted dataframe based on the user's settings
if plan == 'Both':
  adjusted_df = df.loc[(
    (df['Churn_True_Prediction_Score'] >= threshold[0])
    &
    (df['Churn_True_Prediction_Score'] <= threshold[1])
    &
    (df['State'].isin(states))   
  )]
else:
  adjusted_df = df.loc[(
    (df['Churn_True_Prediction_Score'] >= threshold[0])
    &
    (df['Churn_True_Prediction_Score'] <= threshold[1])
    &
    (df['International_Plan'] == 'yes' 
    if plan == "International" else df['International_Plan'] == 'no')
    &
    (df['State'].isin(states))
  )]
IMPORTANT: Make sure the settings sidebar code stays indented underneath with st.sidebar

Before we go any further, let’s refresh our browser and marvel at our settings sidebar. 

Sweet as cherry pie!

Display metrics

We’ve implemented a settings sidebar for dashboard users to filter data based on different factors. Let’s display high level metrics that summarize customer churn for the company. The questions we want to answer are: 

  1. How many customers are predicted to churn? 
  2. What is the percentage of revenue retained? 
  3. How much revenue are we losing if X customers churn?

We can show the total number of customers predicted to churn based on the settings our user has selected. 

We can use Streamlit’s metric element and create columns to distribute each metric horizontally across the top of the page. 

# Create columns for each metric to display horizontally
churn_col, net_retention_rate_col, monthly_revenue_col = st.columns(3)

with churn_col:   
  # Count num customers within the churn prediction threshold
  st.metric("Customers Predicted to Churn",adjusted_df.CustomerID.count())

with net_retention_rate_col:
  # Calculate and display the net retention rate
  # Monthly Net Retention Rate = (Recurring Revenue + Expansion Revenue - Lost Revenue) / Recurring Revenue
  monthly_recurring_rev = df.Monthly_Spend.sum()
  monthly_expansion_rev = monthly_recurring_rev*0.0415
  monthly_lost_rev = adjusted_df.Monthly_Spend.sum()
  net_retention_rate = (monthly_recurring_rev+monthly_expansion_rev-monthly_lost_rev)/monthly_recurring_rev
  st.metric("Forecasted Net Retention Rate", "{:.1%}".format(net_retention_rate))

with monthly_revenue_col:
  # Revenue impact if the customers predicted to churn actually churn
  st.metric("Forecasted Monthly Revenue Impact", "${:,.2f}".format(monthly_lost_rev))

Refresh your browser to see the results: 

Display customer data

Thanks to our nifty metrics, our dashboard users have a summary of customer churn. Now let’s present the data of customers predicted to churn to provide granularity. 

We used the st.title object earlier and our metrics need no introduction. But here, let’s give the section a header: 

st.header("Customers at risk of churning")

Let’s order the data by the churn probability from most-likely to least-likely and within the threshold set by our dashboard user in the settings sidebar. To make it clear that the rows are sorted by prediction probability, we can move the prediction score column to the far left: 

sorted_df = adjusted_df.sort_values(by='Churn_True_Prediction_Score', ascending=False)
sorted_df.insert(0, 'Probability of Churn', sorted_df.pop('Churn_True_Prediction_Score'))
st.dataframe(sorted_df.iloc[:, 0:21])
st.write(" ")

Visualize data with matplotlib

Unfortunately, customers don’t call customer service to share how overjoyed they feel about the flawless telecom service (Sidenote: please be kind to customer service representatives!). Let’s make a graph showing the percentage of customers who churn after X customer service calls. 

As we did above with metrics, let’s create columns to display two graphs side-by-side. 

graph_col1, graph_col2 = st.columns(2)

In our first column, let’s graph how many customer service calls “at-risk” customers have made: 

with graph_col1:
  # Count the number of customers likely to churn in each category of customer service contact
  cust_service_calls = adjusted_df.groupby(by='Number_Customer_Service_Calls').count().reset_index()
  fig, ax = plt.subplots()
  ax.bar(cust_service_calls.Number_Customer_Service_Calls, cust_service_calls.CustomerID, color='red', label='Customers')
  ax.set_xlabel("Customer Service Calls")
  ax.set_ylabel("Customers predicted to Churn")
  plt.title("Customer service calls by at-risk customers")
  st.pyplot(fig)

In our second column, let’s evaluate how many “at-risk” customers are on the domestic plan or international plan: 

with graph_col2:
  # Churned customers with an International plan versus domestic plan
  fig2, ax2 = plt.subplots()
  ax2.bar(['International', 'Domestic'],
          [adjusted_df[(adjusted_df.International_Plan=='yes')].CustomerID.count(),
           adjusted_df[(adjusted_df.International_Plan=='no')].CustomerID.count()],
          color='red')
  ax2.set_xlabel("Account Plan")
  ax2.set_ylabel("Customers predicted to Churn")
  plt.title("Number of Churned customers per account plan")
  st.pyplot(fig2)

Finally, let’s add a map with the locations of the customers who are on our watch list. Streamlit offers the st.map element which wraps st.pydeck_chart to quickly create scatterplot charts on top of a map, with auto-centering and auto-zoom. 

st.header("Customers predicted to Churn by State")

Download this csv with the longitude and latitude of each state’s capital. Then read the csv and pass the coordinates into the map element:

# Read in the longitude and latitude of each state's capital
with open('state_capitol_coordinates.csv', mode='r') as myfile:
  reader = [rows for rows in csv.reader(myfile)]
  latitude = {rows[2]: rows[0] for rows in reader}
  longitude = {rows[2]: rows[1] for rows in reader}

# Map each customer's state to its coordinates (as floats) and plot
adjusted_df['latitude'] = adjusted_df['State'].map(latitude).astype("float")
adjusted_df['longitude'] = adjusted_df['State'].map(longitude).astype("float")
st.map(adjusted_df, 2)

Deploy on Streamlit Cloud

The process of creating and utilizing predictions and insights from data is a long journey. Ordinarily, at this point, a cumbersome deployment process would remain between us and presenting the dashboard to our business stakeholders. Fortunately, Streamlit makes it incredibly easy to deploy apps right from Github. 

Stage and commit the code:

git add * && git commit -m "initial commit"

Add the local repository to Github:

Option 1: Use Github’s CLI

gh repo create

On the first prompt, select “Push an existing local repository to GitHub”

Make sure to select “Yes” on the prompt asking whether you want to push commits from the current branch to the newly created remote. 

Option 2: Use Git

Create a new repository

At the top of your repository on GitHub.com's Quick Setup page, click to copy the remote repository URL:

In Terminal, add the URL for the remote repository where your local repository will be pushed:

git remote add origin <REMOTE_URL>

Push the local repository to Github:

 git push -u origin main 

Once your code is in Github, copy the URL of your repo for the next step. 

Go to share.streamlit.io in your browser and log in.

Click “New app”: 

Enter your GitHub repository, branch, and Python file:  

Recall that we added our secrets.toml file to .gitignore, so it was not committed to GitHub in the last step. We need to pass the contents of secrets.toml to our Streamlit Cloud deployed app separately. 

To copy your app secrets to the cloud, click “Advanced Settings” and copy the contents of your secrets.toml file into the Secrets text area. 

NOTE: You will likely receive an email from GitHub saying a new public key was added to your repo. This is the expected behavior. When you try to deploy an app that lives in a private repo, Streamlit Cloud needs to get access to that repo somehow. Streamlit creates a read-only GitHub Deploy Key then accesses the repo using a public SSH key. 

Once you hit “Deploy” your app should be hosted and ready to share within minutes! 

Summary

In this tutorial, we created an interactive dashboard that helps business stakeholders make critical decisions and retain customers. Streamlit made it simple and fast to build and share a custom data app. 

In part one of this tutorial, we got a taste of how loading and querying data is performant, economical, and near-infinitely scalable with Snowflake, making it an excellent foundation for Continual. We learned how Continual separates the declarative modeling layer and the imperative operational layer to empower any data professional to build production machine learning use cases without worrying about the intricacies of the operational pipeline or infrastructure. Leveraging Continual’s built-in automated machine learning capabilities, we were able to create customer churn predictions in minutes. 

In the coming months, Continual will be announcing more built-in ML capabilities as well as extensibility of the operational layer, allowing data scientists to bring custom models to Continual. Be sure to sign up for updates at the end of this blog to learn more. 

Now that you’ve been through a simple example of customer churn, you’re ready to start a real use case for your business. Get in touch with our Customer Engineering team to help onboard your first use case or learn more from our documentation.

Sign up for more articles like this
