Modern Data Stack
June 15, 2022
In this tutorial, we’re going to build an interactive customer Churn Insights Dashboard using the open-source Python framework, Streamlit, and the Continual predictions generated in Part 1: Snowflake and Continual Quickstart Guide. In Part 1, we connected Continual to Snowflake and used a simple dataset of customer information, activity, and churn status to build and operationalize a machine learning model in Continual to predict the likelihood of a customer churning. If you haven’t completed the Quickstart Guide, I encourage you to bookmark this page and follow the guide to create the predictions.
Customer churn is a vital business metric and often one of the first to be tracked at early-stage startups. It doesn’t take long for a business to build enough history to begin accurately predicting which customers are most likely to churn. But making predictions isn’t enough. Data teams must expose this information with context to business stakeholders. Armed with this intel, go-to-market teams can spring into action and earn their customers’ loyalty and retention. As an AI layer for the modern data stack, Continual makes it easy to operationalize predictions in multiple ways, for example through reverse ETL solutions or via any data warehouse connected application. Today’s focus will be on using Streamlit to build a custom data app, in this case a customer churn insights dashboard, to inform business decisions that can prevent churn before it happens.
This step-by-step tutorial will provide all the code snippets necessary to create a simple Customer Churn Dashboard using Continual’s predictions. You can either create the app by following along step-by-step or you can fork or clone the Github repo, modify the Snowflake queries to include your database and table, and add your own Snowflake connection credentials.
Streamlit is a free, open-source Python app framework that enables data scientists to quickly build interactive dashboards and machine learning web apps. Streamlit understands better than any other app framework that data people aren’t front-end engineers. Rather than supporting a breadth of functionality, features, and extensions, users write a few lines of Python code and use Streamlit’s API to create the essential elements data people want to present. Users can style data, draw charts and maps, add interactive widgets, cache computation, define themes, and much more. Streamlit data apps are built like a Python script: line by line, top to bottom.
Building a data app is one thing, but deploying to production can be a long, cumbersome process. Streamlit makes it easy to deploy straight from Github, in a single click. And it’s getting even better.
Since its acquisition of Streamlit, Snowflake has been working on making it even easier to build and deploy Streamlit apps. Native integration between Snowflake and Streamlit was announced at the 2022 Snowflake Summit and it won’t be long before users will be able to write Python code in Snowflake Snowsight’s Worksheets and deploy and host the app directly on a Snowflake Virtual Warehouse.
This is only an initial taste of how Streamlit is quickly evolving to work seamlessly with Snowflake datasets, Virtual Warehouses, and collaboration features. These advancements are driving towards a future where machine learning use cases can be implemented, end-to-end, in a few hours.
Let’s get started.
Streamlit can be installed with the Python package manager pip. It’s recommended to install Streamlit in a virtual environment like virtualenv or Anaconda to manage dependencies and keep things tidy. I’m going to use virtualenv in this example.
Fire up a terminal.
Create a virtual environment and make sure you have Python 3.7 - Python 3.9 installed:
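A sketch of those steps, using virtualenv (the environment name "churn-env" is just an example):

```shell
# Streamlit currently requires Python 3.7 - 3.9; check your version first
python3 --version

# Install virtualenv, then create and activate an environment
pip install virtualenv
virtualenv churn-env
source churn-env/bin/activate
```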
The final step in this tutorial will be to deploy our app on Streamlit Cloud. Streamlit Cloud launches apps directly from GitHub.
Create a sub-directory to manage your project’s files:
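For example (the directory name is an assumption; use whatever fits your project):

```shell
# Create a project directory for the app's files and move into it
mkdir churn-dashboard
cd churn-dashboard
```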
Initialize a local Git repository:
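From inside the project directory:

```shell
# Initialize an empty local Git repository
git init
```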
Install streamlit using pip:
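In the activated virtual environment:

```shell
pip install streamlit
```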
Test to make sure it works. It should automatically open a landing page in your browser with demos and links.
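Streamlit ships with a built-in demo for exactly this purpose:

```shell
# Launches Streamlit's demo app and opens a landing page in your browser
streamlit hello
```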
Back in your terminal, press Ctrl+C to stop the server.
We’ll need the following packages for our app. In the same activated virtual environment where Streamlit was installed, create a requirements.txt file of the packages we’ll use:
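A plausible set of dependencies for this app (the exact list is an assumption; add any other packages you end up using):

```shell
# Write the app's dependencies to requirements.txt
cat > requirements.txt <<'EOF'
streamlit
snowflake-connector-python
pandas
EOF
```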
Install the requirements:
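Still in the same activated environment:

```shell
pip install -r requirements.txt
```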
Great! We’re in business.
Fire up your favorite text editor and create a Python file in the sub-directory in your virtual environment where you installed Streamlit.
First, import the required packages:
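Something like the following, matching the requirements file above:

```python
import pandas as pd
import snowflake.connector
import streamlit as st
```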
Next, set up the page with a wide layout and give it a title:
Add the title to the page:
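The page configuration and title together might look like this (the title strings are placeholders; name the dashboard whatever you like):

```python
# st.set_page_config must be the first Streamlit call in the script
st.set_page_config(page_title="Customer Churn Dashboard", layout="wide")

# The visible title element at the top of the page
st.title("Customer Churn Insights Dashboard")
```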
Let’s see how it looks in the browser. Use the streamlit run command in the same terminal where you created your Python file to start the server:
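Substitute your own filename ("churn_dashboard.py" here is an assumption):

```shell
streamlit run churn_dashboard.py
```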
As we move forward, refresh the browser whenever you care to see your progress.
Building data apps is not very fun unless we have access to… well, yeah nachos… annnnd data! The Streamlit docs provide examples of connecting to a list of different data sources so when we’re done here feel free to go nuts. The telco customer data we’re using for this tutorial resides in Snowflake along with our Continual predictions.
While we’re developing our app locally, it will read in connection credentials from .streamlit/secrets.toml in the app’s root directory. Create the file if it doesn’t exist and add your Snowflake username, password, account identifier, and the name of your warehouse, database, and schema:
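A template for the file, following the key layout Streamlit’s documentation uses for Snowflake connections (replace every placeholder value with your own):

```toml
[snowflake]
user = "YOUR_USERNAME"
password = "YOUR_PASSWORD"
account = "YOUR_ACCOUNT_IDENTIFIER"
warehouse = "YOUR_WAREHOUSE"
database = "YOUR_DATABASE"
schema = "YOUR_SCHEMA"
```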
IMPORTANT: Add .streamlit/secrets.toml to .gitignore and don’t commit it to your Github repo!
Back in the Python file, let’s connect to Snowflake:
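A minimal sketch, reading credentials from the secrets file above. (st.experimental_singleton was the connection-caching decorator at the time of writing; newer Streamlit versions call it st.cache_resource.)

```python
# Cache the connection so it isn't re-created on every script rerun
@st.experimental_singleton
def init_connection():
    return snowflake.connector.connect(**st.secrets["snowflake"])

conn = init_connection()
```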
Now, let’s read in our features and predictions from Snowflake. Our features are neatly stored in a Snowflake table, telco_info. In real life, you are responsible for defining where your features reside in your Cloud Data Warehouse.
For our predictions, Continual creates a table in your feature store for every model you create in the system. This table tracks all predictions made over time by model versions for that model. This table lives under <feature_store>.<project_id>.model_<model_id>_predictions_history. This is a full historical view of your predictions for the model and represents a complete audit.
Continual additionally builds a view under <feature_store>.<project_id>.model_<model_id>_predictions which represents the latest prediction made for each record in your model. This is the view we’ll use to gather our predictions for this tutorial.
IMPORTANT: Update the following query with your database and table.
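A sketch of the queries, assuming the connection from the previous step is named conn. The database, schema, and object names below, and the "ID" join key, are all placeholders for your own. (st.experimental_memo was current in 2022; newer Streamlit versions call it st.cache_data.)

```python
# Cache query results for 10 minutes so reruns don't hit the warehouse every time
@st.experimental_memo(ttl=600)
def run_query(query):
    with conn.cursor() as cur:
        cur.execute(query)
        return cur.fetch_pandas_all()

features_df = run_query("SELECT * FROM TELCO.PUBLIC.TELCO_INFO")
predictions_df = run_query("SELECT * FROM TELCO.PUBLIC.MODEL_CHURN_PREDICTIONS")

# Combine features and predictions into a single dataframe
df = features_df.merge(predictions_df, on="ID")
df.columns = [c.lower() for c in df.columns]  # Snowflake returns uppercase names
```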
We may want to do some minor data processing to spruce things up and make it presentable.
Our example dataset doesn’t include financial information, but when actually implementing this use case you should obtain financial data like customer spend. Let’s add a fictitious customer monthly spend to our dataframe.
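One way to fabricate the column, wrapped in a helper so the spend range and seed are explicit (the $20–$120 range is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

def add_monthly_spend(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Attach a fabricated monthly_spend column between $20 and $120."""
    out = df.copy()
    rng = np.random.default_rng(seed)
    out["monthly_spend"] = rng.uniform(20, 120, size=len(out)).round(2)
    return out
```

Then apply it to the dataframe from the previous step with df = add_monthly_spend(df).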
Great! We have a dataframe with our model’s features, target, prediction, and prediction score. We’ll use this dataframe for data processing, analysis, and display.
We want our dashboard users to be able to view the churn predictions in a way that’s relevant to them while also seeing the big picture. For example, a regional vice president might want to see which customers in their region are likely to churn so they can prepare and execute an engagement plan to retain them. But they also want to discover patterns in the data that will lead them to a root cause. In addition to selecting a geographic location, a dashboard user might want to look at the churn risk of long-time customers, who might be more likely to stay. Or maybe they want to look at demographics to consider whether it’s worthwhile to create a promotion personalized to an age bracket, income level, or gender. The data should be tailored to the user and their situation.
Streamlit has several interactive widgets for users to play with. We’ll use widgets to enable dashboard users to toggle between a cut of data and the full dataset, getting both a zoomed-in perspective and a big-picture perspective.
Where should the settings be laid out on the screen? Streamlit has a few options for organizing elements on the page. A sidebar is useful because it is clearly separate from the rest of the application and never accidentally hidden while scrolling. We’ll create a sidebar object and pass our setting elements using the “with” notation.
NOTE: all elements we add to the settings sidebar must be indented under with st.sidebar
The first setting we’ll add is a slider widget for users to adjust the churn probability threshold. It’s not usually worthwhile to invest in customers that are 99% likely to churn because it’s a long walk back to retention. Sales and Marketing dollars are better spent on customers who have a moderate likelihood of churning.
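A sketch of the sidebar and the range slider; the 0.5–0.95 default band is an arbitrary choice for illustration:

```python
with st.sidebar:
    st.header("Settings")

    # Range slider: skip near-certain churners and focus spend on
    # customers who can still be won back
    churn_threshold = st.slider(
        "Churn probability range",
        min_value=0.0, max_value=1.0, value=(0.5, 0.95),
    )
```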
Next, let’s add a setting to segment our data based on account plan. Customers who have an international plan are likely to have behavioral patterns distinct from customers with domestic plans. They may travel regularly on business trips or have family members in other parts of the world. There are many new variables to consider when a customer is using the service internationally that could influence their loyalty.
You can add this line underneath our newly created dataframe.
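A possible version of the setting, indented inside the with st.sidebar block described above:

```python
# Segment customers by account plan
plan_type = st.selectbox("Account plan", ("All", "International", "Domestic"))
```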
Let’s create a multiselect for users to select the states relevant to their role. By default, we should have all 50 states selected but offer the ability to deselect certain states. The multiselect widget will work well for this:
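Like the other settings, this goes inside the with st.sidebar block; "state" is the assumed name of the column holding each customer's state:

```python
states = st.multiselect(
    "States",
    options=sorted(df["state"].unique()),
    default=sorted(df["state"].unique()),
)
```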
Based on our settings, we will create an adjusted_df that adheres to our user’s settings.
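One way to build it, assuming the slider, selectbox, and multiselect above were stored in churn_threshold, plan_type, and states, and that the score and plan columns are named churn_prediction_score and intl_plan (match these to your own schema):

```python
# Filter the dataframe down to the user's selections
adjusted_df = df[
    df["churn_prediction_score"].between(*churn_threshold)
    & df["state"].isin(states)
]
if plan_type != "All":
    wants_intl = "yes" if plan_type == "International" else "no"
    adjusted_df = adjusted_df[adjusted_df["intl_plan"] == wants_intl]
```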
IMPORTANT: Make sure to keep the settings sidebar code indented underneath with st.sidebar
Before we go any further, let’s refresh our browser and marvel at our settings sidebar.
Sweet as cherry pie!
We’ve implemented a settings sidebar for dashboard users to filter data based on different factors. Let’s display high level metrics that summarize customer churn for the company. The questions we want to answer are:
We can show the total number of customers predicted to churn based on the settings our user has chosen.
We can use Streamlit’s metric element and create columns to distribute each metric horizontally across the top of the page.
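A sketch of three metrics across the top of the page; "churn_prediction" is the assumed name of the Boolean prediction column:

```python
# Customers predicted to churn within the user's settings
at_risk = adjusted_df[adjusted_df["churn_prediction"]]

col1, col2, col3 = st.columns(3)
col1.metric("Customers predicted to churn", len(at_risk))
col2.metric("Predicted churn rate", f"{len(at_risk) / max(len(adjusted_df), 1):.1%}")
col3.metric("Monthly spend at risk", f"${at_risk['monthly_spend'].sum():,.2f}")
```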
Refresh your browser to see the results:
Thanks to our nifty metrics, our dashboard users have a summary of customer churn. Now let’s present the data of customers predicted to churn to provide granularity.
We used the st.title object earlier and our metrics need no introduction. But here, let’s give the section a header:
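For example:

```python
st.header("Customers Predicted to Churn")
```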
Let’s order the data by churn probability from most likely to least likely, within the threshold set by our dashboard user in the settings sidebar. To make it clear that the rows are sorted by prediction probability, we can move the prediction score column to the far left:
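A sketch of the sort and display, again assuming churn_prediction and churn_prediction_score as column names:

```python
# Keep only customers predicted to churn, most likely first
at_risk = adjusted_df[adjusted_df["churn_prediction"]]
display_df = at_risk.sort_values("churn_prediction_score", ascending=False)

# Move the score column to the far left so the ordering is obvious
front = ["churn_prediction_score"]
display_df = display_df[front + [c for c in display_df.columns if c not in front]]
st.dataframe(display_df)
```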
Unfortunately, customers don’t call customer service to share how overjoyed they feel about the flawless telecom service (Sidenote: please be kind to customer service representatives!). Let’s make a graph showing the percentage of customers who churn after X customer service calls.
As we did above with metrics, let’s create columns to display two graphs side-by-side.
In our first column, let’s graph how many customer service calls “at-risk” customers have made:
In our second column, let’s evaluate how many “at-risk” customers are on the domestic plan or international plan:
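The columns and both charts together might look like this; "customer_service_calls" and "intl_plan" are assumed column names:

```python
# Re-select the at-risk slice for charting
at_risk = adjusted_df[adjusted_df["churn_prediction"]]

chart1, chart2 = st.columns(2)

with chart1:
    st.subheader("Customer service calls")
    # Distribution of service-call counts among at-risk customers
    st.bar_chart(at_risk["customer_service_calls"].value_counts().sort_index())

with chart2:
    st.subheader("Account plan")
    # Domestic vs. international plans among at-risk customers
    st.bar_chart(at_risk["intl_plan"].value_counts())
```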
Finally, let’s add a map with the locations of the customers who are on our watch list. Streamlit offers the st.map element which wraps st.pydeck_chart to quickly create scatterplot charts on top of a map, with auto-centering and auto-zoom.
Download this csv with the longitude and latitude of each state’s capital. Then read the csv and pass the coordinates into the map element:
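A sketch, where "state_coordinates.csv" stands in for the downloaded file; adjust the filename and column names to match what you saved:

```python
st.header("Customer locations")

at_risk = adjusted_df[adjusted_df["churn_prediction"]]

# Join each at-risk customer to their state's coordinates
coords = pd.read_csv("state_coordinates.csv")
map_df = at_risk.merge(coords, on="state")

# st.map auto-centers and auto-zooms on the latitude/longitude columns
st.map(map_df[["latitude", "longitude"]])
```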
The process of creating and utilizing predictions and insights from data is a long journey. Ordinarily, at this point, a cumbersome deployment process would remain between us and presenting the dashboard to our business stakeholders. Fortunately, Streamlit makes it incredibly easy to deploy apps right from Github.
Add and commit the code to Github:
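From the project directory:

```shell
git add .
git commit -m "Customer churn dashboard"
```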
Add the local repository to Github:
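One way to do this is with the GitHub CLI, which walks you through the prompts described in the next steps:

```shell
# Requires the GitHub CLI (gh) to be installed and authenticated
gh repo create
```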
On the first prompt, select “Push an existing local repository to GitHub”
Make sure to select “Yes” on the prompt asking whether you want to push commits from the current branch to the newly created remote.
At the top of your repository on GitHub.com's Quick Setup page, click to copy the remote repository URL:
In Terminal, add the URL for the remote repository where your local repository will be pushed:
Push the local repository to Github:
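If you used the Quick Setup page instead of the GitHub CLI, the last two steps look like this, assuming your default branch is named main (replace the URL with the one you copied):

```shell
git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPO.git
git push -u origin main
```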
Once your code is in Github, copy the URL of your repo for the next step.
Enter share.streamlit.io into your browser and log in.
Click “New app”:
Enter your GitHub repository, branch, and Python file:
Recall that we added our secrets.toml file to .gitignore, so it was not committed to GitHub in the last step. We need to pass the contents of secrets.toml to our Streamlit Cloud deployed app separately.
To copy your app secrets to the cloud, click “Advanced Settings” and copy the contents of your secrets.toml file into the Secrets text area.
NOTE: You will likely receive an email from GitHub saying a new public key was added to your repo. This is the expected behavior. When you try to deploy an app that lives in a private repo, Streamlit Cloud needs to get access to that repo somehow. Streamlit creates a read-only GitHub Deploy Key then accesses the repo using a public SSH key.
Once you hit “Deploy” your app should be hosted and ready to share within minutes!
In this tutorial, we created an interactive dashboard for stakeholders to make critical business decisions and retain customers. Streamlit made it simple and fast to build a custom, interactive data app for business stakeholders.
In part one of this tutorial, we got a taste of how loading and querying data is performant, economical, and near-infinitely scalable with Snowflake, making it an excellent foundation for Continual. We learned how Continual separates the declarative modeling layer and the imperative operational layer to empower any data professional to build production machine learning use cases without worrying about the intricacies of the operational pipeline or infrastructure. Leveraging Continual’s built-in automated machine learning capabilities, we were able to create customer churn predictions in minutes.
In the coming months, Continual will be announcing more built-in ML capabilities as well as extensibility of the operational layer, to allow data scientists to bring custom models to Continual. Be sure to sign up for updates at the end of this blog to learn more.
Now that you’ve been through a simple example of customer churn, you’re ready to start a real use case for your business. Get in touch with our Customer Engineering team to help onboard your first use case or learn more from our documentation.