Modern Data Stack
August 30, 2022
If you’ve been following commentary on the modern data stack recently, you may have noticed the tent’s steadily getting bigger. There also appears to be a growing ecosystem of companies taking the next step: moving from describing what a modern data stack can look like to building the concept into their own products.
Take, for example, Snowflake’s announcements on data apps and what they’re calling “connected apps”. They first offered talks on the data app in November of 2020, but lately the concept has gotten a second wind. Snowflake’s first blog on connected apps might be eight months old, but it wouldn’t be a stretch to say that we’ll likely be seeing more podcasts and webinars on the topic this year. In fact, recent announcements at Snowflake Summit, ranging from the Native App Marketplace to tighter integrations with Streamlit, indicate that this area will continue to receive increasing investment and support from cloud data warehouse providers like Snowflake.
But what exactly are data apps and connected apps, and what is the difference between the two?
You may already know “data apps” as data-intensive or data-driven applications – also described as analytics applications – that support dashboard-style visualizations for a rich and immersive user experience. Data apps must be responsive, interactive, and provide timely search results, even over very large datasets – the very qualities that are notoriously difficult to deliver with traditional database and data warehouse systems. Data apps are typically tools for consumption, to help people make use of their data and come to better decisions, and there is a growing focus on data becoming embedded within the operational experience of the application itself.
Enter the cloud-native data warehouse like Snowflake and its core tenets:
These cloud-native data platforms can provide an ideal foundation for many data applications. Rather than building complex pipelines that move data between different systems to support analytical and application use cases, increasingly data apps are built directly on these cloud data platforms and exposed directly to end users.
We should also start hearing more about the need for apps that can handle streaming data at a very high clip, whether it be traditional data collection in micro-batches or device-driven IoT data. Low latency for rapid analysis and decision-making is essential for a data app to succeed. Snowflake’s announcements of Snowpipe Streaming and Unistore, which simplify streaming ingestion and transformations and enable joint transactional and analytical data workloads, are therefore well timed.
If data apps just mean data-intensive applications, then what are data apps built directly on top of modern cloud data platforms? Snowflake calls these “connected apps” to make this connection explicit.
Unlike traditional managed applications, where the vendor will often host and manage the end user’s data in their own proprietary datastore, the connected app model is better suited to integrating a third-party application directly where the customer’s data lives. In other words, a connected app user maintains ownership and control of their data, including the ability to enforce existing governance policies that are already in place. They expose their data to the connected app and receive the results powered by their own data platform, as depicted below:
The value of this model is quite clear when you start thinking about:
Now, if we consider managed services, they are naturally aimed at data consumers and reduce administrative overhead for developers. Combined with a connected app approach, provisioning starts to shift from a model driven by capital expense to one driven directly by consumer use. In that light, we can begin to see how these two models are meant to complement each other, not compete.
When Apple launched its App Store in 2008, it opened a window of opportunity for thousands of software developers to rush in and invent the mobile-first world we all live in today. A similar opportunity exists today for B2B SaaS vendors across all industry verticals to plug into the underlying data infrastructure of their customers. Looking ahead, we can imagine a world where an ecosystem of apps with connected architectures serves as a force multiplier that ultimately benefits the customer.
While some have boldly predicted that all SaaS apps will eventually be reimplemented as data apps on the cloud data warehouse (that remains to be seen), what has become apparent is that there is real value and a growing trend in bringing apps to where the data resides. Notably, in this connected app architecture, each app adds value to a unified source of data: the results of one app can power another and provide additional feedback or context to the broader system of interconnected apps. This can drive further value across your lines of business, helping your teams keep making data-driven decisions and taking data-empowered actions, while providing a seamless, end-to-end experience in which data is a first-class citizen within your business operations.
The kicker? Through this connected architecture framework, the customer remains in control of their data and of their data retention timelines and policies, which is important for large enterprises that wish to maintain governance and adhere to strict compliance requirements. In addition, customers only have to pay for storage once rather than maintaining disparate silos of data across multiple vendors or vendor applications.
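To make the connected model a little more concrete, here is a minimal sketch of the idea that the query runs where the data lives and only the results come back to the app. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the customer's cloud data warehouse, and the `events` table, the `ConnectedApp` class, and `revenue_by_user` are hypothetical names, not part of any vendor's API.

```python
import sqlite3


def make_customer_warehouse() -> sqlite3.Connection:
    """Stand-in for the customer's warehouse (assumption: in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            ("u1", "purchase", 20.0),
            ("u1", "purchase", 5.0),
            ("u2", "purchase", 12.5),
            ("u2", "page_view", 0.0),
        ],
    )
    conn.commit()
    return conn


class ConnectedApp:
    """Hypothetical vendor app that keeps no copy of the customer's rows."""

    def __init__(self, warehouse: sqlite3.Connection) -> None:
        # The app only holds a connection that the customer has granted.
        self.warehouse = warehouse

    def revenue_by_user(self) -> dict:
        # The aggregation executes inside the warehouse; only the
        # summarized result is returned to the app.
        rows = self.warehouse.execute(
            "SELECT user_id, SUM(amount) FROM events "
            "WHERE event = 'purchase' GROUP BY user_id ORDER BY user_id"
        ).fetchall()
        return {user_id: total for user_id, total in rows}


warehouse = make_customer_warehouse()
app = ConnectedApp(warehouse)
summary = app.revenue_by_user()
print(summary)
```

In a real deployment the connection would go through the warehouse's own driver and the customer's existing access controls. The point of the sketch is only that the raw rows never leave the customer's platform.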
If we look at a few quick examples of cross-app collaboration, we can see how the game changes with this shift in approach. You can power your A/B testing results with your product analytics data or take the charts from your BI tool and integrate them with your sales or customer success tools. You can create a unified data and analytics layer, combining CRM, customer engagement, product analytics, and customer support data, and even extending it with predictive modeling, to help power your business and decision making.
The end result would be something like this diagram:
We are already observing a growing base of companies that are taking this connected architecture approach to heart and building the connected app ecosystem of tomorrow.
A few examples across different verticals that come to mind (far from an exhaustive list):
Cloud Security / Cybersecurity: Hunters.ai, Panther, Lacework, and Securonix are all Powered by Snowflake partners that have embraced a connected app approach versus traditional SIEM software. They connect natively to Snowflake to read and analyze data along with surfacing incidents in a performant, secure, and highly scalable manner.
Cloud Analytics & BI: Sigma is a cloud warehouse native solution that embraces the connected architecture to read data so that users can perform real-time ad hoc analytics using spreadsheet-style dashboards and share the results with others.
Transformation: dbt has to be mentioned as the de facto transformation workflow tool of choice for the modern data stack and analytics engineers, encouraging this “connected app” ecosystem to develop on cloud data platforms like Snowflake with a shared semantic model.
Operational AI: Unlike traditional ML engineering platforms, Continual natively connects to your cloud data warehouse, ensuring your models are continuously being updated and maintained through its declarative approach, while writing up-to-date predictions back to the data warehouse where they can then be consumed by other downstream tools.
Experimentation: Eppo is an A/B testing and experimentation platform that helps companies run useful, reliable experiments by automating analysis, diagnostics, and investigations, all on top of customers' data warehouses.
Customer Marketing: MessageGears is a cloud-native platform that enables marketers to use live data from the cloud data warehouse to perform customer segmentation, activate audiences, and deploy personalized marketing campaigns at scale.
Customer Engagement: Supergrain is a customer engagement platform built natively for the cloud data warehouse that enables marketers to perform customer segmentation and run cross-channel campaigns directly from data in their data warehouse.
Customer and Data Activation: Hightouch, Census and Flywheel are data activation platforms that sit directly on top of the data warehouse. In addition to facilitating reverse ETL workflows, users can build customer audiences in one place and then sync them into all their marketing and sales tools.
Product-Led Revenue: Correlated focuses on enabling sales teams at companies pursuing a product-led growth (PLG) strategy. By utilizing a connected architecture, views can be built on the ingested data and written back as playbooks.
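Several of the examples above share one pattern: one app writes its results back to the warehouse, and another app picks them up from there. The sketch below illustrates that pattern under loudly stated assumptions. An in-memory SQLite database stands in for the cloud data warehouse, the `customers` and `churn_predictions` tables are hypothetical, and the "model" is a toy rule rather than anything a real vendor ships.

```python
import sqlite3

# Stand-in for the shared cloud data warehouse (assumption: in-memory SQLite).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customers (customer_id TEXT, email TEXT, logins_last_30d INTEGER)"
)
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("c1", "a@example.com", 1),
        ("c2", "b@example.com", 14),
        ("c3", "c@example.com", 0),
    ],
)
warehouse.commit()


def write_churn_predictions(conn: sqlite3.Connection) -> None:
    """Hypothetical prediction app: scores customers and writes results back."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS churn_predictions "
        "(customer_id TEXT PRIMARY KEY, churn_risk REAL)"
    )
    rows = conn.execute(
        "SELECT customer_id, logins_last_30d FROM customers"
    ).fetchall()
    # Toy rule standing in for a trained model: few logins means high risk.
    predictions = [(cid, 0.9 if logins < 2 else 0.1) for cid, logins in rows]
    conn.executemany(
        "INSERT OR REPLACE INTO churn_predictions VALUES (?, ?)", predictions
    )
    conn.commit()


def build_at_risk_audience(conn: sqlite3.Connection, threshold: float = 0.5) -> list:
    """Hypothetical activation app: reads the predictions another app wrote."""
    rows = conn.execute(
        "SELECT c.email FROM customers c "
        "JOIN churn_predictions p ON p.customer_id = c.customer_id "
        "WHERE p.churn_risk >= ? ORDER BY c.email",
        (threshold,),
    ).fetchall()
    return [email for (email,) in rows]


write_churn_predictions(warehouse)
audience = build_at_risk_audience(warehouse)
print(audience)
```

Neither function knows about the other. The warehouse table is the only contract between them, which is what lets independently built apps compose.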
To learn more about this emerging connected app architecture and ecosystem, you can join representatives from Snowflake, Supergrain, Correlated and Continual for a joint round-table discussion on connected apps and various use cases on September 21, 2022 at 1pm PST. Please sign up to get a detailed overview and multiple perspectives on the topic!