Architecture
WarpStream eases the burden of running Apache Kafka by replacing the deployment and maintenance of a physical Kafka cluster with a single stateless binary (called the Agent) that only communicates with object storage like Amazon S3 and our Cloud Metadata Store. WarpStream Agents speak the Apache Kafka protocol, but unlike an Apache Kafka broker, any Agent can act as the leader for any topic, commit offsets for any consumer group, or act as the coordinator for the cluster. No Agent is special, so auto-scaling based on CPU usage or network bandwidth is trivial. Running the WarpStream Agent is as easy as running a proxy or web server, such as nginx.
But hang on a second, that sounds too good to be true. How did we accomplish this if Apache Kafka requires running ZooKeeper (or using KRaft) and maintaining a cluster of stateful brokers that have to be rebalanced all the time?
We separate storage and compute.
We separate data from metadata.
We separate the data plane from the control plane.
Separating storage and compute is a common technique for scaling modern data processing systems. It allows you to scale your compute clusters up, down, in, or out in response to load while leveraging low-cost storage managed by someone else, such as Amazon S3. It allows any compute node to process data from any file in storage instead of each node "owning" a subset of the data.
Separating storage and compute allows operators to scale up or down the number of WarpStream Agents to respond to changes in load without rebalancing data. It also enables faster recovery from failures because any request can be retried on another Agent immediately. We also eliminate hotspots, where some Kafka brokers have dramatically higher load than others due to uneven amounts of data in each partition.
All of these hard problems have been delegated to hyper-scale cloud provider object storage services, where tens of millions of human-years of effort and billions of dollars have been invested in durability, availability, and operational excellence.
Another pillar of WarpStream's design is the separation of data from metadata. This is also becoming a more common technique, such as Snowflake's SQL data warehouse storing primary data in object storage and metadata in FoundationDB. Our founders did this at Datadog too when building Husky, the system that powers storage and queries for logs, real-user monitoring, network performance monitoring, and many other products at Datadog.
WarpStream uses this technique to offload metadata management from our customers' operations teams to ours. We store the metadata for every cluster in our cloud metadata store, which was designed from scratch to solve only this specific problem and is operated 24x7 by the team who wrote it. This separation also provides useful security guarantees because we cannot read the data in your topics, even if our cloud were compromised.
At a high level, the data plane of a WarpStream virtual cluster is a pool of Agents connected to our cloud. Any Agent in any pool can serve any produce or consume request for topics in that virtual cluster. The control plane runs in our cloud, where we decide which Agents will be compacting your data files for optimal performance, which Agents will participate in the distributed, zone-aware object storage cache, and which Agents will scan your object storage bucket for files which are past retention and can be deleted.
Our control plane enables us to deliver on our promise of an Apache Kafka-compatible streaming system that is as easy to operate as nginx by offloading the hard problems of consensus and coordination onto our fully-managed control plane, while at the same time achieving a much lower TCO with the object storage-backed data plane running on the Agents.
Multiple vendors offer "Tiered Storage" with an Apache Kafka-compatible interface. This means that either you or the vendor runs something that looks roughly like a stateful Apache Kafka broker which periodically offloads some older data to S3.
WarpStream does not work this way.
The Agent does not require local disks at all. Data streams directly from the Agent to object storage instead of being replicated via extremely expensive cross-AZ networking at an effective cost of $0.05/GB in AWS at retail prices. That's the same cost as storing data in S3 for over 2 months! Instead, the WarpStream Agent leverages the free networking between EC2 and S3 in AWS, which just so happens to durably replicate your data along the way.
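To put that comparison in perspective, here is a back-of-the-envelope sketch, assuming the roughly $0.023/GB-month retail price of S3 Standard storage (exact prices vary by region) against the $0.05/GB effective replication cost quoted above:

```python
# Back-of-the-envelope: cross-AZ replication cost vs. S3 storage cost.
# Prices are illustrative retail AWS figures and vary by region.
CROSS_AZ_REPLICATION_PER_GB = 0.05    # effective $/GB to replicate data across AZs
S3_STANDARD_PER_GB_MONTH = 0.023      # $/GB-month for S3 Standard storage

months_equivalent = CROSS_AZ_REPLICATION_PER_GB / S3_STANDARD_PER_GB_MONTH
print(f"Replicating 1 GB across AZs costs as much as ~{months_equivalent:.1f} months of S3 storage")
# => Replicating 1 GB across AZs costs as much as ~2.2 months of S3 storage
```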
Working directly on top of S3 required a complete re-architecture of the system from the ground up. Other vendors can continue to push "Tiered Storage" to its physical limit, where only the last record of each topic-partition is stored on a stateful broker and the remaining records live in object storage, but they cannot compete with the savings of never going across zonal boundaries in the first place.
Naively attempting to implement an Apache Kafka-compatible system on top of S3 will end up with either extremely high latency or a huge S3 API operations bill. For example, making a file per partition every 30 seconds is likely cost-effective, but 30 seconds of latency is not what Apache Kafka users expect from their system today. Making a file per partition every 100ms could provide the latency that Kafka users expect, but the minimum cost per partition per month would be roughly $130/month in S3 PUT operations alone. That doesn't even get into the cost or latency to consume so many tiny files!
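That $130 figure comes straight from S3 PUT pricing; a rough sketch of the arithmetic, assuming the standard $0.005 per 1,000 PUT requests:

```python
# Rough cost of writing one object per partition every 100ms, assuming
# S3's standard PUT pricing of $0.005 per 1,000 requests.
PUT_PRICE = 0.005 / 1000              # $ per PUT request
FILES_PER_SECOND = 10                 # one file every 100ms
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

puts_per_month = FILES_PER_SECOND * SECONDS_PER_MONTH
cost_per_partition_per_month = puts_per_month * PUT_PRICE
print(f"{puts_per_month:,} PUTs/month ≈ ${cost_per_partition_per_month:.0f}/partition/month")
# => 25,920,000 PUTs/month ≈ $130/partition/month
```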
WarpStream Agents make a few files per second, but each file contains records from multiple topics and partitions. In the background, the pool of Agents compacts those small files into larger files so that reprocessing historical data, whether for a single partition or a whole topic, is both cost-effective and high-throughput. This compaction and batching approach strikes a reasonable tradeoff between cost and end-to-end latency.
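As a loose illustration of that batching idea (not WarpStream's actual file format or write path), an Agent can buffer records destined for many topic-partitions in memory and flush them as a single object, turning what would have been thousands of per-partition PUTs into one; `put_object` below is a stand-in for an object storage write:

```python
import time
from collections import defaultdict


def put_object(key, batches):
    """Stand-in for an object storage PUT (e.g. boto3's put_object)."""
    total = sum(len(records) for records in batches.values())
    print(f"PUT {key}: {total} records across {len(batches)} topic-partitions")


class SegmentBuffer:
    """Illustrative only: buffer records for many topic-partitions and flush
    them together as one object, instead of one object per partition."""

    def __init__(self, flush_interval_secs=0.25):
        self.flush_interval_secs = flush_interval_secs
        self.records = defaultdict(list)     # (topic, partition) -> [record bytes]
        self.last_flush = time.monotonic()

    def append(self, topic, partition, value):
        self.records[(topic, partition)].append(value)
        if time.monotonic() - self.last_flush >= self.flush_interval_secs:
            self.flush()

    def flush(self):
        if not self.records:
            return
        # One PUT covering every buffered topic-partition. In WarpStream the
        # resulting object key and the offset ranges it contains are recorded
        # in the metadata store rather than tracked locally like this.
        put_object(key=f"segment-{time.time_ns()}", batches=dict(self.records))
        self.records.clear()
        self.last_flush = time.monotonic()
```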
Everyone loves a good architecture diagram, so we've provided ours here. The important bits: the Agent Pool runs inside a customer's VPC, not ours, and customer data is never sent outside the customer VPC. The only data transferred from the Agent Pool to the WarpStream Cloud is metadata about which files belong to a given Virtual Cluster, which is a collection of topics and partitions administered together. Applications connect to the Agent Pool using standard Apache Kafka clients.
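Because the Agents speak the Kafka protocol, that last step needs no special SDK. For example, with the kafka-python library, pointing bootstrap_servers at a couple of Agent addresses (the host names below are placeholders) is all it takes:

```python
from kafka import KafkaProducer, KafkaConsumer

# Any Agent in the pool can serve any produce or consume request, so the
# bootstrap list is simply a few Agent addresses (placeholder host names).
producer = KafkaProducer(
    bootstrap_servers=["agent-1.internal:9092", "agent-2.internal:9092"],
)
producer.send("clickstream", b"hello from a standard Kafka client")
producer.flush()

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers=["agent-1.internal:9092"],
    group_id="example-group",
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.topic, record.partition, record.offset, record.value)
    break
```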
A Virtual Cluster is the metadata store for WarpStream. Each customer can create multiple isolated Virtual Clusters for separating teams or departments. Kafka API operations within a Virtual Cluster are atomic, including producing records to multiple topics and partitions. Each Virtual Cluster is a replicated state machine which stores the mapping between files in object storage and ranges of offsets in each Kafka topic-partition.
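As a loose mental model of that mapping (purely illustrative, not WarpStream's actual schema), you can picture the Virtual Cluster's state as a set of entries tying each object in the bucket to the offset ranges it contains:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    topic: str
    partition: int
    start_offset: int   # inclusive
    end_offset: int     # exclusive


@dataclass(frozen=True)
class FileEntry:
    object_key: str                   # key of the file in object storage
    ranges: tuple[OffsetRange, ...]   # every topic-partition range stored in that file


# Illustrative state: a single Agent-written file covering two partitions.
virtual_cluster_metadata = [
    FileEntry(
        object_key="segments/00000000000000000042",
        ranges=(
            OffsetRange("clickstream", 0, 1000, 1250),
            OffsetRange("clickstream", 3, 870, 995),
        ),
    ),
]
```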
Every Virtual Cluster metadata operation is journaled to our strongly-consistent log storage system before being executed by a Virtual Cluster replica and acknowledged back to the Agent, which then acknowledges the request from your client application.
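A hedged sketch of that ordering (illustrative only, not WarpStream's implementation), where `log` stands in for the strongly-consistent log storage system and `state_machine` for a Virtual Cluster replica:

```python
def handle_metadata_operation(op, log, state_machine):
    """Illustrative ordering: journal first, then apply, then acknowledge."""
    # 1. The operation is durably journaled before anything else happens.
    position = log.append(op)

    # 2. A Virtual Cluster replica applies the journaled operation.
    result = state_machine.apply(op, position)

    # 3. Only now is the Agent acknowledged, which in turn acknowledges the
    #    Kafka request from the client application.
    return result
```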
Within a Virtual Cluster, the Agents can optionally be configured to each serve a specific role. This is only possible because of WarpStream's decoupled architecture and cloud-native design; it has no equivalent in Apache Kafka or in any proprietary distribution of Kafka.
Our Cloud Services layer manages the lifecycle of each replica of your Virtual Clusters. Each Virtual Cluster has multiple replicas for high availability and is itself backed up to object storage in sync with our log storage system. When existing replicas fail, replacement replicas of your Virtual Cluster are created automatically, without any human intervention, by restoring those backups and replaying the metadata command log.
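Conceptually, replacing a failed replica comes down to restoring the latest backup and replaying the metadata command log from that point forward; a rough sketch under those assumptions (not the actual Cloud Services code):

```python
def recover_replica(backups, log):
    """Illustrative recovery: restore the newest snapshot, then replay the
    journaled metadata commands recorded after it."""
    snapshot = backups.load_latest()          # backup stored in object storage
    state_machine = snapshot.state_machine

    # Replay the metadata command log from the snapshot's position onward so
    # the replacement replica converges on the same state as its peers.
    for position, op in log.read_from(snapshot.position):
        state_machine.apply(op, position)
    return state_machine
```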
The Cloud Services layer also powers our administrative control panel and the observability system for your Virtual Cluster metrics. This Cloud Services layer is an isolated deployment per region, as are the Virtual Clusters underneath it.