Write Path
The following page is a summary of WarpStream's approach to persisting data produced by clients.
For a more in-depth discussion of the WarpStream write path, refer to Unlocking Idempotency with Retroactive Tombstones on the WarpStream blog.
The WarpStream write path is designed to be cost-efficient, especially for higher-throughput workloads (often 5-10x more cost-efficient than Apache Kafka), with performance characteristics that are aligned with the requirements of real-time data systems.
At the same time, WarpStream was also designed to be an order of magnitude easier to operate than Apache Kafka. It accomplishes this by leveraging object storage directly, with no intermediary disks.
However, keeping both costs and latency low while leveraging object storage as the only storage is a tall order. The rest of this document will explain the architectural decisions that make this possible.
There are three key aspects to WarpStream's write path:
WarpStream automatically aligns producer clients with Agents running in the same Availability Zone as the producer.
Every WarpStream Agent can write data for any topic-partition (there are no topic-partition leaders).
Data is always stored durably in object storage and committed to the WarpStream control plane before successfully acknowledging a produce request.
This design avoids 100% of cross-AZ data transfer fees because producer clients never have to write data to Agents in another AZ, and replication is handled by object storage. This results in a dramatically better cost profile and higher durability guarantees than Apache Kafka.
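For example, with the franz-go Kafka client a zone-aligned producer might look like the sketch below. The `warpstream_az=...` hint embedded in the client ID is shown purely as an illustration of how a client can advertise its zone, and the endpoint is hypothetical; consult the WarpStream client configuration documentation for the exact mechanism your deployment uses.

```go
package main

import (
	"context"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	// The "warpstream_az=..." client ID hint below is an assumption for
	// illustration only; check the WarpStream client configuration docs for the
	// exact mechanism and format your deployment expects.
	client, err := kgo.NewClient(
		kgo.SeedBrokers("warpstream-agent.internal:9092"), // hypothetical Agent endpoint
		kgo.ClientID("my-producer warpstream_az=us-east-1a"),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// The zone-aligned Agent buffers the record and writes it to object storage
	// before this produce call returns successfully.
	rec := &kgo.Record{Topic: "events", Value: []byte("hello")}
	if err := client.ProduceSync(context.Background(), rec).FirstErr(); err != nil {
		panic(err)
	}
}
```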
WarpStream Agents buffer Produce() requests from multiple producer clients and partitions, and then write these records in batches to object storage. By default, the Agents will buffer data for 250ms, or until 8 MiB of data has been accumulated, whichever comes first.
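Conceptually, the Agent flushes a buffered file when whichever trigger fires first: the time limit or the size limit. The Go sketch below is a simplified illustration of that flush policy, not the Agent's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

const (
	flushInterval = 250 * time.Millisecond // default time-based flush trigger
	flushBytes    = 8 << 20                // default size-based flush trigger (8 MiB)
)

// pendingFile accumulates record batches from many producer clients and many
// topic-partitions until one of the two flush triggers fires.
type pendingFile struct {
	buf       []byte
	createdAt time.Time
}

// shouldFlush reports whether the buffered data should be written to object
// storage: the size threshold was reached, or the buffer aged past the interval.
func (f *pendingFile) shouldFlush(now time.Time) bool {
	return len(f.buf) >= flushBytes || now.Sub(f.createdAt) >= flushInterval
}

func main() {
	f := &pendingFile{createdAt: time.Now()}
	f.buf = append(f.buf, make([]byte, 9<<20)...) // 9 MiB of buffered batches
	fmt.Println(f.shouldFlush(time.Now()))        // true: size trigger fired
}
```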
After the file is written to object storage, the Agent commits the file metadata to the WarpStream Metadata Store, and then acknowledges all of the Produce() requests in the batch back to the clients. WarpStream never acknowledges writes until data is durably persisted in object storage and committed to the metadata store.
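The ordering of those steps is what provides the durability guarantee: the file must exist in object storage and its metadata must be committed before any producer receives an acknowledgment. The sketch below illustrates that sequence; the ObjectStore and MetadataStore interfaces are hypothetical stand-ins rather than WarpStream's actual APIs.

```go
package main

import "context"

// ObjectStore and MetadataStore are hypothetical interfaces used only to show
// the ordering of the write path; they are not WarpStream's actual APIs.
type ObjectStore interface {
	Put(ctx context.Context, key string, data []byte) error
}

type MetadataStore interface {
	CommitFile(ctx context.Context, key string, sizeBytes int64) error
}

// flushFile persists a buffered file and only then acknowledges the produce
// requests whose records were batched into it.
func flushFile(ctx context.Context, objects ObjectStore, meta MetadataStore, key string, data []byte, acks []chan error) error {
	// 1. Write the file to object storage; replication is the object store's job.
	if err := objects.Put(ctx, key, data); err != nil {
		return err
	}
	// 2. Commit the file's metadata to the WarpStream metadata store.
	if err := meta.CommitFile(ctx, key, int64(len(data))); err != nil {
		return err
	}
	// 3. Only now acknowledge every Produce() request batched into this file.
	for _, ack := range acks {
		ack <- nil
	}
	return nil
}

func main() {} // wiring of real implementations omitted
```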
Unlike Apache Kafka, WarpStream does not use a naive file-per-partition strategy as that would require writing many small files to object storage or buffering data in-memory for an unacceptably long period of time. Instead, the WarpStream Agents create individual files that contain data from many different topic-partitions, which keeps both costs and latency low.
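To make the layout concrete, a single object in the store can be pictured roughly as in the sketch below: one file carrying batches for many topic-partitions, plus an index that lets readers locate each partition's data. The field names are illustrative and do not reflect WarpStream's actual file format.

```go
package main

import "fmt"

// TopicPartition identifies a Kafka topic-partition.
type TopicPartition struct {
	Topic     string
	Partition int32
}

// fileIndexEntry locates one topic-partition's batch inside a shared file.
type fileIndexEntry struct {
	TP     TopicPartition
	Offset int64 // byte offset within the file
	Length int64 // length of the batch in bytes
}

// sharedFile is a single object containing data for many topic-partitions,
// avoiding the many-small-files problem of a file-per-partition layout.
type sharedFile struct {
	Data  []byte
	Index []fileIndexEntry
}

func main() {
	f := sharedFile{
		Index: []fileIndexEntry{
			{TP: TopicPartition{Topic: "clicks", Partition: 0}, Offset: 0, Length: 4096},
			{TP: TopicPartition{Topic: "clicks", Partition: 7}, Offset: 4096, Length: 1024},
			{TP: TopicPartition{Topic: "logs", Partition: 3}, Offset: 5120, Length: 2048},
		},
	}
	fmt.Println(len(f.Index), "topic-partitions in one object")
}
```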
Leveraging object storage directly, with no intermediary disks or WAL, greatly simplifies WarpStream's architecture and operational requirements:
Partitions never have to be rebalanced.
The WarpStream Agents can be instantly scaled up and back down with zero data shuffling.
Disruption or loss of even 100% of the WarpStream Agents will not result in the loss of any acknowledged data.
In the background, WarpStream Agents compact files in object storage, merging batches for the same topic-partition that are spread across different files and improving IO access patterns for historical replays. Compaction also provides the opportunity to reorganize the data to improve locality by topic-partition, as illustrated in the sketch below.
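Compaction can be pictured as a merge over the batches of several input files, grouping them by topic-partition and ordering them by offset so that each partition's data becomes contiguous in the output. The sketch below is a simplified illustration of that planning step, not WarpStream's actual compaction algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// batchRef identifies one topic-partition batch within some input file.
type batchRef struct {
	Topic      string
	Partition  int32
	File       string // object key of the input file
	BaseOffset int64  // first record offset in the batch
}

// planCompaction orders batches from many input files by topic-partition and
// offset, producing the layout for a single compacted output file.
func planCompaction(batches []batchRef) []batchRef {
	sort.Slice(batches, func(i, j int) bool {
		if batches[i].Topic != batches[j].Topic {
			return batches[i].Topic < batches[j].Topic
		}
		if batches[i].Partition != batches[j].Partition {
			return batches[i].Partition < batches[j].Partition
		}
		return batches[i].BaseOffset < batches[j].BaseOffset
	})
	return batches
}

func main() {
	compacted := planCompaction([]batchRef{
		{Topic: "clicks", Partition: 1, File: "file-b", BaseOffset: 200},
		{Topic: "clicks", Partition: 1, File: "file-a", BaseOffset: 100},
		{Topic: "logs", Partition: 0, File: "file-a", BaseOffset: 50},
	})
	fmt.Println(compacted)
}
```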
See the documentation of the read path to learn how WarpStream avoids making a large number of GET requests to object storage APIs, regardless of the number of partitions and consumers.
Also, refer to Minimizing S3 API Costs with Distributed mmap on the WarpStream blog for a detailed discussion of WarpStream's cost and performance optimizations using object storage primitives.
Produce requests are not acknowledged until the data is persisted in the object store and the metadata is committed to WarpStream's metadata store. This is similar to Kafka's acks=all semantics; however, cloud object storage systems such as Amazon S3 have much better durability guarantees than what can be accomplished with Apache Kafka. By eliminating local storage, WarpStream is able to provide stronger durability guarantees than what can be achieved using triply replicated SSDs.
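From the client's point of view this behaves like a producer configured with acks=all. With the franz-go client, for example, requesting full acknowledgment looks like the sketch below (the endpoint is hypothetical).

```go
package main

import (
	"context"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	// Require full acknowledgment, the equivalent of acks=all in the Java client.
	// WarpStream only acknowledges once the data is durably stored in object
	// storage and its metadata is committed to the metadata store.
	client, err := kgo.NewClient(
		kgo.SeedBrokers("warpstream-agent.internal:9092"), // hypothetical endpoint
		kgo.RequiredAcks(kgo.AllISRAcks()),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()

	res := client.ProduceSync(context.Background(), &kgo.Record{Topic: "events", Value: []byte("payload")})
	if err := res.FirstErr(); err != nil {
		panic(err)
	}
}
```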
WarpStream maintains the exact same ordering guarantees and idempotency guarantees as Apache Kafka. As with Kafka, messages produced to a specific topic partition in WarpStream are appended to the log in the order that they are sent, and consumers will read the messages in the order that they are stored in the log.
To maintain ordering within a topic-partition, while still enabling a single topic-partition to be appended to from dozens of different Agents, WarpStream determines the order of writes upon committing a batch to the WarpStream Metadata Store, not when the data is flushed to object storage. This approach enables the Agents to massively parallelize writes, while still maintaining the same ordering and idempotency guarantees as Apache Kafka.
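One way to picture this: offsets are assigned per topic-partition at the moment the metadata store commits a file's metadata, so files flushed in parallel by many Agents still yield a single, totally ordered log per partition. The sketch below is a simplified illustration of that idea; the types and locking are hypothetical, not the metadata store's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// metadataStore assigns each committed batch its position in the partition's log.
// Ordering is decided here, at commit time, not when an Agent flushed the file.
type metadataStore struct {
	mu          sync.Mutex
	nextOffsets map[string]int64 // next offset per topic-partition
}

// commitBatch records a batch of numRecords records for a topic-partition and
// returns the base offset assigned to it. Commits are serialized, so the log stays
// ordered even though many Agents flush files for the same partition concurrently.
func (m *metadataStore) commitBatch(topicPartition string, numRecords int64) int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	base := m.nextOffsets[topicPartition]
	m.nextOffsets[topicPartition] = base + numRecords
	return base
}

func main() {
	m := &metadataStore{nextOffsets: map[string]int64{}}
	fmt.Println(m.commitBatch("clicks-0", 3)) // 0
	fmt.Println(m.commitBatch("clicks-0", 2)) // 3
}
```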
To learn more about how this works, check out the Unlocking Idempotency with Retroactive Tombstones blog post.