Instance Type Selection

Instance Selection

While the WarpStream Agent doesn't store data on the local disk, it does use the network heavily. Therefore, we recommend using network-optimized cloud instances that provide at least 4 GiB of RAM per vCPU. In AWS, we think the m5n and m6in series are a great choice for running the Agent.

We recommend running the Agent with at least 2 vCPUs available and at least 4 GiB of RAM per vCPU. The m5n.large and m6in.large are therefore the minimum recommended instance sizes in AWS.
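The two minimums above can be expressed as a small check. This is only an illustrative sketch: the function name `meetsMinimum` and the constants are hypothetical helpers, not part of the Agent or of any WarpStream tooling, and the instance shapes are passed in by hand rather than looked up from a cloud API.

```go
package main

import "fmt"

const (
	minVCPUs      = 2   // minimum vCPUs recommended above
	minGiBPerVCPU = 4.0 // minimum GiB of RAM per vCPU recommended above
)

// meetsMinimum reports whether an instance shape satisfies both
// recommendations: enough vCPUs, and enough memory per vCPU.
func meetsMinimum(vcpus int, memGiB float64) bool {
	if vcpus < minVCPUs {
		return false
	}
	return memGiB/float64(vcpus) >= minGiBPerVCPU
}

func main() {
	// 2 vCPUs with 8 GiB (the shape of an m5n.large) passes.
	fmt.Println(meetsMinimum(2, 8))
	// A hypothetical 2 vCPU / 4 GiB shape has only 2 GiB per vCPU.
	fmt.Println(meetsMinimum(2, 4))
	// A hypothetical 1 vCPU / 8 GiB shape has too few vCPUs.
	fmt.Println(meetsMinimum(1, 8))
}
```

The check is deliberately a ratio rather than a fixed memory floor, so larger instances pass as long as memory grows with the vCPU count.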

Using much larger instances is fine as well. Just make sure to set GOMAXPROCS (the Go runtime setting that limits how many OS threads execute Go code at the same time) so that the Agent can make use of all the available cores, even when running in a containerized environment.
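To see what a Go process actually observes inside a container, a small standalone diagnostic can help. The sketch below is not Agent code; `parallelism` is a hypothetical helper that uses only the standard `runtime` package. `runtime.NumCPU()` reports the logical CPUs usable by the process, and `runtime.GOMAXPROCS(0)` returns the current setting without changing it.

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelism returns the number of logical CPUs visible to this process
// and the current GOMAXPROCS value. Passing 0 to runtime.GOMAXPROCS only
// queries the setting; it does not modify it.
func parallelism() (visible int, effective int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	visible, effective := parallelism()
	fmt.Printf("visible CPUs: %d, GOMAXPROCS: %d\n", visible, effective)
	if effective < visible {
		fmt.Println("GOMAXPROCS is below the visible CPU count")
	}
}
```

Running this in the same container image and with the same CPU limits as the Agent, once with and once without the `GOMAXPROCS` environment variable set, shows whether the setting takes effect as intended.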

Network Optimized Instances

The Agent does a lot of networking to service Apache Kafka Produce and Fetch requests, as well as to perform background compaction. The Agent uses compression and intelligent caching to minimize network traffic, but fundamentally, WarpStream is a data-intensive system that is even more networking-heavy than Apache Kafka due to its reliance on remote storage.

Debugging latency caused by networking bottlenecks and throttling is a nightmare in all cloud environments. None of the major clouds provide sufficient instrumentation or observability to understand whether or why your VM's network is being throttled. Some have dynamic throttling policies that allow long bursts but then suddenly degrade with no explanation.

For all of these reasons, we recommend running the WarpStream Agent on network-optimized instances, which allows the Agent to saturate the CPU before saturating the network interface. That situation is easier to understand, observe, and auto-scale.

Auto-Scaling

When running the Agent on the appropriate instance type as described above, we recommend auto-scaling based on CPU usage with a target of 50% average usage. Our internal testing workload runs the Agent at more than 75% CPU usage with little latency degradation, but choosing an appropriate threshold requires balancing the concerns of cost efficiency and responsiveness to bursts of traffic that happen faster than your auto-scaler can react.
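As a rough illustration of how a 50% CPU target translates into a replica count, the sketch below applies the proportional rule used by target-tracking autoscalers such as the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × observed / target). The function `desiredAgents` and its parameters are hypothetical, and real autoscalers add tolerances, cooldowns, and min/max bounds that are omitted here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredAgents returns the number of Agents needed to bring average CPU
// usage back to targetPct, given the current count and observed average.
// It never returns fewer than 1.
func desiredAgents(current int, avgCPUPct, targetPct float64) int {
	if current < 1 || targetPct <= 0 {
		return 1
	}
	n := int(math.Ceil(float64(current) * avgCPUPct / targetPct))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// 4 Agents averaging 75% CPU against a 50% target: scale out.
	fmt.Println(desiredAgents(4, 75, 50))
	// 4 Agents already at the 50% target: no change.
	fmt.Println(desiredAgents(4, 50, 50))
	// 4 Agents averaging 20% CPU: scale in.
	fmt.Println(desiredAgents(4, 20, 50))
}
```

A lower target leaves more headroom for bursts that arrive faster than new Agents can start, at the cost of running more instances at steady state.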

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation. Kinesis is a trademark of Amazon Web Services.