Benchmarking
How to Benchmark WarpStream.
The most important thing to consider when benchmarking WarpStream is that it is a higher-latency system than Apache Kafka, so your Kafka client settings must be tuned for WarpStream to achieve high throughput. Start by reading our "Tuning Kafka Clients for Performance" documentation.
Ideally, benchmarking is performed with a real application running in a pre-prod environment, or by teeing traffic from a production workload to WarpStream. However, we understand that many people like to begin the evaluation process with simple synthetic benchmarks, so the rest of this document focuses on how to run them correctly.
WarpStream Benchmark Tools
WarpStream has built-in tools to run Producer and Consumer benchmarks against any compatible Kafka cluster. These tools were added in the v651 and v652 WarpStream releases.
These benchmark tools can be run against any Kafka API-compatible product, so you can easily compare performance against your existing Kafka infrastructure.
The tools are tuned according to our Tuning for Performance guide. Although that tuning targets WarpStream Clusters, the tools work without issue against other Kafka API-compatible products.
Producer Benchmark CLI
Example
This is an example of running the producer benchmark tool against a local playground WarpStream Cluster with a single client.
Technical Details
The produce benchmark uses the franz-go Kafka library with the following configuration:
This configuration is similar to the configuration we recommend for the best performance.
Usage
Consumer Benchmark CLI
Example
This is an example of running the consumer benchmark tool against a local playground WarpStream Cluster with a single client. The consumer is consuming data in real-time that is being produced from the producer benchmark tool.
Note: End-to-end latency can only be calculated when consuming, in real time, data that was produced by the WarpStream producer benchmark tool.
Technical Details
The consume benchmark uses the franz-go Kafka library with the following configuration:
This configuration is similar to the configuration we recommend for the best performance.
Usage
Helm Chart
A Helm Chart is available here to deploy these tools into Kubernetes.
The benchmark configuration can be changed by modifying the following values:
Prometheus Metrics
The producer and consumer benchmark tools expose Prometheus metrics on ports 8081 and 8082, respectively.
Some important metrics to monitor, along with example queries and graphs, are listed below.
Note: The example graphs show benchmark results from a WarpStream Playground and the benchmark tooling running on the same laptop. WarpStream Playgrounds are heavily rate limited and are not tuned for performance. We recommend running benchmarks against a production-like setup using real WarpStream Agents.
Produce Throughput
Measure the total number of bytes produced.
Metric: franz_go_produce_bytes_total
Example Query: sum(rate(franz_go_produce_bytes_total[1m]))
- Measures per-second produce throughput. A higher number means the benchmark is producing more data, which indicates better performance.
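The example query sums the per-second increase of a byte counter over a 1-minute window across all benchmark instances. The sketch below approximates what `rate()` computes for a single series. It is a simplification: Prometheus also extrapolates to the window edges, which this omits. The sample values are made up.

```go
package main

import "fmt"

// sample is one scrape of a Prometheus counter: a Unix timestamp in seconds
// and the counter's cumulative value at that time.
type sample struct {
	ts    float64
	value float64
}

// simpleRate approximates PromQL rate(): the per-second increase of a counter
// over the samples in a window. Unlike Prometheus, it does not extrapolate to
// the window edges. A decrease is treated as a counter reset.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i].value - samples[i-1].value
		if delta < 0 {
			// Counter reset (e.g. the benchmark restarted): count from zero.
			delta = samples[i].value
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Hypothetical franz_go_produce_bytes_total scrapes taken 15s apart.
	scrapes := []sample{{0, 0}, {15, 1.5e9}, {30, 3.0e9}, {45, 4.5e9}, {60, 6.0e9}}
	fmt.Printf("%.0f bytes/s\n", simpleRate(scrapes))
}
```

The outer `sum(...)` in the query then adds these per-instance rates together, so the graph shows total throughput across all producer benchmark instances.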
Produce Buffered Records
Measure the total number of records that are being buffered.
Metric: franz_go_buffered_produce_records_total
Example Query: sum(rate(franz_go_buffered_produce_records_total[1m]))
- Measures the per-second buffering rate. A higher number means more records are being buffered. If this value keeps increasing over time, your benchmark is producing faster than your Kafka cluster can handle.
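One way to decide whether the buffered-records value is "increasing over time" is to fit a least-squares slope to successive readings of the query. `slope` is a hypothetical helper and is not part of the benchmark tools. In practice you would also want a tolerance so that noise around a flat line is not flagged.

```go
package main

import "fmt"

// slope returns the least-squares slope of ys against xs. A persistently
// positive slope for the buffered-records query suggests the producer is
// outpacing the cluster. Assumes len(xs) == len(ys) >= 2.
func slope(xs, ys []float64) float64 {
	n := float64(len(xs))
	var sumX, sumY, sumXY, sumXX float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXY += xs[i] * ys[i]
		sumXX += xs[i] * xs[i]
	}
	denom := n*sumXX - sumX*sumX
	if denom == 0 {
		return 0 // all xs identical: no trend can be fitted
	}
	return (n*sumXY - sumX*sumY) / denom
}

func main() {
	// Hypothetical query results taken one minute apart.
	minutes := []float64{0, 1, 2, 3, 4}
	buffered := []float64{1000, 1800, 2700, 3500, 4400}
	if s := slope(minutes, buffered); s > 0 {
		fmt.Printf("buffered records growing by %.0f per minute: producer is outpacing the cluster\n", s)
	}
}
```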
Produce Latency
Measure the latency to produce a record, from the time the record is added to a batch until the Kafka cluster returns an ACK.
Metric: warpstream_produce_benchmark_produce_request_duration_seconds_bucket
Example Query: histogram_quantile(0.90, sum by(le) (rate(warpstream_produce_benchmark_produce_request_duration_seconds_bucket[1m])))
- Measures the P90 produce latency. A higher number means more latency.
End to End Consume Latency
Measure the end-to-end latency of producing a record and consuming that same record. The measurement runs from the time the producer benchmark creates the record in memory to the time a consumer fetches it from the Kafka cluster and begins processing it.
Note: End-to-end latency can only be calculated when consuming, in real time, data that was produced by the WarpStream producer benchmark tool.
Metric: warpstream_consume_benchmark_e2e_consume_duration_seconds_bucket
Example Query: histogram_quantile(0.90, sum by(le) (rate(warpstream_consume_benchmark_e2e_consume_duration_seconds_bucket[1m])))
- Measures the P90 end-to-end latency of producing and consuming the same record. A higher number means more end-to-end latency.
Kafka Benchmark tools
While we recommend using the WarpStream benchmark tooling for synthetic benchmarks, you can use any benchmark tool, including the native Kafka ones.
Other benchmark tools may need to be tuned to get the best performance out of WarpStream. See our Tuning for Performance guide.
Due to how the Java Kafka client implements the protocol, you'll likely struggle to achieve more than 60-100 MiB/s of producer traffic from a single instance of the Kafka perf-testing tools. However, once you've found a configuration you're happy with, you can increase the total throughput of the benchmark by running multiple instances of kafka-producer-perf-test.sh concurrently.
In contrast, the WarpStream benchmark tooling can achieve multiple gigabytes per second of producer traffic from a single instance, given enough CPU, memory, and network bandwidth.
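Scaling the Java tooling is simple arithmetic: divide the target throughput by what one instance achieves, then round up. The sketch below plans conservatively with the 60 MiB/s low end of the range above. `instancesNeeded` is a hypothetical helper, and the 1 GiB/s target is illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// instancesNeeded returns how many kafka-producer-perf-test.sh instances to run
// concurrently to reach targetMiBps, given a conservative per-instance ceiling.
// Hypothetical planning helper.
func instancesNeeded(targetMiBps, perInstanceMiBps float64) int {
	if perInstanceMiBps <= 0 {
		panic("per-instance throughput must be positive")
	}
	return int(math.Ceil(targetMiBps / perInstanceMiBps))
}

func main() {
	// Illustrative 1 GiB/s (1024 MiB/s) target at 60 MiB/s per instance.
	fmt.Println(instancesNeeded(1024, 60), "instances")
}
```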
kafka-producer-perf-test.sh
One of the most common utilities for synthetic benchmarks of Kafka clusters is kafka-producer-perf-test.sh. This utility embeds a native Java Kafka client, so it should be tuned according to our recommended settings. For example:
The settings above are just a starting point. You'll want to slowly increase the values of throughput and num-records as you test. More importantly, you'll have to consider how many partitions the test topic has.
If the topic has many partitions, you may need to reduce batch.size to prevent the producer utility from running out of memory (OOMing). If the topic has fewer partitions, you may need to increase batch.size instead to achieve higher throughput.
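Back-of-the-envelope arithmetic can estimate this partition trade-off. If every partition holds a full open batch, the producer buffers roughly partitions × batch.size bytes. The sketch below ignores record overhead and compression, and its numbers are illustrative rather than recommended settings. `buffer.memory` is the Java producer setting that caps the memory available for batching. The helper names are hypothetical.

```go
package main

import "fmt"

// openBatchBytes estimates the worst-case memory held by open producer batches
// when every partition has a full batch buffered: partitions * batch.size.
// Ignores per-record overhead and compression.
func openBatchBytes(partitions, batchSizeBytes int64) int64 {
	return partitions * batchSizeBytes
}

// maxBatchSizeFor returns the largest batch.size that keeps one full batch per
// partition within memoryBudgetBytes (for example, buffer.memory).
func maxBatchSizeFor(partitions, memoryBudgetBytes int64) int64 {
	if partitions <= 0 {
		panic("partitions must be positive")
	}
	return memoryBudgetBytes / partitions
}

func main() {
	const mb = 1_000_000
	// Illustrative values only, not recommended settings.
	fmt.Println(openBatchBytes(1024, 1*mb)/mb, "MB of open batches at 1024 partitions")
	fmt.Println(maxBatchSizeFor(1024, 128*mb), "bytes max batch.size for a 128 MB budget")
}
```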
Running multiple instances of kafka-producer-perf-test.sh is highly recommended because load balancing in WarpStream works differently than in Apache Kafka. Specifically, Apache Kafka balances partitions across Brokers, whereas WarpStream (due to its stateless nature) balances client connections across Agents.
As a result, a single instance of kafka-producer-perf-test.sh will generally route all of its traffic to a single WarpStream Agent. However, if you run multiple instances of the benchmarking utility concurrently, you'll see the traffic begin to spread evenly amongst all your deployed Agents.
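The toy model below illustrates the difference. If each client instance's connection is pinned to one Agent, a single instance concentrates all of its traffic on that Agent, while several instances spread out. Round-robin assignment is an assumption made for illustration only. It is not WarpStream's actual balancing algorithm, and `assignConnections` is a hypothetical helper.

```go
package main

import "fmt"

// assignConnections is a toy model of connection-level balancing: each client
// instance is pinned to one Agent, chosen round-robin here. NOT WarpStream's
// actual algorithm; it only shows why one client lands on one Agent.
// Returns the number of client instances assigned to each Agent.
func assignConnections(clients, agents int) []int {
	perAgent := make([]int, agents)
	for c := 0; c < clients; c++ {
		perAgent[c%agents]++
	}
	return perAgent
}

func main() {
	fmt.Println(assignConnections(1, 3)) // one benchmark instance, three Agents
	fmt.Println(assignConnections(6, 3)) // six concurrent instances, three Agents
}
```

Run as-is, it prints `[1 0 0]` and then `[2 2 2]`.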