Benchmarking

How to Benchmark WarpStream.

If you prefer to skip benchmarking WarpStream yourself, you can read our public benchmarking blog post where we provide detailed WarpStream benchmarks and TCO analysis.

The most important thing to consider when benchmarking WarpStream is that because WarpStream is a higher latency system than Apache Kafka, your Kafka client settings must be tuned appropriately to work with WarpStream to achieve high throughput. Start by reading our "Tuning Kafka Clients for Performance" documentation.

Ideally, benchmarking is performed with a real application running in in a pre-prod environment, or by teeing traffic from a production workload to WarpStream. However, we also understand that many people like to begin the evaluation process with simple synthetic benchmarks so the rest of this document is focused on how to do that correctly.

WarpStream Benchmark Tools

WarpStream has built-in tools to run Producer and Consumer benchmarks against any compatible Kafka cluster. These tools were added in the v651 and v652 WarpStream releases.

These benchmark tools can be ran against any Kafka API compatible product so you can easily compare performance against your existing Kafka infrastructure.

These benchmark tools are tuned using our Tuning for Performance guide. While these tools are tuned for a WarpStream Cluster they will work without issue against other Kafka API compatible products.

Producer Benchmark CLI

Example

This is an example of running the producer benchmark tool against a local playground WarpStream Cluster with a single client.

$ warpstream cli-beta benchmark-producer -topic ws-benchmark -num-clients 1

46798 records sent (446.00 MiB), 9359.60 records/sec (89.26 MiB/sec), 186.476ms min latency, 300.85209ms avg latency, 557.667ms max latency, 2842 buffered records.
49441 records sent (471.00 MiB), 9888.20 records/sec (94.30 MiB/sec), 192.584ms min latency, 257.174298ms avg latency, 392.739ms max latency, 2926 buffered records.
49734 records sent (474.00 MiB), 9946.80 records/sec (94.86 MiB/sec), 189.176ms min latency, 250.716986ms avg latency, 316.025ms max latency, 2914 buffered records.
49613 records sent (473.00 MiB), 9922.60 records/sec (94.63 MiB/sec), 186.951ms min latency, 280.346658ms avg latency, 480.603ms max latency, 2890 buffered records.
49032 records sent (467.00 MiB), 9806.40 records/sec (93.52 MiB/sec), 182.87ms min latency, 265.414339ms avg latency, 403.456ms max latency, 2925 buffered records.
49597 records sent (472.00 MiB), 9919.40 records/sec (94.60 MiB/sec), 187.95ms min latency, 265.404274ms avg latency, 484.618ms max latency, 2892 buffered records.
49626 records sent (473.00 MiB), 9925.20 records/sec (94.65 MiB/sec), 185.409ms min latency, 250.42801ms avg latency, 311.784ms max latency, 2970 buffered records.

Technical Details

The produce benchmark uses the franz-go Kafka library with the following configuration:

opts = append(opts, kgo.DefaultProduceTopic(c.topic))
opts = append(opts, kgo.MetadataMaxAge(60*time.Second))
opts = append(opts, kgo.MaxBufferedRecords(1_000_000))
opts = append(opts, kgo.ProducerBatchMaxBytes(int32(c.producerMaxBytes)))
opts = append(opts, kgo.RecordPartitioner(kgo.UniformBytesPartitioner(1_000_000, false, false, nil)))
opts = append(opts, kgo.ProduceRequestTimeout(c.produceRecordTimeout))

if c.disableIdempotentWrite {
	opts = append(opts, kgo.DisableIdempotentWrite())
}

This configuration is similar configuration that we recommend for the best performance.

Usage

$ warpstream cli-beta benchmark-producer --help

Usage of benchmark-producer:
  -bootstrap-host string
    	kafka bootstrap host (default "localhost")
  -bootstrap-port int
    	kafka bootstrap port (default 9092)
  -client-id string
    	client-id to pass along to kafka (default "warpstream-cli")
  -disable-idempotent-write
    	disables idempotent write, WARNING: this may cause poor performance as this sets the maximum in-flight requests to 1
  -enable-tls
    	dial with TLS or not
  -kafka-log-level string
    	the log level to set on the kafka client, accepted values are DEBUG, INFO, WARN, ERROR (default "WARN")
  -max-records-per-second int
    	maximum number of records per second to produce per kafka client (default 10000)
  -num-clients int
    	number of kafka clients (default 3)
  -num-records int
    	number of messages to produce, -1 for unlimited. (default 1000000)
  -produce-record-timeout duration
    	maximum amount of time to wait for a record to be produced (default 10s)
  -producer-max-bytes int
    	upper bounds the size of a record batch, this mirrors Kafka's max.message.bytes. (default 16000000)
  -prometheus-port int
    	the port to serve promethes metrics on, -1 to disable (default 8081)
  -record-size int
    	message size in bytes (default 10000)
  -sasl-password string
    	password for SASL authentication
  -sasl-scram
    	uses sasl scram authentication (sasl plain by default)
  -sasl-username string
    	username for SASL authentication
  -tls-client-cert-file string
    	path to the X.509 certificate file in PEM format for the client
  -tls-client-key-file string
    	path to the X.509 private key file in PEM format for the client
  -tls-server-ca-cert-file string
    	path to the X.509 certificate file in PEM format for the server certificate authority. If not specified, the host's root certificate pool will be used for server certificate verification.
  -topic string
    	the topic to produce to

Consumer Benchmark CLI

Example

This is an example of running the consumer benchmark tool against a local playground WarpStream Cluster with a single client. The consumer is consuming data in real-time that is being produced from the producer benchmark tool.

Note: End to End latency can only be calculated when consuming data in real-time that was produced using the WapStream producer benchmark tool.

$ warpstream cli-beta benchmark-consumer -topic ws-benchmark -num-clients 1

45784 records consumed (436.00 MiB), 9156.80 records/sec (87.33 MiB/sec), 305.785ms min e2e latency, 423.749632ms avg e2e latency, 552.51ms max e2e latency.
49570 records consumed (472.00 MiB), 9914.00 records/sec (94.55 MiB/sec), 245.006ms min e2e latency, 437.278189ms avg e2e latency, 649.385ms max e2e latency.
49070 records consumed (467.00 MiB), 9814.00 records/sec (93.59 MiB/sec), 238.257ms min e2e latency, 428.520332ms avg e2e latency, 628.591ms max e2e latency.
49550 records consumed (472.00 MiB), 9910.00 records/sec (94.51 MiB/sec), 229.308ms min e2e latency, 445.467432ms avg e2e latency, 642.138ms max e2e latency.
49663 records consumed (473.00 MiB), 9932.60 records/sec (94.72 MiB/sec), 307.591ms min e2e latency, 422.433481ms avg e2e latency, 539.169ms max e2e latency.
49697 records consumed (473.00 MiB), 9939.40 records/sec (94.79 MiB/sec), 310.196ms min e2e latency, 425.985862ms avg e2e latency, 620.136ms max e2e latency.
49678 records consumed (473.00 MiB), 9935.60 records/sec (94.75 MiB/sec), 307.071ms min e2e latency, 435.650686ms avg e2e latency, 658.087ms max e2e latency.

Technical Details

The produce benchmark uses the franz-go Kafka library with the following configuration:

opts = append(opts, kgo.ConsumeTopics(c.topic))
opts = append(opts, kgo.MetadataMaxAge(60*time.Second))
opts = append(opts, kgo.FetchMaxBytes(int32(c.fetchMaxBytes)))
if c.fetchMaxBytes > 50_000_000 {
	opts = append(opts, kgo.BrokerMaxReadBytes(int32(c.fetchMaxBytes*2)))
}
opts = append(opts, kgo.FetchMaxPartitionBytes(25_000_000))
opts = append(opts, kgo.FetchMaxWait(10*time.Second))
if c.fromBeginning {
	opts = append(opts, kgo.ConsumeResetOffset(kgo.NewOffset().AtStart()))
} else {
	opts = append(opts, kgo.ConsumeResetOffset(kgo.NewOffset().AtEnd()))
}
if c.consumerGroup != "" {
	opts = append(opts, kgo.ConsumerGroup(c.consumerGroup))
}

This configuration is similar configuration that we recommend for the best performance.

Usage

$ warpstream cli-beta benchmark-producer --help

Usage of benchmark-consumer:
  -bootstrap-host string
    	kafka bootstrap host (default "localhost")
  -bootstrap-port int
    	kafka bootstrap port (default 9092)
  -client-id string
    	client-id to pass along to kafka (default "warpstream-cli")
  -consumer-group string
    	the consumer group to use to consume, if unset (default) no consumer group is used
  -enable-tls
    	dial with TLS or not
  -fetch-max-bytes int
    	the maximum amount of bytes a broker will try to send during a fetch, this corresponds to the java fetch.max.bytes setting (default 50000000)
  -fetch-max-partition-bytes int
    	the maximum amount of bytes that will be consumed for a single partition in a fetch request, this corresponds to the java max.partition.fetch.bytes setting (default 25000000)
  -from-beginning
    	start with the earliest message present in the topic partition rather than the latest message, when enabled e2e latency can't be calculated
  -kafka-log-level string
    	the log level to set on the kafka client, accepted values are DEBUG, INFO, WARN, ERROR (default "WARN")
  -num-clients int
    	number of kafka clients (default 3)
  -prometheus-port int
    	the port to serve promethes metrics on, -1 to disable (default 8082)
  -sasl-password string
    	password for SASL authentication
  -sasl-scram
    	uses sasl scram authentication (sasl plain by default)
  -sasl-username string
    	username for SASL authentication
  -tls-client-cert-file string
    	path to the X.509 certificate file in PEM format for the client
  -tls-client-key-file string
    	path to the X.509 private key file in PEM format for the client
  -tls-server-ca-cert-file string
    	path to the X.509 certificate file in PEM format for the server certificate authority. If not specified, the host's root certificate pool will be used for server certificate verification.
  -topic string
    	the topic to consume from

Helm Chart

A Helm Chart is available here to deploy these tools into Kubernetes.

The benchmark configuration can be changed by modifying the following values

topicName: "ws-benchmark"
bootstrapHost: "localhost"
bootstrapPort: 9092

consumer:
  enabled: true
  replicaCount: 1
  numClients: 3

  fetchMaxBytes: 50000000
  fetchMaxPartitionBytes: 25000000

producer:
  enabled: true
  replicaCount: 1
  numClients: 3

  recordSize: 10000
  maxRecordsPerSecond: 10000
  producerMaxBytes: 16000000

Prometheus Metrics

Both the producer and consumer benchmark tools expose prometheus metrics on port 8081 and 8082 respectively and expose various metrics.

Some important metrics to monitor and example queries and graphs are bellow.

Note: The example graphs are of benchmark results from running a WarpStream Playground on a laptop and running the benchmark tooling on the same machine. WarpStream Playgrounds are heavily rate limited and are not tuned for performance. When running benchmarks we recommend running them against a production like setup using real WarpStream Agents.

Produce Throughput

Measure the total amount of Bytes Produced.

Metric: franz_go_produce_bytes_total

Example Query: sum(rate(franz_go_produce_bytes_total[1m])) - Measure the per second throughput. The higher the number the more data the benchmark is producing with a higher number equaling better performance.

Produce Buffered Records

Measure the total number of records that are being buffered.

Metric: franz_go_buffered_produce_records_total

Example Query: sum(rate(franz_go_buffered_produce_records_total[1m])) - Measure the per second buffered rate. The higher the number the more records that are being buffered. If this is increasing over time your benchmark is producing faster then your Kafka cluster can handle.

Produce Latency

Measure the amount of latency to produce a record, measured from the time the record is added to a batch until an ACK is returned from the Kafka Cluster.

Metric: warpstream_produce_benchmark_produce_request_duration_seconds_bucket

Example Query: warpstream_produce_benchmark_produce_request_duration_seconds_bucket - Measure the P90 Latency of producing records. The higher the number the more latency there is.

End to End Consume Latency

Measure the amount of latency to produce a record and to consume the same record. Measured from the time the record is created in memory in the producer benchmark to the time a consumer fetches and starts processing the record from the Kafka cluster.

Note: End to End latency can only be calculated when consuming data in real-time that was produced using the WapStream producer benchmark tool.

Metric: warpstream_consume_benchmark_e2e_consume_duration_seconds_bucket

Example Query: histogram_quantile(0.90, sum by(le) (rate(warpstream_consume_benchmark_e2e_consume_duration_seconds_bucket[1m]))) - Measure the P90 End to End Latency of producing and consuming the same record. The higher the number the more End to End latency there is.

Kafka Benchmark tools

While we recommend using WarpStream benchmark tooling to perform your synthetic benchmarks you can use any benchmark tool including the native Kafka ones.

Other benchmark tools may need to be tuned to get the best performance out of WarpStream. See out Tuning for Performance guide.

Due to the nature of how the Java Kafka protocol is implemented, you'll most likely struggle to achieve more than 60-100MiB/s of producer traffic from a single instance of the kafka perf testing tooling. However, once you've found configuration that you're happy with, you can increase the total throughput of the benchmark by running multiple instances of kafka-producer-perf-test.sh concurrently.

On the contrast, the WarpStream benchmark tooling can achieve multi-gigabyte per second producer traffic within a single instance if given enough cpu, memory, and network bandwidth.

kafka-producer-perf-test.sh

One of the most common utilities for performing synthetic benchmarks of Kafka clusters is the kafka-producer-perf-test.sh utility. This utility embeds a native Java Kafka client, so it should be tuned according to our recommend settings. For example:

kafka-producer-perf-test.sh --print-metrics --producer-props bootstrap.servers=$BROKERS enable.idempotence=false compression.type=lz4 linger.ms=25 batch.size=10000000 buffer.memory=128000000 max.request.size=64000000 metadata.max.age.ms=60000 --record-size 10000 --topic "test" --throughput 1000 --num-records 1000000

The settings above are just a starting point, you'll want to slowly increase the values of throughput and num-records as you perform your testing. More importantly, you'll have to consider how many partitions the test topic you're producing to has.

If the topic you're producing to has many partitions, you may need to reduce the value of batch.size to prevent the producer utility from OOMing. If the topic you're producing to has less partitions, then you may need to increase the value of batch.size instead to achieve higher throughput.

Running multiple instances of kafka-producer-perf-tesh.sh is highly recommended because load-balancing in WarpStream works differently than it does in Apache Kafka. Specifically, Apache Kafka balances partitions across Brokers, whereas WarpStream (due to its stateless nature) balances client connections across Agents.

As a result, a single instance of kafka-producer-perf-test.sh will generally route all of its traffic to a single WarpStream Agent. However, if you run multiple instances of the benchmarking utility concurrently, you'll see the traffic begin to spread evenly amongst all your deployed Agents.

PreviousAWS Marketplace NextCompression

Last updated 1 month ago

Was this helpful?