# Control Plane Utilization

## WarpStream Control Plane Utilization

Every WarpStream cluster gets a dedicated control plane that is fully managed by the WarpStream team. Control planes are always provisioned for maximum potential capacity and cannot be scaled further even if they reach maximum utilization. When a control plane reaches maximum utilization its performance will begin to degrade and the cluster may become unavailable until utilization is reduced.

Control plane utilization is exposed via a metric called `warpstream_control_plane_utilization` (Prometheus) or `warpstream.control_plane_utilization` (Datadog). The value is exposed as a fraction between 0 and 1 where 0 means 0% utilization and 1 means 100% utilization.

Every control plane is fully independent. This means that if you have two different clusters in your account, and one of the clusters reaches 100% utilization, the other cluster will not be impacted and vice versa.

The vast majority of workloads will never approach maximum utilization of a single WarpStream control plane. However, some extremely demanding or pathological workloads can saturate a WarpStream control plane resulting in performance degradation, or in the worst case, unavailability of the cluster.

## Processed Batches

Virtually every operation performed in a WarpStream cluster contributes to control plane utilization in some way. That said, the vast majority of control plane utilization is driven by one factor: how many batches the cluster has to process.

In WarpStream, a "batch" is a group of records that all belong to the same topic-partition. The raw data in your batches never leaves your VPC, but the amount of metadata that the control plane needs to process scales linearly with the number of batches processed by the cluster. This can be understood intuitively by considering the fact that one of the control plane's primary responsibilities is to assign offsets to records, and a batch is the minimum unit of work for which offsets can be assigned.

When Kafka's idempotency feature is **disabled**, the number of batches that must be processed is primarily a function of **the number of partitions that are actively produced to in a given time interval**. The number of partitions that are actively produced to in a given time interval is primarily a function of:

1. Record key distribution.
2. [Partition assignment strategy](/warpstream/kafka/reference/partition-assignment-strategies.md) (more on this later).
3. The number of partitions in the topics that are actively being produced to.

When Kafka's idempotency feature is **enabled**, the number of batches that must be processed is a function of all of the above **plus the number of producers that are actively producing in a given time interval**.

In other words, enabling idempotency makes the batches/s problem much worse. The reason for this is that when idempotency is disabled, the Agents can merge together batches from different producers as long as those batches belong to the same topic-partition. For many workloads, this dramatically reduces the number of batches that need processing.

When idempotency is enabled, the Agents can't perform this merge operation and the number of batches that need to be processed may increase dramatically.
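The effect of the merge can be illustrated with a minimal sketch. This is a simplified model, not the Agent's actual implementation: it assumes a single Agent and a single flush interval, and the function name `count_batches` and the `(producer_id, topic, partition)` tuples are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# One entry per batch received by a single Agent in one flush interval.
# (producer_id, topic, partition) is a hypothetical shape for illustration.
Batch = Tuple[str, str, int]


def count_batches(received: List[Batch], idempotent: bool) -> int:
    """Count the batches presented to the control plane for one flush.

    Simplified model: with idempotency disabled, batches for the same
    topic-partition are merged regardless of producer. With idempotency
    enabled, no merging happens and every received batch is kept.
    """
    if idempotent:
        return len(received)
    return len({(topic, partition) for _, topic, partition in received})


# 50 producers, each sending one batch to each of the same 4 partitions.
received = [
    (f"producer-{p}", "events", partition)
    for p in range(50)
    for partition in range(4)
]

print(count_batches(received, idempotent=False))  # 4
print(count_batches(received, idempotent=True))   # 200
```

In this model the batch count with idempotency enabled grows with the number of producers, while with idempotency disabled it is bounded by the number of active partitions.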

The number of batches processed by your cluster is exposed via a metric called `warpstream_agent_segment_batcher_flush_num_batches` (Prometheus) or `warpstream.agent_segment_batcher_flush_num_batches` (Datadog).

The number of processed batches can also be visualized in the WarpStream UI as shown below.

<figure><img src="/files/evrPduMYQkcnPdT1HSI5" alt=""><figcaption></figcaption></figure>

If your workload starts to approach more than `80,000` batches processed per second, you should consider following the steps in the next section to reduce the number of batches.

## Reducing Processed Batches

### Configure Kafka Clients

Make sure you've configured your Kafka clients according to [our recommendations](/warpstream/kafka/configure-kafka-client/tuning-for-performance.md).

### Disable Idempotency

As discussed in the [processed batches](#processed-batches) section, Kafka's idempotency feature disables the Agent's ability to merge batches of data together that belong to the same topic-partition, potentially resulting in a significantly higher number of batches to process.

Also, several of the strategies discussed below are only effective when idempotency is disabled (this is called out in the documentation when relevant).

For particularly high volume or demanding workloads, we **strongly** recommend disabling this feature.

### Reduce the Number of Active Partitions

As discussed above, the number of batches that must be processed is primarily a function of **the number of partitions that are actively produced to in a given time interval**. As a result, anything that reduces the number of partitions actively produced to will also reduce the number of batches that need to be processed.

There are four ways to reduce the number of active partitions in a given time interval:

1. [Use NULL record keys.](#use-null-record-keys)
   1. Works with idempotency enabled.
2. [Reduce the partition count in the topics that are being produced to.](#reduce-the-partition-count)
   1. Works with idempotency enabled.
3. [Increase the batch timeout in the Agent.](#increase-the-agent-batch-timeout)
   1. Requires idempotency to be disabled.
4. [Change the partition assignment strategy.](#change-the-partition-assignment-strategy)
   1. Requires idempotency to be disabled.

#### Use NULL Record Keys

When using non-null record keys, the producer will map each record to the partition it belongs to based on the record's key. For example, consider a single producer that produces 1024 records to a topic with 256 partitions such that each partition ends up receiving 4 records on average. In a given interval, this producer will generate \~256 batches of data.

Now consider the same producer, but each record has a NULL key. When a record doesn't have a key specified, the Kafka client is free to assign that record to any partition as it sees fit. In practice, what most Kafka clients do is pick one partition, write a bunch of records to it until some threshold (like 1MiB) is reached, and then pick another partition and repeat. In that scenario, the 1024 records could end up being assigned to just one or two partitions in a given time interval, resulting in the producer generating just 1-2 batches of data in total.

The partitions will stay almost perfectly balanced in aggregate since each Kafka producer will rotate the partition they're producing to on a regular basis.

The downside of this approach is that records are spread across all of the partitions with no control over which partition any particular record lands in, which may not be acceptable for your consumers.
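The two scenarios above can be modeled with a short sketch. It is an approximation under stated assumptions: keyed records are assumed to spread evenly across partitions (key `i` maps to partition `i % P`), and the NULL-key client is assumed to stick to one partition until a byte threshold is reached, as described above. The function names and the record size are hypothetical, and real Kafka clients differ in their exact switching logic.

```python
def keyed_batches(num_records: int, num_partitions: int) -> int:
    """Partitions touched when keys spread records evenly across partitions."""
    return len({i % num_partitions for i in range(num_records)})


def sticky_batches(num_records: int, record_bytes: int, threshold_bytes: int) -> int:
    """Partitions touched when the client sticks to one partition until
    threshold_bytes have been written, then moves on to another one."""
    if num_records == 0:
        return 0
    partitions_used = 1
    written = 0
    for _ in range(num_records):
        if written >= threshold_bytes:
            # Threshold reached: switch to a new partition for this record.
            partitions_used += 1
            written = 0
        written += record_bytes
    return partitions_used


ONE_MIB = 1024 * 1024

# 1024 keyed records over 256 partitions: one batch per partition.
print(keyed_batches(1024, 256))  # 256

# 1024 NULL-key records of 1 KiB each with a 1 MiB switch threshold.
print(sticky_batches(1024, 1024, ONE_MIB))  # 1

# The same records at 2 KiB each cross the threshold once.
print(sticky_batches(1024, 2048, ONE_MIB))  # 2
```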

#### Reduce the Partition Count

{% hint style="info" %}
You can ignore this section if your workload is using NULL record keys as described in the section above.
{% endhint %}

If your workload is using non-NULL record keys then the number of batches it will generate in a given time interval is a function of:

1. The distribution of your workload's keys.
2. The number of partitions in the topic(s) being produced to.

You probably have no control over the distribution of your workload's keys (the data is the data), but you may have control over the number of partitions in each topic. Reducing this value will usually result in a roughly linear decrease in the number of batches processed by the cluster.

The downside of this approach is that it will decrease the maximum number of parallel consumers that can process a given topic because the number of consumers in a workload cannot be scaled higher than the number of topic-partitions available to distribute amongst the consumers.
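To see why the decrease is roughly linear, consider a rough estimate under one strong assumption: record keys hash uniformly at random across partitions. In that case the expected number of distinct partitions hit by `N` records over `P` partitions is `P * (1 - (1 - 1/P)^N)`, the standard occupancy result. Real key distributions are rarely uniform, so treat this as an illustration only; the function name is hypothetical.

```python
def expected_active_partitions(num_partitions: int, records_per_interval: int) -> float:
    """Expected number of distinct partitions receiving at least one record,
    assuming each record lands on a uniformly random partition."""
    p = num_partitions
    return p * (1 - (1 - 1 / p) ** records_per_interval)


# With many more records than partitions, nearly every partition is active,
# so halving the partition count roughly halves the number of batches.
before = expected_active_partitions(256, 100_000)
after = expected_active_partitions(128, 100_000)
print(before, after, before / after)
```

When the number of records per interval is small relative to the partition count, most partitions are idle anyway and reducing the partition count has less effect.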

#### Increase the Agent Batch Timeout

{% hint style="info" %}
This approach only works if you [disable idempotency](#disable-idempotency) because it relies on the Agent's ability to merge together batches of data that belong to the same topic-partition.
{% endhint %}

The WarpStream Agents buffer produced records in memory for the configured batch timeout (default 250ms) and then once the timeout elapses they flush one or more files containing all of the batches they received. If idempotency is disabled, then all of the batches for a given topic-partition received by a single Agent in this time interval will be merged together and presented to the control plane as a single batch.

As a result of this merge operation, increasing the Agent batch timeout gives the Agent more time to accumulate more batches from the producer clients and merge them together. Therefore, increasing the batch timeout reduces the number of batches processed by the control plane.

For example, consider a workload with 1024 partitions actively being produced to and Agents configured with a 250ms batch timeout running in three different availability zones. The minimum possible number of batches/s to be processed by the cluster is then:

`NUM_PARTITIONS * NUM_FLUSHES_PER_SECOND * NUM_AVAILABILITY_ZONES`

To make that concrete for our example: `1024 * 1000/250 * 3 == 12,288 batches/s`

However, if we increase the batch timeout from 250ms to 500ms, then the Agent has twice as long to merge together batches for the same topic-partition and the number of batches is cut in half: `1024 * 1000/500 * 3 == 6,144 batches/s`.
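The arithmetic above can be wrapped in a small helper for trying other values. This is just the formula from this section expressed in Python; the function and parameter names are hypothetical, and the result is a lower bound under the same assumptions as the example (idempotency disabled, every partition active in every flush).

```python
def min_batches_per_second(
    num_partitions: int,
    batch_timeout_ms: float,
    num_availability_zones: int,
) -> float:
    """NUM_PARTITIONS * NUM_FLUSHES_PER_SECOND * NUM_AVAILABILITY_ZONES."""
    flushes_per_second = 1000 / batch_timeout_ms
    return num_partitions * flushes_per_second * num_availability_zones


print(min_batches_per_second(1024, 250, 3))  # 12288.0
print(min_batches_per_second(1024, 500, 3))  # 6144.0
```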

The Agent batch timeout can be modified via the `-batchTimeout` flag or the `WARPSTREAM_BATCH_TIMEOUT` environment variable. The default value is `250ms`.

The downside of this approach is that increasing the batch timeout will increase the latency of Produce requests.

#### Change the Partition Assignment Strategy

{% hint style="info" %}
This approach only works if you [disable idempotency](#disable-idempotency) because it relies on the Agent's ability to merge together batches of data that belong to the same topic-partition.
{% endhint %}

The default [partition assignment strategy](/warpstream/kafka/reference/partition-assignment-strategies.md) in WarpStream is `consistent_random_jump`, which strikes a good balance between load-balancing and reducing the number of batches that must be processed. However, in many cases `consistent_random_jump` will end up spreading the load for a single topic-partition across 2-3 Agents instead of just 1, which can increase the number of batches that need to be processed by 2-3x.

As a result, if you have a fairly homogeneous workload where your topic-partitions are highly balanced, then switching the partition assignment strategy to [`consistent_spread`](/warpstream/kafka/reference/partition-assignment-strategies.md#consistent_spread-recommended-in-some-specific-cases) could significantly reduce the number of batches that need to be processed compared to `consistent_random_jump`.

The downside of this approach is that `consistent_spread` is not load-aware, so if your workload has significant load skew then you may end up with hotspots in the Agents resulting in degraded performance.

### Factors that Won't Help

In the past, we used to recommend several approaches for reducing the number of batches that need to be processed that are no longer relevant:

1. Increasing the size of files that the Agents are allowed to create.
2. Vertically scaling the Agents and running fewer Agents.

#### Increasing File Sizes

Increasing the size of files that the Agents are allowed to create used to reduce the number of batches that needed to be processed significantly, but in the latest versions of the Agent this is no longer true. In the latest Agent versions (v800+) we've improved the logic such that even when the maximum allowed file size is small, the Agents split batches into files in an intelligent way that maximizes their ability to merge batches for the same topic-partition together. As a result, increasing the size of files that the Agents generate is no longer helpful for reducing the number of batches that need to be processed.

Each file does have some associated control plane overhead, so increasing the file size can still help reduce control plane utilization just by virtue of reducing the number of files that need to be processed. However, the latest versions of the Agent will automatically increase the maximum allowed file size if they detect that the number of files being created is putting significant load on the control plane.

Increasing the maximum allowed file size is still useful for reducing object storage PUT costs.

The downside of this approach is that it increases Produce request latency.

#### Vertically Scaling and Running Fewer Agents

Historically, we used to recommend that customers vertically scale their Agents to reduce control plane utilization. The reason for this is that in older versions of the Agent the [default partition assignment strategy](/warpstream/kafka/reference/partition-assignment-strategies.md) was `single_agent` where the number of batches that needed to be processed scaled almost linearly with the number of deployed Agents.

In the latest Agent versions (v800+), the default partition assignment strategy is `consistent_random_jump` which does not suffer from this problem.

That said, vertically scaling the Agents and not overprovisioning can still be useful for reducing object storage PUT costs, and generally speaking, larger Agents are more resilient to traffic spikes and load imbalances. It just won't help much with control plane utilization, except by reducing the number of files processed by the control plane, which does affect utilization but much less than the number of processed batches does.

