# Low Latency Clusters

## Overview

By default, WarpStream is tuned for maximum throughput and minimal costs at the expense of higher latency. However, WarpStream clusters can be tuned to provide much lower Produce and End-to-End latency.

The rest of this document outlines all of the different approaches that can be taken to reduce latency. Note that all of these approaches are cumulative and the lowest possible latency is achieved by combining all of them. The table below summarizes the different approaches and their trade-offs.

|          Approach          | Reduces Produce Latency |  Reduces E2E Latency |   Full Consistency   |             Increases Costs             |
| :------------------------: | :---------------------: | :------------------: | :------------------: | :-------------------------------------: |
|    Reduce client linger    |   :white\_check\_mark:  | :white\_check\_mark: | :white\_check\_mark: |                   :x:                   |
| Reduce Agent batch timeout |   :white\_check\_mark:  | :white\_check\_mark: | :white\_check\_mark: |           :white\_check\_mark:          |
| Control Plane Cluster Tier |   :white\_check\_mark:  | :white\_check\_mark: | :white\_check\_mark: |           :white\_check\_mark:          |
|         S3 Express         |   :white\_check\_mark:  | :white\_check\_mark: | :white\_check\_mark: | :white\_check\_mark: (\~20% on average) |
|      Lightning Topics      |   :white\_check\_mark:  |          :x:         |          :x:         |                   :x:                   |

The table below shows achievable produce and E2E latencies for a variety of different setups.

|                                           Setup                                          |           Produce Latency          |             E2E Latency             |
| :--------------------------------------------------------------------------------------: | :--------------------------------: | :---------------------------------: |
| 25ms linger, 250ms batch timeout (default), S3 Standard, Fundamentals cluster tier | <p>p50: 250ms<br>p99: 500ms</p> | <p>p50: 500ms<br>p99: 900ms</p> |
|          10ms linger, 50ms batch timeout, S3 Express, Fundamentals cluster tier          | <p>p50: < 80ms<br>p99: < 150ms</p> | <p>p50: < 200ms<br>p99: < 400ms</p> |
| 10ms linger, 25ms batch timeout, S3 Express, Fundamentals cluster tier, lightning topics |  <p>p50: < 35ms<br>p99: < 50ms</p> | <p>p50: < 200ms<br>p99: < 400ms</p> |

## Client Linger

Before tuning WarpStream itself, first check your client configuration. The WarpStream documentation [has recommendations on how to tune various Kafka clients for maximum performance](https://docs.warpstream.com/warpstream/kafka/configure-kafka-client/tuning-for-performance) with WarpStream. You should still follow all of those recommendations. However, if you want to minimize cluster latency, consider reducing the value of `linger` in your Kafka client from our default recommendation of 100ms to 25ms or 10ms.
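
As a sketch, the latency-oriented producer settings might look like this in a Python client (the keys below are standard Kafka producer properties that translate 1:1 to the Java client; the bootstrap address is a placeholder for your Agent endpoint):

```python
# Producer settings tuned for low latency; pass this dict to
# confluent_kafka.Producer, or translate the keys into Java
# ProducerConfig properties. "localhost:9092" is a placeholder.
low_latency_config = {
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 10,   # down from the 100ms throughput-oriented recommendation
    "acks": "all",     # durability is unaffected; WarpStream persists before acking
}
```

Lowering `linger.ms` trades slightly smaller client-side batches for faster request dispatch, which is the right trade when latency matters more than per-request efficiency.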

## Agent Batch Timeout

The WarpStream Agents accept a `-batchTimeout` flag (`WARPSTREAM_BATCH_TIMEOUT` environment variable) that controls how long the Agents buffer data in memory before flushing it to object storage. Produce requests are never acknowledged back to the client before data is durably persisted in object storage, so this option has no impact on durability or correctness, but it does directly impact the latency of Produce requests.

The default `batchTimeout` in the Agents is `250ms`, but the value can be decreased to as low as `25ms` to reduce Produce latency. Lowering this value will result in higher cloud infrastructure costs because the Agents will have to create more files in object storage and will incur higher PUT request API fees as a result.
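
To see why costs scale this way, here is a rough sketch of the relationship, assuming (for simplicity) that each Agent flushes one ingestion file per batch timeout:

```python
def puts_per_second(batch_timeout_ms: float, num_agents: int = 1) -> float:
    """Approximate object storage PUT rate from ingestion, assuming
    each Agent flushes one file every batch timeout interval."""
    return num_agents * 1000.0 / batch_timeout_ms

# Dropping the timeout from 250ms to 50ms means 5x as many PUTs.
default_rate = puts_per_second(250)  # 4 PUTs/sec per Agent
tuned_rate = puts_per_second(50)     # 20 PUTs/sec per Agent
```

In practice Agents may write more than one file per interval, but the inverse relationship between batch timeout and PUT request volume holds either way.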

Note that [S3 Express](#s3-express-aws) PUTs are \~1/5th the cost of a regular S3 PUT, so reducing your batch timeout from `250ms` to `50ms` while also switching to S3 Express One Zone would only increase your ingestion PUT request costs by 2x instead of 5x:

$$
\frac{250}{50} \times \frac{1}{5} \times 2 \,\text{(AZs)} = 2\times
$$
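
The same arithmetic as a quick sanity check (the factor of 2 reflects writing to S3 Express buckets in two availability zones, per the formula above):

```python
put_rate_multiplier = 250 / 50     # 5x more files flushed per second
express_put_price_ratio = 1 / 5    # S3 Express PUT price vs. standard S3 PUT
availability_zones = 2             # S3 Express buckets written per request

cost_multiplier = put_rate_multiplier * express_put_price_ratio * availability_zones
# 5 * 0.2 * 2 = 2.0 -> ingestion PUT costs roughly double, not 5x
```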

## Control Plane Cluster Tier

Similar to the Agents, the WarpStream control plane batches some virtual cluster operations, resulting in higher latency in exchange for reduced control plane costs. Higher [cluster tiers](https://docs.warpstream.com/warpstream/reference/billing#cluster-tiers) like Fundamentals and Pro batch less and thus have lower control plane latency. Switching cluster tiers is a one-click operation in the WarpStream UI or Terraform.

## Lightning Topics

Lightning topics are a special topic type in WarpStream where the Agents skip committing data to the control plane in the critical path of a produce request. Instead, they journal produce requests to object storage, and commit them to the control plane asynchronously.

As a result, lightning topics have significantly lower Produce request latency than regular topics, especially if you have already lowered your [batch timeout](#agent-batch-timeout) and switched to a low-latency storage backend like [S3 Express](#s3-express-aws).

Lightning topics provide the exact same durability guarantees as regular topics (acknowledged data is guaranteed not to be lost), but they do have a few caveats and relaxed consistency guarantees that you can learn more about in our [dedicated lightning topics](https://docs.warpstream.com/warpstream/kafka/advanced-agent-deployment-options/low-latency-clusters/lightning-topics) documentation.

## Low Latency Storage Backends

### S3 Express (AWS)

[S3 Express One Zone](https://aws.amazon.com/s3/storage-classes/express-one-zone/) is a tier of AWS S3 that provides much lower latency for writes and reads, but only stores the data in a single availability zone. The WarpStream Agents have native support for S3 Express and can use it to store newly written data. Combined with a reduced batch timeout, S3 Express can reduce the P99 latency of Produce requests to less than 150ms.

Learn how to configure the WarpStream Agents to write to S3 Express One Zone [here](https://docs.warpstream.com/warpstream/kafka/advanced-agent-deployment-options/low-latency-clusters/s3-express).

### Premium Blob Storage (Azure)

[Azure Premium Blob Storage](https://azure.microsoft.com/en-us/blog/premium-block-blob-storage-a-new-level-of-performance/) is a tier of Azure Blob Storage that provides much lower latency for writes and reads, as well as much cheaper blob storage API calls. Combined with a reduced batch timeout, the Premium tier can reduce the P99 latency of Produce requests to less than 150ms.

One downside of using Azure Premium Blob Storage is that the storage costs are 10x higher than regular blob storage buckets.

However, WarpStream can mitigate this downside automatically by landing newly produced data in a premium blob storage bucket to reduce latency, and then subsequently compacting the data into a regular blob storage bucket for long term storage.

This is a form of tiered storage where both the "hot" and "cold" storage happen to be blob storage and allows you to get the best of both worlds: low latency, low blob storage API call costs, and low storage costs.

To configure this, set the `-ingestionBucketURL` flag (or `WARPSTREAM_INGESTION_BUCKET_URL` environment variable) to the URL of the premium blob storage bucket, and set the `-compactionBucketURL` flag (or `WARPSTREAM_COMPACTION_BUCKET_URL` environment variable) to the URL of the standard blob storage bucket. WarpStream will automatically take care of minimizing storage costs for you.
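
For illustration, the two-bucket setup could be expressed as an environment for the Agent process like this (the bucket names and `azblob://` URL scheme below are placeholders; check the Agent configuration reference for the exact URL format):

```python
import os

# Hypothetical bucket URLs: land newly produced data in the premium
# bucket for low latency, compact into the standard bucket for
# cheap long-term storage.
agent_env = {
    **os.environ,
    "WARPSTREAM_INGESTION_BUCKET_URL": "azblob://premium-ingest-bucket",
    "WARPSTREAM_COMPACTION_BUCKET_URL": "azblob://standard-compaction-bucket",
}
# agent_env would then be passed to the Agent process, e.g. via
# subprocess.Popen([...], env=agent_env).
```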

## Alternative Storage Backends

In addition to S3 Express, we offer a few additional lower-latency storage backends like AWS DynamoDB and Google Spanner. While useful for some applications, keep in mind that these alternative storage backends are much more expensive than traditional object storage or S3 Express and are not suitable for high-volume applications.

### AWS DynamoDB

In addition to S3 Express One Zone, AWS developers have the option to deploy their WarpStream agents using [DynamoDB](https://aws.amazon.com/dynamodb/) as the storage layer. Using DynamoDB yields latencies similar to S3 Express One Zone and generally costs less if the workload's throughput is low enough. Higher volume workloads should always prefer S3 Express One Zone over DynamoDB for cost reasons. See the [Cost Estimates](#cost-estimates) section below for more details.

Learn how to configure the WarpStream Agents to use AWS DynamoDB as the storage layer [here](https://docs.warpstream.com/warpstream/kafka/advanced-agent-deployment-options/low-latency-clusters/aws-dynamo-db).

### Google Spanner (beta)

{% hint style="warning" %}
Google Spanner support for the data plane is only available for Agents running version v709 and above.
{% endhint %}

On GCP deployments, developers can choose to use [Spanner](https://cloud.google.com/spanner) as the storage layer. This is the only low-latency ingestion alternative in GCP, and offers similar tradeoffs to the DynamoDB option described above. It's also only recommended for low-throughput clusters for cost reasons. See the [Cost Estimates](#cost-estimates) section below for more details.

Learn how to configure the WarpStream Agents to use Google Spanner as the storage layer in the dedicated Google Spanner documentation.

