Reducing Infrastructure Costs

How to reduce infrastructure costs for WarpStream BYOC clusters.

WarpStream infrastructure costs can originate from four different sources:

  1. Networking

  2. Storage

  3. Compute

  4. Object Storage API Fees

Networking

With WarpStream, you can avoid virtually all inter-AZ networking fees by properly configuring your Kafka clients.

Unlike Apache Kafka brokers, WarpStream Agents never replicate data across availability zones themselves (durability is delegated to object storage). However, Kafka producer and consumer clients can still connect to Agents in a different zone, which incurs inter-zone networking fees.

This happens because, by default, WarpStream has no way of knowing which availability zone a client is connecting from. To avoid this issue, configure your Kafka clients to announce the availability zone they're running in by embedding it in their client ID, and WarpStream will take care of zonally aligning your Kafka clients (for both Produce and Fetch requests), resulting in almost zero inter-zone networking fees. See the sketch below for one way to do this.
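
For example, here is a minimal sketch using the confluent-kafka Python client. It assumes the warpstream_az= client ID convention described in WarpStream's zone-aware routing documentation; the bootstrap address, client name, and zone are placeholders:

```python
# A minimal sketch, assuming the `warpstream_az=` client ID convention from
# WarpStream's docs. The bootstrap address and zone below are placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "warpstream-agents.example.com:9092",  # placeholder
    # Embed this client's availability zone in its client ID so WarpStream
    # can route Produce (and Fetch) requests to an Agent in the same zone.
    # In production, discover the zone at runtime (e.g. from instance
    # metadata) rather than hardcoding it.
    "client.id": "my-service,warpstream_az=us-east-1a",
})

producer.produce("example-topic", value=b"hello")
producer.flush()
```

The same client.id setting works for consumers, which is what keeps Fetch traffic zone-local as well.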

Storage

WarpStream uses object storage as its primary and only storage. As a result, storage costs in WarpStream tend to be more than an order of magnitude lower than in Apache Kafka. Storage costs can be reduced even further by configuring the WarpStream Agents to compress stored data with ZSTD instead of LZ4. Check out our compression documentation for more details.
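
To get a feel for the difference, here is a small, hypothetical comparison of LZ4 and ZSTD compression ratios in Python (using the lz4 and zstandard packages). This is not WarpStream code, and real ratios depend entirely on your data:

```python
# A minimal sketch (not WarpStream code) comparing LZ4 and ZSTD compression
# ratios on sample payloads, to illustrate why switching the Agents to ZSTD
# typically shrinks stored data. Requires the `lz4` and `zstandard` packages.
import json
import random

import lz4.frame
import zstandard

# Hypothetical payload: JSON-ish records, similar to typical Kafka traffic.
records = [
    json.dumps({"user_id": random.randint(1, 1000),
                "event": "page_view",
                "path": f"/product/{random.randint(1, 50)}"}).encode()
    for _ in range(10_000)
]
raw = b"\n".join(records)

lz4_size = len(lz4.frame.compress(raw))
zstd_size = len(zstandard.ZstdCompressor(level=3).compress(raw))

print(f"uncompressed: {len(raw):>9,} bytes")
print(f"LZ4:          {lz4_size:>9,} bytes")
print(f"ZSTD (L3):    {zstd_size:>9,} bytes")
```

ZSTD typically compresses meaningfully better than LZ4 at a modest additional CPU cost, which is the trade-off to weigh.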

Compute

The easiest way to reduce WarpStream Agent compute costs is to auto-scale the Agents based on CPU usage. This feature is built into our Helm chart for Kubernetes.
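
For intuition, CPU-based autoscaling in Kubernetes (via a HorizontalPodAutoscaler) scales replicas proportionally to how far observed utilization is from the target. Here is a small sketch of that standard formula, with hypothetical numbers:

```python
# A minimal sketch of the standard Kubernetes HPA scaling rule that CPU-based
# autoscaling relies on; the utilization numbers below are hypothetical.
import math

def desired_replicas(current_replicas: int,
                     current_cpu_utilization: float,
                     target_cpu_utilization: float) -> int:
    # HPA formula: scale proportionally to how far the observed CPU
    # utilization is from the configured target.
    return math.ceil(
        current_replicas * current_cpu_utilization / target_cpu_utilization)

# e.g. 4 Agents running at 90% CPU with a 60% target -> scale to 6 Agents.
print(desired_replicas(4, 0.90, 0.60))  # 6
```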

Object Storage API Fees

WarpStream's entire storage engine is designed around minimizing object storage API fees as much as possible. This is accomplished with a file format that can store data for many different topic-partitions, as well as heavy usage of buffering, batching, and caching in the Agents.

The most expensive source of object storage API fees in WarpStream is the PUT requests required to create files in response to Produce requests. By default, the WarpStream Agents will buffer data in memory until one of the following two events occurs:

  • The batch timeout elapses

    1. Default value: 250ms

    2. Agent flag: -batchTimeout

    3. Agent environment variable: WARPSTREAM_BATCH_TIMEOUT

  • A sufficient number of uncompressed bytes has accumulated

    1. Default value: 4MiB

    2. Agent flag: -batchMaxSizeBytes

    3. Agent environment variable: WARPSTREAM_BATCH_MAX_SIZE_BYTES

at which point the Agent will flush a file to the object store and then acknowledge the Produce request as a success back to the client.
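
The following sketch illustrates that buffer-then-flush behavior. It is not the Agents' actual implementation, and a real implementation would also flush from a background timer so the timeout fires even when no new records arrive:

```python
# A minimal sketch (not the Agents' actual implementation) of the
# buffer-then-flush behavior described above. For brevity, the deadline is
# only checked on append rather than from a background timer.
import time

BATCH_TIMEOUT_SECONDS = 0.250           # -batchTimeout default
BATCH_MAX_SIZE_BYTES = 4 * 1024 * 1024  # -batchMaxSizeBytes default (4MiB)

class ProduceBatcher:
    def __init__(self, put_object):
        self.put_object = put_object  # e.g. a single PUT to object storage
        self.buffer = bytearray()
        self.deadline = time.monotonic() + BATCH_TIMEOUT_SECONDS

    def append(self, record: bytes) -> None:
        # Data from many topic-partitions accumulates in one in-memory buffer.
        self.buffer.extend(record)
        if (len(self.buffer) >= BATCH_MAX_SIZE_BYTES
                or time.monotonic() >= self.deadline):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            # One file per flush means one PUT request, after which the
            # buffered Produce requests can be acknowledged to clients.
            self.put_object(bytes(self.buffer))
            self.buffer.clear()
        self.deadline = time.monotonic() + BATCH_TIMEOUT_SECONDS
```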

To determine how much uncompressed data is stored in the files your Agents are creating for Produce requests, check the average value of the metric: warpstream_agent_segment_batcher_flush_file_size_uncompressed_bytes

If this value is less than the value of batchMaxSizeBytes, then your PUT request costs can be reduced by increasing the amount of uncompressed data that is written to each file, resulting in fewer total files being created. There are two ways to accomplish this:

  1. Reduce the number of Agents that are handling Produce requests. This can be accomplished by running a smaller number of Agents on larger instance types. Alternatively, you can use the Agent Roles feature to split the Agents into separate Producer and Consumer roles.

  2. Increase the value of batchTimeout. For example, if the average uncompressed size of the files created by your Agents is 2MiB, then doubling the batch timeout from 250ms to 500ms should double the uncompressed file size to 4MiB and cut the number of PUT requests in half. The downside of this approach, though, is that it increases the latency of Produce requests.

Once the average size of your uncompressed files approaches the value of batchMaxSizeBytes, you can increase batchMaxSizeBytes and repeat the steps above to further reduce PUT request costs. The sketch below works through the arithmetic with example numbers.
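
To make the arithmetic concrete, here is a back-of-the-envelope sketch. The Agent count and per-Agent throughput are hypothetical, so substitute your own numbers along with the observed value of the metric above:

```python
# A back-of-the-envelope sketch of the PUT-cost arithmetic above. All inputs
# (Agent count, throughput) are hypothetical; substitute your own numbers and
# the observed average of
# warpstream_agent_segment_batcher_flush_file_size_uncompressed_bytes.
num_agents = 6
throughput_mib_s = 8.0      # uncompressed Produce throughput per Agent
batch_timeout_s = 0.250     # -batchTimeout
batch_max_size_mib = 4.0    # -batchMaxSizeBytes

# Each Agent flushes when the timeout elapses or the size cap is reached.
file_size_mib = min(throughput_mib_s * batch_timeout_s, batch_max_size_mib)
puts_per_sec = num_agents * throughput_mib_s / file_size_mib

print(f"avg file size: {file_size_mib:.1f} MiB, "
      f"cluster PUTs/sec: {puts_per_sec:.1f}")
# With these inputs: 2.0 MiB files and 24 PUTs/sec. Doubling batch_timeout_s
# to 0.5s yields 4.0 MiB files and 12 PUTs/sec, i.e. half the PUT cost.
```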
