Configuring Kafka Client ID Features

WarpStream uses the client id setting that you set on Apache Kafka clients to control how certain features are activated

Note that configuring the client id is optional, and that if you set the client id to a string that WarpStream does not recognize, no optional feature will be enabled.

WarpStream parses the Apache Kafka client id for key value pairs that are separated with commas. Each key value pair is defined as key=value

This means that WarpStream parses the following Apache Kafka client id:

a=b,c=d,rest_of_the_client_id

into two key/value pairs: key: a, value: b and key:c, value: d

These keys are recognized by WarpStream:

  • warpstream_az indicates to WarpStream that this client runs on the zone provided in the value

  • warpstream_partition_assignment_strategy indicates to

    WarpStream how you would like to spread the partitions you read from or write to on different agents

  • warpstream_proxy_target indicates what agent roles you want to target when reading data from or writing data to WarpStream

warpstream_az

kgo.NewClient(...,
    kgo.ClientID("warpstream_az=us-east-1a"),
)

will create a new Apache Kafka client that will indicate it's in the us-east-1a zone. It will attempt to talk to Agents in the same zone to minimize interzone bandwidth costs if there are some available. However WarpStream favors availability so if it cannot find Agents in that zone, it will direct the clients to Agents that are in other zones.

ws_host_override

kgo.NewClient(...,
    kgo.ClientID("ws_host_override=agent-lb.yourcompany.com"),
)

will instruct the Kafka client to connect to the specified hostname instead of the agent's advertised address. This is particularly useful when agents are: 1) running behind load balancers or 2) deployed within containers where the advertised address is not routable from the outside

For example, if your Warpstream agent is accessible via a load balancer with the DNS name agent-lb.yourcompany.com, you would set ws_host_override in your Kafka client configuration to this value.

warpstream_partition_assignment_strategy

kgo.NewClient(...,
    kgo.ClientID("warpstream_partition_assignment_strategy=single_agent"),
)

Since Agents are stateless, it is possible to direct your reads or your writes to any Agent in the pool. But different strategies allow you to choose what Agent is used, to minimize how many files are written to, and read from, block storage.

  • "warpstream_partition_assignment_strategy=single_agent" means your Apache Kafka client will talk to the same Agent to read from or write to a single topic. It is the default value.

  • "warpstream_partition_assignment_strategy=equal_spread" means your Apache Kafka client will talk to all Agents in the pool to write to all partition of a topic, spreading the writes to each partition on a different client. In other words, distributes partitions in a round-robin fashion, ensuring an even spread across all brokers. For each partition i, it assigns the partition to a broker based on the result of i % len(agents).

  • "warpstream_partition_assignment_strategy=range_spread" is the same as "equal_spread" except that if you have more partitions than Agents, it will assign a contiguous range of partitions to the same Agent. For each partition i, it assigns the partition to a broker based on i / len(agents).

warpstream_proxy_target

kgo.NewClient(...,
    kgo.ClientID("warpstream_proxy_target=proxy-consume"),
)

If you have split your pool of Agents using Agent roles (as described here), you should configure your Apache Kafka Client by setting warpstream_proxy_target to either proxy-produce (this client will connect to the nodes with the proxy-produce role) or to proxy-consume (and this Agent will connect to the nodes with the proxy-consume role).

setting multiple options

It is possible to combine these, for example

kgo.NewClient(...,
    kgo.ClientID("warpstream_proxy_target=proxy-produce,warpstream_partition_assignment_strategy=equal_spread,warpstream_az=us-east-1a,my_client_id"),
)

will configure an Apache Kafka client that

  • indicates it is in the us-east-1a zone.

  • will write different partitions to different agents to spread its load to multiple Agents

  • will connect only to the proxy-produce agents

  • also has a custom string of your choosing, unused by WarpStream, in its id.

Last updated