Configuring Kafka Client ID Features

WarpStream uses the client id setting that you set on Apache Kafka clients to control how certain features are activated

Note that configuring the client id is optional, and that if you set the client id to a string that WarpStream does not recognize, no optional feature will be enabled.

WarpStream parses the Apache Kafka client id for key value pairs that are separated with commas. Each key value pair is defined as key=value

This means that WarpStream parses the following Apache Kafka client id:

a=b,c=d,rest_of_the_client_id

into two key/value pairs: key: a, value: b and key:c, value: d

These keys/values pair are referred to as "client id features" and are used to implement WarpStream-specific functionality while still remaining within the confines of the Kafka protocol.

Available Features

warpstream_az

kgo.NewClient(...,
    kgo.ClientID("warpstream_az=us-east-1a"),
)

The warpstream_az feature enables zone-aware routing to eliminate inter-zone networking costs. The example above creates a new Apache Kafka client that will tell the WarpStream service discovery system that it's running in the us-east-1a availability zone.

The service discovery system will use this information to route the client to Agents in the same zone to eliminate inter-zone networking fees.

Note however that the service discovery system favors availability, so if it cannot find Agents in specified availability zone, it will direct the clients to Agents that are in other zones.

Accepted aliases:

  • warpstream_az

  • ws_az

warpstream_proxy_target

kgo.NewClient(...,
    kgo.ClientID("warpstream_proxy_target=proxy-consume"),
)

The warpstream_proxy_target feature is used in conjunction with WarpStream's Agent Roles functionality to route individual Kafka clients to Agents that are running specific roles.

If you have split your Agents into distinct roles (as described here), you should configure your Apache Kafka Client by setting the warpstream_proxy_target feature to either proxy-produce (this client will connect to the nodes with the proxy-produce role) or to proxy-consume (and this client will connect to the nodes with the proxy-consume role).

Accepted aliases:

  • warpstream_proxy_target

  • ws_proxy_target

  • ws_pt

warpstream_agent_group

kgo.NewClient(...,
    kgo.ClientID("warpstream_agent_group=internal"),
)

The warpstream_agent_group feature is used in conjunction with WarpStream's Agent Groups functionality to route individual Kafka clients to Agents that belong to a specific group.

If you have split your Agents into multiple groups (as described here), you can configure your Apache Kafka client by setting the warpstream_agent_group feature to the name of the Agent group you want target.

Accepted aliases:

  • warpstream_agent_group

  • ws_agent_group

  • ws_ag

warpstream_partition_assignment_strategy

kgo.NewClient(...,
    kgo.ClientID("warpstream_partition_assignment_strategy=single_agent"),
)

WarpStream Agents are stateless, therefore it is possible to direct your Kafka clients writes or reads to any Agent in the cluster. That said, the WarpStream service discovery system still does have to pick which Agents to route individual clients to. WarpStream supports a number of different strategies for handling this routing.

Accepted aliases:

  • warpstream_partition_assignment_strategy

  • ws_partition_assignment_strategy

  • ws_pas

single_agent (default strategy)

The default strategy is called single_agent, and highly recommended for the vast majority of scenarios. This strategy is called single agent because it selects a single WarpStream Agent for each Kafka client connection to act as the "leader" for all partitions in the cluster. In other words, each client perceives a view of the cluster where one of the Agents is responsible for all of the data in the cluster, and therefore the client will issue all of its Produce and Fetch requests to that one Agent.

This is best general purpose strategy because it achieves the lowest Produce latency (Produce requests for all partitions are routed to a single Agent, so clients are only exposed to the outlier latency of a single Agent instead of multiple), and also has the most even load-balancing because it's much easier to balance load when the system has full freedom to route any client to any Agent.

equal_spread

Another partition assignment strategy that WarpStream supports is called equal_spread. This strategy assigns partitions evenly amongst all the Agents such that each Agent is responsible for roughly an equal number of partitions.

This strategy is not recommended for general usage as it has the highest possible Produce latency of all strategies (because Producers producing to multiple partitions will be exposed to the outlier latency of multiple Agents), and because it is the most susceptible to load imbalances if partitions have skewed throughput.

However, it can be useful in some scenarios, particularly when data volumes are very high and traffic across partitions is very balanced. In that scenario, this strategy can improve performance, reduce the number of required Agents, and generally make the cluster more scaleable.

But different strategies allow you to choose what Agent is used, to minimize how many files are written to, and read from, block storage.

range_spread

range_spread is the same as equal_spread except that if you have more partitions than Agents, it will assign a contiguous range of partitions to the same Agent. For each partition i, it assigns the partition to a broker based on i / len(agents).

ws_host_override

kgo.NewClient(...,
    kgo.ClientID("ws_host_override=agent-lb.yourcompany.com"),
)

ws_host_override instructs the Kafka client to connect to the specified hostname instead of the Agent's advertised address. This is particularly useful when agents are:

  • Running behind load balancers or

  • Deployed within containers where the advertised address is not routable from the outside

For example, if your Warpstream agent is accessible via a load balancer with the DNS name agent-lb.yourcompany.com, you would set ws_host_override in your Kafka client configuration to this value.

Accepted aliases:

  • warpstream_partition_assignment_strategy

  • warpstream_hostname_override

  • ws_host_override

  • ws_ho

  • ws_ioskh

Configuring Multiple Features

It is possible to combine these, for example

kgo.NewClient(...,
    kgo.ClientID("warpstream_proxy_target=proxy-produce,warpstream_partition_assignment_strategy=equal_spread,warpstream_az=us-east-1a,my_client_id"),
)

will configure an Apache Kafka client that

  • indicates it is in the us-east-1a zone.

  • will write different partitions to different agents to spread its load to multiple Agents

  • will connect only to the proxy-produce agents

  • also has a custom string of your choosing, unused by WarpStream, in its id.

Last updated

Was this helpful?