Configuring Kafka Client ID Features
WarpStream uses the client id that you set on Apache Kafka clients to control which optional features are activated.
Note that configuring the client id is optional, and that if you set the client id to a string that WarpStream does not recognize, no optional feature will be enabled.
WarpStream parses the Apache Kafka client id for key/value pairs that are separated by commas. Each key/value pair is written as key=value.
This means that WarpStream parses the following Apache Kafka client id:
a=b,c=d,rest_of_the_client_id
into two key/value pairs: (key: a, value: b) and (key: c, value: d).
These key/value pairs are referred to as "client id features" and are used to implement WarpStream-specific functionality while still remaining within the confines of the Kafka protocol.
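The parsing rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not WarpStream's actual parser: the function name parse_client_id_features is hypothetical, and skipping any comma-separated segment that has no = sign is an assumption based on the example client id above.

```python
def parse_client_id_features(client_id: str) -> dict[str, str]:
    """Split a Kafka client id into key=value features (illustrative only)."""
    features = {}
    for segment in client_id.split(","):
        # partition() returns an empty separator when "=" is absent,
        # so free-form segments such as a trailing suffix are skipped.
        key, sep, value = segment.partition("=")
        if sep and key:
            features[key.strip()] = value.strip()
    return features


print(parse_client_id_features("a=b,c=d,rest_of_the_client_id"))
# prints {'a': 'b', 'c': 'd'}
```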
Available Features
warpstream_az
The warpstream_az feature enables zone-aware routing to eliminate inter-zone networking costs. For example, a client id containing warpstream_az=us-east-1a tells the WarpStream service discovery system that the client is running in the us-east-1a availability zone.
The service discovery system will use this information to route the client to Agents in the same zone to eliminate inter-zone networking fees.
Note, however, that the service discovery system favors availability, so if it cannot find Agents in the specified availability zone, it will direct the client to Agents in other zones.
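As a minimal sketch, the zone can be embedded in the client id when the client configuration is built. client.id and bootstrap.servers are the standard Kafka client configuration properties. The bootstrap address and the my-app suffix are placeholders chosen for illustration, and the resulting dictionary would be passed to whichever Kafka client library you use.

```python
# Placeholder values for illustration; replace them with your own.
availability_zone = "us-east-1a"
app_name = "my-app"  # free-form suffix, not a key=value feature

client_config = {
    "bootstrap.servers": "localhost:9092",  # assumed address, not from the docs
    "client.id": f"warpstream_az={availability_zone},{app_name}",
}

print(client_config["client.id"])
# prints warpstream_az=us-east-1a,my-app
```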
Accepted aliases:
warpstream_az
ws_az
warpstream_proxy_target
The warpstream_proxy_target feature is used in conjunction with WarpStream's Agent Roles functionality to route individual Kafka clients to Agents that are running specific roles.
If you have split your Agents into distinct roles (as described in the Agent Roles documentation), you should configure your Apache Kafka client by setting the warpstream_proxy_target feature to either proxy-produce (the client will connect to the Agents with the proxy-produce role) or proxy-consume (the client will connect to the Agents with the proxy-consume role).
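The sketch below builds separate client ids for producers and consumers and rejects any value other than the two roles named above. The helper name client_id_for_role and the orders-* suffixes are hypothetical and exist only for this example.

```python
# The two role values described in the text above.
VALID_PROXY_TARGETS = {"proxy-produce", "proxy-consume"}


def client_id_for_role(target: str, suffix: str) -> str:
    """Build a client id that asks to be routed to Agents with the given role."""
    if target not in VALID_PROXY_TARGETS:
        raise ValueError(f"unknown proxy target: {target!r}")
    return f"warpstream_proxy_target={target},{suffix}"


producer_client_id = client_id_for_role("proxy-produce", "orders-producer")
consumer_client_id = client_id_for_role("proxy-consume", "orders-consumer")
print(producer_client_id)
print(consumer_client_id)
```

Validating the value on the client side is a design choice for this sketch: a misspelled value would otherwise pass through silently as part of the client id.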
Accepted aliases:
warpstream_proxy_target
ws_proxy_target
ws_pt
warpstream_agent_group
The warpstream_agent_group feature is used in conjunction with WarpStream's Agent Groups functionality to route individual Kafka clients to Agents that belong to a specific group.
If you have split your Agents into multiple groups (as described in the Agent Groups documentation), you can configure your Apache Kafka client by setting the warpstream_agent_group feature to the name of the Agent group you want to target.
Accepted aliases:
warpstream_agent_group
ws_agent_group
ws_ag
warpstream_partition_assignment_strategy
WarpStream Agents are stateless, so your Kafka clients' writes and reads can be directed to any Agent in the cluster. That said, the WarpStream service discovery system still has to pick which Agents to route individual clients to. WarpStream supports a number of different strategies for handling this routing.
Accepted aliases:
warpstream_partition_assignment_strategy
ws_partition_assignment_strategy
ws_pas
single_agent (default strategy)
The default strategy is called single_agent, and it is highly recommended for the vast majority of scenarios. It is called single_agent because it selects a single WarpStream Agent for each Kafka client connection to act as the "leader" for all partitions in the cluster. In other words, each client perceives a view of the cluster where one of the Agents is responsible for all of the data in the cluster, and therefore the client will issue all of its Produce and Fetch requests to that one Agent.
This is the best general-purpose strategy because it achieves the lowest Produce latency (Produce requests for all partitions are routed to a single Agent, so clients are only exposed to the outlier latency of one Agent instead of several), and it also has the most even load balancing, because it is much easier to balance load when the system has full freedom to route any client to any Agent.
equal_spread
Another partition assignment strategy that WarpStream supports is called equal_spread. This strategy assigns partitions evenly among all the Agents such that each Agent is responsible for roughly an equal number of partitions.
This strategy is not recommended for general usage because it has the highest possible Produce latency of all strategies (Producers producing to multiple partitions are exposed to the outlier latency of multiple Agents), and because it is the most susceptible to load imbalances if partitions have skewed throughput.
However, it can be useful in some scenarios, particularly when data volumes are very high and traffic across partitions is very balanced. In that scenario, this strategy can improve performance, reduce the number of required Agents, and generally make the cluster more scalable.
More generally, choosing a different strategy lets you control which Agent is used for which partitions, which can minimize how many files are written to, and read from, block storage.
range_spread
range_spread is the same as equal_spread except that if you have more partitions than Agents, it will assign a contiguous range of partitions to the same Agent. For each partition i, it assigns the partition to a broker based on i / len(agents).
ws_host_override
ws_host_override instructs the Kafka client to connect to the specified hostname instead of the Agent's advertised address. This is particularly useful when Agents are:
Running behind load balancers, or
Deployed within containers where the advertised address is not routable from the outside, for example when port-forwarding for local development.
For example, if your WarpStream Agent is accessible via a load balancer with the DNS name agent-lb.yourcompany.com, you would set ws_host_override in your Kafka client configuration to this value.
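A small sketch of the load balancer case, using the hostname from the example above. The helper name host_override_client_id and the local-dev suffix are hypothetical. The check for commas and equals signs is an added precaution for this sketch, since those characters delimit features in the client id format.

```python
def host_override_client_id(hostname: str, suffix: str) -> str:
    """Build a client id that overrides the hostname the client connects to."""
    # "," and "=" separate features and key/value pairs in the client id,
    # so a value containing either would not survive parsing intact.
    if "," in hostname or "=" in hostname:
        raise ValueError(f"hostname must not contain ',' or '=': {hostname!r}")
    return f"ws_host_override={hostname},{suffix}"


client_id = host_override_client_id("agent-lb.yourcompany.com", "local-dev")
print(client_id)
# prints ws_host_override=agent-lb.yourcompany.com,local-dev
```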
Accepted aliases:
warpstream_hostname_override
ws_host_override
ws_ho
ws_ioskh
Configuring Multiple Features
It is possible to combine these features in a single client id. For example, one client id can configure an Apache Kafka client that:
indicates it is in the us-east-1a zone
will write different partitions to different Agents to spread its load across multiple Agents
will connect only to the proxy-produce Agents
also has a custom string of your choosing, unused by WarpStream, in its id
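A combined client id along these lines could be assembled as in the sketch below. The helper build_client_id and the my-custom-id suffix are hypothetical, and equal_spread is an assumed choice of strategy, since the description only says that partitions are spread across Agents.

```python
def build_client_id(features: dict[str, str], suffix: str = "") -> str:
    """Join features as key=value pairs, then append an optional free-form suffix."""
    parts = [f"{key}={value}" for key, value in features.items()]
    if suffix:
        parts.append(suffix)
    return ",".join(parts)


client_id = build_client_id(
    {
        "warpstream_az": "us-east-1a",
        # Assumed strategy; range_spread would also spread partitions across Agents.
        "warpstream_partition_assignment_strategy": "equal_spread",
        "warpstream_proxy_target": "proxy-produce",
    },
    suffix="my-custom-id",
)
print(client_id)
```

The shorter aliases listed for each feature (for example ws_az or ws_pt) can be used as keys in the same way if the full names make the client id too long.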