Configuring Kafka Client ID Features
WarpStream uses the client id that you set on Apache Kafka clients to control which optional features are activated.
Configuring the client id is optional. If you set the client id to a string that WarpStream does not recognize, no optional feature is enabled.
WarpStream parses the Apache Kafka client id for key/value pairs separated by commas. Each key/value pair is written as key=value.
This means that WarpStream parses the following Apache Kafka client id:
a=b,c=d,rest_of_the_client_id
into two key/value pairs:
key: a, value: b
key: c, value: d
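The parsing rule above can be illustrated with a minimal Python sketch. This is not WarpStream's actual implementation; the function name `parse_client_id` is hypothetical, and the sketch assumes that segments without an `=` are simply ignored.

```python
def parse_client_id(client_id: str) -> dict[str, str]:
    """Extract key=value pairs from a comma-separated client id."""
    pairs = {}
    for segment in client_id.split(","):
        key, sep, value = segment.partition("=")
        # Assumption: a segment without "=" (such as a free-form suffix)
        # is not a key/value pair and is skipped.
        if sep and key:
            pairs[key.strip()] = value.strip()
    return pairs


print(parse_client_id("a=b,c=d,rest_of_the_client_id"))
# {'a': 'b', 'c': 'd'}
```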
These keys are recognized by WarpStream:
warpstream_az
indicates to WarpStream that this client runs in the zone provided in the value
warpstream_partition_assignment_strategy
indicates to WarpStream how you would like to spread the partitions you read from or write to across different Agents
warpstream_proxy_target
indicates which Agent roles you want to target when reading data from or writing data to WarpStream
warpstream_az
Setting the client id to warpstream_az=us-east-1a will create a new Apache Kafka client that indicates it is in the us-east-1a zone. It will attempt to talk to Agents in the same zone, if any are available, to minimize inter-zone bandwidth costs. However, WarpStream favors availability, so if it cannot find Agents in that zone, it will direct the client to Agents in other zones.
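As a rough illustration, the zone can be placed in the client id when building the client configuration. The sketch below only assembles a properties dictionary using the standard Kafka configuration keys `bootstrap.servers` and `client.id`; the bootstrap address and the helper name `client_config_for_zone` are hypothetical.

```python
# Hypothetical bootstrap address; replace it with your own Agent endpoint.
BOOTSTRAP_SERVERS = "warpstream-agent.example.com:9092"


def client_config_for_zone(zone: str) -> dict[str, str]:
    """Build Kafka client properties whose client id advertises the zone."""
    return {
        "bootstrap.servers": BOOTSTRAP_SERVERS,
        "client.id": f"warpstream_az={zone}",
    }


config = client_config_for_zone("us-east-1a")
print(config["client.id"])  # warpstream_az=us-east-1a
```

A dictionary of this shape can typically be passed to a Kafka client library that accepts these property names.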
ws_host_override
Setting ws_host_override instructs the Kafka client to connect to the specified hostname instead of the Agent's advertised address. This is particularly useful when Agents are 1) running behind load balancers or 2) deployed within containers where the advertised address is not routable from the outside.
For example, if your WarpStream Agent is accessible via a load balancer with the DNS name agent-lb.yourcompany.com, you would set ws_host_override to this value in your Kafka client configuration.
warpstream_partition_assignment_strategy
Since Agents are stateless, you can direct your reads or writes to any Agent in the pool. The different strategies let you choose which Agent is used, to minimize how many files are written to, and read from, block storage.
"warpstream_partition_assignment_strategy=single_agent"
means your Apache Kafka client will talk to the same Agent to read from or write to a single topic. It is the default value.
"warpstream_partition_assignment_strategy=equal_spread"
means your Apache Kafka client will talk to all Agents in the pool to write to all partitions of a topic, sending the writes for each partition to a different Agent. In other words, it distributes partitions in a round-robin fashion, ensuring an even spread across all Agents. For each partition i, it assigns the partition to an Agent based on the result of i % len(agents).
"warpstream_partition_assignment_strategy=range_spread"
is the same as "equal_spread" except that if you have more partitions than Agents, it assigns a contiguous range of partitions to the same Agent. For each partition i, it assigns the partition to an Agent based on i / len(agents).
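The two spreading formulas can be compared with a small Python sketch. It applies the expressions exactly as stated above, treating `/` as integer division (an assumption), and is not WarpStream's actual implementation. Note that `i // len(agents)` only yields a valid index while the number of partitions does not exceed `len(agents)` squared, so the real behavior for larger topics presumably differs. The agent names are hypothetical.

```python
def equal_spread(partition: int, agents: list[str]) -> str:
    """Round-robin assignment: i % len(agents)."""
    return agents[partition % len(agents)]


def range_spread(partition: int, agents: list[str]) -> str:
    """Contiguous-range assignment: i // len(agents), as stated in the text."""
    return agents[partition // len(agents)]


agents = ["agent-0", "agent-1"]  # hypothetical Agent names
for p in range(4):
    print(p, equal_spread(p, agents), range_spread(p, agents))
# 0 agent-0 agent-0
# 1 agent-1 agent-0
# 2 agent-0 agent-1
# 3 agent-1 agent-1
```

With four partitions and two Agents, equal_spread alternates between Agents while range_spread keeps partitions 0 and 1 together on one Agent and partitions 2 and 3 together on the other.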
warpstream_proxy_target
If you have split your pool of Agents using Agent roles (as described here), you should configure your Apache Kafka client by setting warpstream_proxy_target to either proxy-produce (this client will connect to the nodes with the proxy-produce role) or proxy-consume (this client will connect to the nodes with the proxy-consume role).
Setting multiple options
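It is possible to combine these options in a single client id. For example, a client id that sets warpstream_az=us-east-1a, a spreading warpstream_partition_assignment_strategy, and warpstream_proxy_target=proxy-produce, followed by a custom string, will configure an Apache Kafka client that:
indicates it is in the us-east-1a zone.
will write different partitions to different Agents to spread its load across multiple Agents.
will connect only to the proxy-produce Agents.
also has a custom string of your choosing, unused by WarpStream, in its id.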
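A combined client id of this kind could be assembled as in the following sketch. The helper `build_client_id` and the suffix `my_app` are hypothetical, and `equal_spread` is chosen here only as one of the two spreading strategies; the free-form string is placed last, mirroring the `a=b,c=d,rest_of_the_client_id` example.

```python
def build_client_id(options: dict[str, str], suffix: str = "") -> str:
    """Join key=value options with commas, then append a free-form suffix."""
    parts = [f"{key}={value}" for key, value in options.items()]
    if suffix:
        parts.append(suffix)
    return ",".join(parts)


client_id = build_client_id(
    {
        "warpstream_az": "us-east-1a",
        "warpstream_partition_assignment_strategy": "equal_spread",
        "warpstream_proxy_target": "proxy-produce",
    },
    suffix="my_app",  # hypothetical custom string, unused by WarpStream
)
print(client_id)
# warpstream_az=us-east-1a,warpstream_partition_assignment_strategy=equal_spread,warpstream_proxy_target=proxy-produce,my_app
```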