Force Interzone Load Balancing

This page describes how to force interzone load balancing for clients that don't regularly refresh their metadata.

This is advanced documentation for users who care strongly about interzone networking fees, are required to use the segmentio Kafka library, and can't migrate to a better client such as franz-go. If that scenario does not apply to you, skip this page.

Some Kafka clients don't regularly refresh metadata, so they never discover agents running in their own Availability Zone (AZ). This can lead to interzone bandwidth usage. WarpStream addresses this problem with flags that the client sets in its client ID.

Problematic Libraries

The segmentio library, to our knowledge, does not refresh metadata automatically. This behavior is primarily observed in its consumers, but there have been instances where producers also stop querying metadata for extended periods. If you use the segmentio library, we recommend enabling this feature to minimize interzone network bandwidth consumption.

How to Enable Interzone Load Balancing

To activate WarpStream interzone load balancing in such scenarios, append the following flags to the client ID: warpstream_az=<your-az>,warpstream_interzone_lb=true.

  • warpstream_az=<your-az>: This flag indicates the AZ in which the client is operating.

  • warpstream_interzone_lb=true: This flag activates the load balancing mechanism in the agent specifically for this client.

// lookupAZ stands in for however your application determines its own AZ.
availabilityZone := lookupAZ()
sessionID := uuid.New().String()

clientID := fmt.Sprintf(
	"warpstream_session_id=%s,warpstream_az=%s,warpstream_interzone_lb=true",
	sessionID, availabilityZone)
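
The snippet above depends on a uuid package and an undefined lookupAZ function. As a minimal, standard-library-only sketch of the same idea, the program below builds the client ID from a session ID and an AZ. The names newSessionID and buildClientID are hypothetical, the random hex string merely stands in for a UUID, and the validation rules (non-empty AZ, no "," or "=" in values) are assumptions made because the flags are comma-separated key=value pairs, not documented requirements.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// newSessionID returns a random 32-character hex string. It stands in for
// uuid.New().String() so that this sketch needs only the standard library.
func newSessionID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// buildClientID (hypothetical helper) assembles the client ID flags.
// The validation below is an assumption, not a documented rule.
func buildClientID(sessionID, az string) (string, error) {
	if az == "" {
		return "", errors.New("availability zone must not be empty")
	}
	if strings.ContainsAny(sessionID+az, ",=") {
		return "", errors.New("values must not contain ',' or '='")
	}
	return fmt.Sprintf(
		"warpstream_session_id=%s,warpstream_az=%s,warpstream_interzone_lb=true",
		sessionID, az), nil
}

func main() {
	sessionID, err := newSessionID()
	if err != nil {
		panic(err)
	}
	// "us-east-1a" is an example value; use the AZ your client runs in.
	clientID, err := buildClientID(sessionID, "us-east-1a")
	if err != nil {
		panic(err)
	}
	fmt.Println(clientID)
}
```

With segmentio's kafka-go, the resulting string would typically be assigned to the ClientID field of kafka.Dialer (for readers) or kafka.Transport (for writers); verify the field names against the version of the library you use.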

How the Load Balancing Mechanism Works

When a client includes the flags above in its client ID:

  1. The agent periodically assesses if the connection between itself and the client exists within the same AZ. If it does, no action is taken.

  2. If they are in different AZs, the agent checks if there are other agents within the same AZ as the client.

  3. If such agents are found, the agent closes the connection, forcing the client to restart the service discovery process from the beginning, ensuring it identifies agents within its own Availability Zone (AZ).
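
The three steps above can be summarized as a single decision function. This is only an illustrative sketch of the described behavior, not the agent's actual implementation: shouldCloseConnection and the agentsPerAZ map are hypothetical names, and the AZ values are examples.

```go
package main

import "fmt"

// shouldCloseConnection (hypothetical) mirrors the three steps described
// above. agentsPerAZ maps an AZ name to the number of agents running there.
func shouldCloseConnection(agentAZ, clientAZ string, agentsPerAZ map[string]int) bool {
	// Step 1: agent and client share an AZ, so no action is taken.
	if agentAZ == clientAZ {
		return false
	}
	// Step 2: check whether any agent runs in the client's AZ.
	if agentsPerAZ[clientAZ] == 0 {
		return false
	}
	// Step 3: a same-AZ agent exists, so close the connection and let the
	// client rediscover agents in its own AZ.
	return true
}

func main() {
	agents := map[string]int{"us-east-1a": 2, "us-east-1b": 1}
	fmt.Println(shouldCloseConnection("us-east-1a", "us-east-1a", agents)) // false
	fmt.Println(shouldCloseConnection("us-east-1a", "us-east-1b", agents)) // true
	fmt.Println(shouldCloseConnection("us-east-1a", "us-east-1c", agents)) // false
}
```

Note the third case: when no agent runs in the client's AZ, the connection is left alone, because closing it could not lead the client to a closer agent.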

Tuning the Load Balancing Check Interval

You can adjust how often the agent runs this check (it applies only to clients that set the flags in their client ID) using:

  • Flag: -kafkaInterzoneLoadBalancingInterval

  • Environment Variable: WARPSTREAM_KAFKA_INTERZONE_LOAD_BALANCING_INTERVAL

Error Handling in Interzone Load Balancing

Interzone load balancing takes effect only under specific conditions:

  • When an agent is deployed to a new Availability Zone (AZ) for the first time.

  • When the only agent in an AZ is removed.

While these occurrences are rare, clients might encounter the following errors when they happen:

  • io.ErrUnexpectedEOF

  • io.EOF

  • net.ErrClosed

If you encounter any of these errors, retry the request if it is a Produce call; if it is a Consume call, log the error and continue.
