Network Architecture Considerations

This page describes several approaches for deploying WarpStream with more advanced network setups.

Normal Network Architecture Setup

In most WarpStream deployments, client applications must connect directly to the WarpStream agents. This requires direct layer 3 network connectivity between the client applications and agents with no proxies, load balancers, NATing, etc. in the middle.

The architecture below is a normal network architecture in which all of the applications can communicate directly with all of the WarpStream Agents.

This is the recommended architecture for most WarpStream deployments because it is the easiest, most effective, and most performant way to run a WarpStream cluster.
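
To confirm that clients have the required direct connectivity, you can run the WarpStream CLI's diagnose-connection sub-command (shown again in the FAQ below) from the same network as your client applications. In this sketch, $AGENT_HOSTNAME is a placeholder for the address your Agents are reachable at:

warpstream cli -bootstrap-host $AGENT_HOSTNAME -type diagnose-connection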

Approaches for Connectivity Between Unconnected Networks

In some situations, the direct connectivity described in the previous section is not possible or desired. For example, the WarpStream Agents might be deployed in a Kubernetes cluster while the client applications are deployed outside of the Kubernetes cluster, in a completely different VPC.

There are two different approaches for enabling connectivity between Kafka clients and the WarpStream Agents when the clients and Agents are running in different networks with no direct connectivity between them:

  • Agent Groups

  • A TCP load balancer

In general, Agent Groups are the preferred solution. They're easier to set up, (generally) more cost-effective, and they don't suffer from any of the performance penalties that are associated with using a TCP load balancer.

Agent Groups

Agent Groups are the recommended approach for solving a lack of direct connectivity between Kafka clients and WarpStream Agents. The only scenario where we don't recommend this approach is when it would require a very large number of Agent Groups.

WarpStream's diskless architecture means that any Agent can write or read data for any topic-partition. As a result, WarpStream clusters can be split into distinct "groups" that are completely isolated from each other at the networking / service discovery layer.

This feature is called Agent Groups and is very useful for enabling a single WarpStream cluster to be flexed across multiple disparate networks with no inter-connectivity without incurring the cost and performance penalties of using a TCP load balancer.

For more details, read the Agent Groups documentation.

TCP Load Balancer

If the Agent Group approach is not viable for some reason, you'll have to set up a TCP load balancer instead.

When exposing WarpStream to external networks (i.e., over the internet), it is highly recommended to configure TLS and authentication. See Protect Data in Motion with TLS Encryption, SASL Authentication, and Mutual TLS (mTLS) for configuration details.

Agent configuration:

  • WARPSTREAM_DEFAULT_VIRTUAL_CLUSTER_ID=$VIRTUAL_CLUSTER_ID

  • WARPSTREAM_REQUIRE_SASL_AUTHENTICATION=true

  • WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE=$LOAD_BALANCER_HOSTNAME

In some cases, the load balancer may be listening on a port that's different from the port the Agents are listening on (default 9092 for TCP/Kafka protocol traffic). In that scenario, you'll need to add one additional environment variable to the Agent configuration:

WARPSTREAM_DISCOVERY_KAFKA_PORT_OVERRIDE=$EXTERNAL_NLB_PORT

This instructs the Agents to advertise the load balancer's port within the Kafka protocol instead of the port that the Agents are listening on.

Note that this change will make the Agents advertise a different port within the Kafka protocol, but they'll continue listening on the same port (default 9092) so traffic between the load balancer and the Agents will not be impacted by this change. It's just required due to a quirk of how service discovery within the Kafka protocol works.
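
Putting the pieces together, a minimal sketch of the Agent environment for this setup could look like the following. How the variables are supplied depends on how you run the Agents, and $VIRTUAL_CLUSTER_ID, $LOAD_BALANCER_HOSTNAME, and $EXTERNAL_NLB_PORT are placeholders:

# Agent environment for running behind a TCP load balancer (values are placeholders)
export WARPSTREAM_DEFAULT_VIRTUAL_CLUSTER_ID=$VIRTUAL_CLUSTER_ID
export WARPSTREAM_REQUIRE_SASL_AUTHENTICATION=true
# Advertise the load balancer's hostname to Kafka clients instead of the Agent's own address
export WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE=$LOAD_BALANCER_HOSTNAME
# Only needed when the load balancer listens on a different port than the Agents (default 9092)
export WARPSTREAM_DISCOVERY_KAFKA_PORT_OVERRIDE=$EXTERNAL_NLB_PORT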

Kubernetes

Running WarpStream within Kubernetes can be simple and straightforward with our Helm charts.

However, when applications running outside of the Kubernetes cluster / VPC need to connect to WarpStream, additional configuration may be required.

In this example setup, we will have at least three Helm deployments for three different Agent Groups. See Agent Groups for information about groups.

Agent Group One will handle applications running in the same Kubernetes cluster as the agents via direct connectivity within Kubernetes.

Agent Group Two will handle applications running in the same VPC as the Kubernetes cluster but not running in the Kubernetes cluster itself.

In some setups this group isn't needed because pod IPs are routable on the VPC; consult your cloud provider's Kubernetes documentation for details about routable pod IPs.

Agent Group Three will handle applications running outside of the VPC, for example connecting over the internet.

In all three cases, the bootstrap server will be printed in the NOTES section during the helm install.
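
For groups two and three, you will also need the hostname that your cloud provider assigns to each provisioned load balancer in order to set the hostname override in the values files below. One way to look it up (assuming the namespace used in the install commands later on) is:

kubectl get svc --namespace $YOUR_NAMESPACE
# The LoadBalancer Service created for each group's kafkaService shows the assigned
# hostname or IP in its EXTERNAL-IP column.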

Below are the recommended Helm values to set for the various groups.

one-values.yaml
config:
    agentGroup: one
    bucketURL: <WARPSTREAM_BUCKET_URL>
    apiKey: <WARPSTREAM_AGENT_APIKEY>
    virtualClusterID: <WARPSTREAM_VIRTUAL_CLUSTER_ID>
    region: <WARPSTREAM_CLUSTER_REGION>
two-values.yaml
config:
    agentGroup: two
    bucketURL: <WARPSTREAM_BUCKET_URL>
    apiKey: <WARPSTREAM_AGENT_APIKEY>
    virtualClusterID: <WARPSTREAM_VIRTUAL_CLUSTER_ID>
    region: <WARPSTREAM_CLUSTER_REGION>
kafkaService:
    enabled: true
    annotations:
        # Uncomment one of the following annotations depending on your Cloud Provider
        # networking.gke.io/load-balancer-type: "Internal"
        # service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        # service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
    type: LoadBalancer
    port: 9092
# Override the hostname to be the hostname of the internal TCP Load Balancer
# In some environments this isn't needed if your Kubernetes pod IPs are routable.
# See your Kubernetes provider network documentation for details.
extraEnv:
    - name: WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE
      # Replace this with the hostname of your internal TCP load balancer
      value: nlb-internal.xxx
three-values.yaml
config:
    agentGroup: three
    bucketURL: <WARPSTREAM_BUCKET_URL>
    apiKey: <WARPSTREAM_AGENT_APIKEY>
    virtualClusterID: <WARPSTREAM_VIRTUAL_CLUSTER_ID>
    region: <WARPSTREAM_CLUSTER_REGION>
kafkaService:
    enabled: true
    annotations:
        # If using AWS EKS uncomment the following annotation
        # service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    type: LoadBalancer
    port: 9092
# Set a certificate since this load balancer is exposed to the internet
certificate:
    enableTLS: true
    # The Kubernetes TLS secret that contains a certificate and private key
    # see https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets
    secretName: warpstream-external-tls
    
    # If using mtls uncomment the following
    # mtls:
    #     enabled: true
    #
    #     # The secret key reference for the certificate authority public key
    #     certificateAuthoritySecretKeyRef:
    #       name: "warpstream-external-tls"
    #       key: "ca.crt"
# Override the hostname to be the hostname of the external TCP Load Balancer
extraEnv:
    - name: WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE
      # Replace this with the hostname of your external TCP load balancer
      value: nlb-external.xxx
    # If using SASL authentication uncomment the following
    # - name: WARPSTREAM_REQUIRE_SASL_AUTHENTICATION
    #   value: "true"
    #
    # If using mTLS authentication uncomment the following
    # - name: WARPSTREAM_REQUIRE_MTLS_AUTHENTICATION
    #   value: "true"

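Since three-values.yaml references a Kubernetes TLS secret named warpstream-external-tls, that secret must exist in the same namespace before the third group is installed. A minimal sketch, assuming you already have a certificate and private key on disk:

kubectl create secret tls warpstream-external-tls \
    --cert=path/to/tls.crt \
    --key=path/to/tls.key \
    --namespace $YOUR_NAMESPACE
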
You can then install all three agent groups by running the following commands:

helm upgrade --install warpstream-agent-one warpstream/warpstream-agent \
    --namespace $YOUR_NAMESPACE \
    --values one-values.yaml

helm upgrade --install warpstream-agent-two warpstream/warpstream-agent \
    --namespace $YOUR_NAMESPACE \
    --values two-values.yaml

helm upgrade --install warpstream-agent-three warpstream/warpstream-agent \
    --namespace $YOUR_NAMESPACE \
    --values three-values.yaml

When using AWS EKS, it is recommended to use the AWS Load Balancer Controller; the old in-tree and out-of-tree cloud providers for EKS are considered legacy by AWS. While it is still possible to use the legacy provider for load balancers, there is little public documentation available and the required annotations may be different.

Additional Configuration

Client Specific Override

It is sometimes useful to override the hostname at the client level. This is typically needed when using kubectl port-forward.
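
For example, you might forward a local port to the Agents' Kafka Service (or to a single Agent pod) while debugging. In this sketch, $AGENT_KAFKA_SERVICE is a placeholder for the name of that Service:

kubectl port-forward --namespace $YOUR_NAMESPACE svc/$AGENT_KAFKA_SERVICE 9092:9092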

Set the ws_host_override parameter within the client's ID when creating the Kafka client (check Configuring Kafka Client ID Features for more details):

kgo.NewClient(..., 
    kgo.ClientID("ws_host_override=127.0.0.1"),
)

Our recommendation is to only use the above configuration in debugging situations and not long-term deployments.

Internal Listener Override

WarpStream Agents must be able to communicate directly with each other so that they can efficiently share files and data.

In rare situations, it may be necessary to override the internal Agent-to-Agent hostname.

This can be done by setting the -advertiseHostnameStrategy flag or the WARPSTREAM_ADVERTISE_HOSTNAME_STRATEGY environment variable to custom. Then, provide the custom hostname by setting either the -advertiseHostnameCustom flag or the WARPSTREAM_ADVERTISE_HOSTNAME_CUSTOM environment variable.
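
For example, using the environment variable form, the override could be supplied as follows. The hostname below is a placeholder; use an address that the other Agents can actually reach:

# Make this Agent advertise a custom hostname to the other Agents
export WARPSTREAM_ADVERTISE_HOSTNAME_STRATEGY=custom
# Placeholder hostname; replace with the address other Agents should use to reach this Agent
export WARPSTREAM_ADVERTISE_HOSTNAME_CUSTOM=agent-0.warpstream.internal.example.com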

However, our recommendation is to always allow agents to directly communicate with each other and not adjust the above mentioned configurations.

FAQ

Why TCP Load Balancers Can Cause Performance Problems

There are two reasons that introducing a load balancer between Kafka clients and the WarpStream Agent can result in performance problems:

  1. WarpStream has a built-in load balancing mechanism that keeps the WarpStream Agents evenly utilized.

  2. While any Agent can handle writes or reads for any partition, WarpStream generally tries to align writes/reads for the same topic-partition from different clients onto the same Agent. The resulting data locality dramatically improves performance across a variety of dimensions (latency, utilization, compression, etc.).

Both of these mechanisms rely on WarpStream controlling (via the Kafka protocol) which clients connect to which Agents. As a result, these mechanisms degrade when a load balancer is inserted between the Kafka clients and the WarpStream Agents.

For well-behaved workloads, WarpStream can still work well when running behind a load balancer, but direct connectivity between Kafka clients and the WarpStream Agents is always recommended for the most demanding workloads.

Whenever possible, try to solve connectivity problems with Agent Groups instead.

Typical Issues When the Hostname Is Not Overridden Correctly

WarpStream Agents advertise their private IPs and ports for ongoing connections after the initial bootstrap. Without the correct configuration, clients may connect to the bootstrap address successfully yet fail once they try to progress beyond the initial phase.

For example, you may receive errors like the following when the hostname override is set incorrectly:

% warpstream cli -bootstrap-host my-kafka.example.com -type diagnose-connection
running diagnose-connection sub-command with bootstrap-host: my-kafka.example.com and bootstrap-port: 9092


Broker Details
---------------
  10.212.2.26:9092 (NodeID: 1195648645)
failed to communicate with Agent returned as part of Kafka Metadata response, err: <nil>, this usually means that the provided bootstrap host: my-kafka.example.com:9092 is accessible on the current network, but the URL that the Agent is advertising as its broker host/ip: 10.212.2.26:9092 is not accessible on this network. If this is occurring during local development whilst running the Agent in a docker container, consider adding the following flag to the docker run command: --env "WARPSTREAM_PRIVATE_IP_OVERRIDE=127.0.0.1" which will force the Agent to advertise its hostname/IP address as localhost for development purposes.
% kafka-topics --bootstrap-server my-kafka.example.com --list
[2025-02-03 15:08:19,631] WARN [AdminClient clientId=adminclient-1] Connection to node 1195648645 (10.212.2.26:9092) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient)

In these examples, we are trying to connect to my-kafka.example.com, but the WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE environment variable on the Agent is not set to that hostname. As a result, the Kafka clients are told to connect to 10.212.2.26:9092, which is the private IP of the Agent. Since our Kafka clients cannot reach that IP, they fail with connection errors.
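
In this scenario, the fix is to make the Agents advertise the externally reachable hostname rather than their private IPs, and then re-run the connectivity check, for example:

# On the Agents (for example via extraEnv in the Helm values shown earlier)
WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE=my-kafka.example.com

# Then verify again from the client network
warpstream cli -bootstrap-host my-kafka.example.com -type diagnose-connection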
