Network Architecture Considerations


Last updated 3 months ago


In most WarpStream deployments, client applications must connect directly to the WarpStream agents. This requires direct layer 3 network connectivity between the client applications and the agents, with no proxies, load balancers, NAT, etc. in the middle.

In some situations this type of connectivity is not possible or not desired. One example is when the WarpStream agents are deployed in a Kubernetes cluster but the client applications run outside of that cluster.

This guide explains the key concepts and configuration steps for setting up network connectivity in these situations.

For details on how WarpStream's service discovery mechanism works, see Service Discovery.

Normal Network Architecture Setup

In the normal network architecture, all the applications can communicate directly with all the WarpStream agents. This is the recommended architecture for most WarpStream deployments.

WarpStream behind a TCP Load Balancer Without Direct Connectivity

This type of network architecture is recommended when applications are connecting to WarpStream agents outside of the agent's local network, for example connecting over the internet.

In this situation applications cannot directly connect to the WarpStream agents and must connect through a TCP Load Balancer.

When exposing WarpStream to external networks, it is highly recommended to configure TLS and authentication. See Protect Data in Motion with TLS Encryption, SASL Authentication, and Mutual TLS (mTLS) for configuration details.

Agent configuration:

  • WARPSTREAM_DEFAULT_VIRTUAL_CLUSTER_ID=$VIRTUAL_CLUSTER_ID

  • WARPSTREAM_REQUIRE_SASL_AUTHENTICATION=true

  • WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE=$LOAD_BALANCER_HOSTNAME

WarpStream inside Kubernetes with Applications outside Kubernetes

Running WarpStream within Kubernetes can be simple and straightforward with our Helm charts.

However, when applications running outside of the Kubernetes cluster need to connect to WarpStream, additional configuration is required.

In this example setup we will have three Helm deployments, one for each of three different agent groups. See Agent Groups for information about groups.

Agent Group One will handle applications running in the same Kubernetes cluster as the agents.

Agent Group Two will handle applications running in the same VPC as the Kubernetes cluster but not running in the Kubernetes cluster itself.

Agent Group Three will handle applications running outside of the VPC, for example connecting over the internet.

In all three cases the bootstrap server will be printed out in the NOTES section during the helm install.

Below are the recommended Helm values to set for each group.

one-values.yaml
config:
    agentGroup: one
    bucketURL: <WARPSTREAM_BUCKET_URL>
    apiKey: <WARPSTREAM_AGENT_APIKEY>
    virtualClusterID: <WARPSTREAM_VIRTUAL_CLUSTER_ID>
    region: <WARPSTREAM_CLUSTER_REGION>
two-values.yaml
config:
    agentGroup: two
    bucketURL: <WARPSTREAM_BUCKET_URL>
    apiKey: <WARPSTREAM_AGENT_APIKEY>
    virtualClusterID: <WARPSTREAM_VIRTUAL_CLUSTER_ID>
    region: <WARPSTREAM_CLUSTER_REGION>
kafkaService:
    enabled: true
    annotations:
        # Uncomment one of the following annotations depending on your Cloud Provider
        # networking.gke.io/load-balancer-type: "Internal"
        # service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        # service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
    type: LoadBalancer
    port: 9092
# Override the hostname to be the hostname of the internal TCP Load Balancer
# In some environments this isn't needed if your Kubernetes pod IPs are routable.
# See your Kubernetes provider network documentation for details.
extraEnv:
    - name: WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE
      # Replace this with the hostname of your internal TCP load balancer
      value: nlb-internal.xxx
three-values.yaml
config:
    agentGroup: three
    bucketURL: <WARPSTREAM_BUCKET_URL>
    apiKey: <WARPSTREAM_AGENT_APIKEY>
    virtualClusterID: <WARPSTREAM_VIRTUAL_CLUSTER_ID>
    region: <WARPSTREAM_CLUSTER_REGION>
kafkaService:
    enabled: true
    annotations:
        # If using AWS EKS uncomment the following annotation
        # service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    type: LoadBalancer
    port: 9092
# Set a certificate since this load balancer is exposed to the internet
certificate:
    enableTLS: true
    # The Kubernetes TLS secret that contains a certificate and private key
    # see https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets
    secretName: warpstream-external-tls
    
    # If using mtls uncomment the following
    # mtls:
    #     enabled: true
    #
    #     # The secret key reference for the certificate authority public key
    #     certificateAuthoritySecretKeyRef:
    #       name: "warpstream-external-tls"
    #       key: "ca.crt"
# Override the hostname to be the hostname of the external TCP Load Balancer
extraEnv:
    - name: WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE
      # Replace this with the hostname of your external TCP load balancer
      value: nlb-external.xxx
    # If using SASL authentication uncomment the following
    # - name: WARPSTREAM_REQUIRE_SASL_AUTHENTICATION
    #   value: "true"
    #
    # If using mTLS authentication uncomment the following
    # - name: WARPSTREAM_REQUIRE_MTLS_AUTHENTICATION
    #   value: "true"

You can then install all three agent groups by running the following commands:

helm upgrade --install warpstream-agent-one warpstream/warpstream-agent \
    --namespace $YOUR_NAMESPACE \
    --values one-values.yaml

helm upgrade --install warpstream-agent-two warpstream/warpstream-agent \
    --namespace $YOUR_NAMESPACE \
    --values two-values.yaml

helm upgrade --install warpstream-agent-three warpstream/warpstream-agent \
    --namespace $YOUR_NAMESPACE \
    --values three-values.yaml

Client Specific Override

It is sometimes useful to override the hostname on a client level. This is typically needed when using kubectl port-forward.

Set the ws_host_override parameter within the client's ID when creating the Kafka client (check Configuring Kafka Client ID Features for more details):

kgo.NewClient(..., 
    kgo.ClientID("ws_host_override=127.0.0.1"),
)
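One way to keep this override out of long-term deployments is to apply it only when a debug setting is present. The sketch below assumes a hypothetical environment variable named KAFKA_DEBUG_HOST_OVERRIDE and a hypothetical helper debugClientID; neither is part of WarpStream or of any Kafka client library. The only WarpStream-specific piece is the ws_host_override=... client ID format shown above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// debugClientID returns defaultID unless the (hypothetical) environment
// variable KAFKA_DEBUG_HOST_OVERRIDE is set, in which case it returns a
// client ID carrying the ws_host_override feature for that host.
// getenv is injected so the function can be exercised without touching the
// real process environment.
func debugClientID(defaultID string, getenv func(string) string) string {
	host := strings.TrimSpace(getenv("KAFKA_DEBUG_HOST_OVERRIDE"))
	if host == "" {
		return defaultID
	}
	return "ws_host_override=" + host
}

func main() {
	// "my-app" is a placeholder default client ID.
	id := debugClientID("my-app", os.Getenv)
	fmt.Println("client ID:", id)
	// The resulting string would then be passed to the client,
	// for example via kgo.ClientID(id).
}
```

With this approach the override is opt-in: a developer using kubectl port-forward sets the variable locally, while deployed applications keep their normal client ID.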

We recommend using the above configuration only for debugging, not in long-term deployments.

Internal Listener Override

In rare situations it may be necessary to override the internal agent-to-agent hostname.

This can be done by setting the -advertiseHostnameStrategy flag or the WARPSTREAM_ADVERTISE_HOSTNAME_STRATEGY environment variable to custom. Then, provide the custom hostname by setting either the -advertiseHostnameCustom flag or the WARPSTREAM_ADVERTISE_HOSTNAME_CUSTOM environment variable.

However, we recommend always allowing agents to communicate directly with each other and leaving the above-mentioned settings unchanged.

Typical issues when hostname is not overridden correctly

Unless the hostname is overridden, WarpStream agents advertise their private IP and port for ongoing connections after the initial bootstrap. Without the correct configuration, clients may connect to the bootstrap server successfully yet fail once they progress beyond that initial phase.

For example, you may see the following errors when the hostname override is set incorrectly:

% warpstream cli -bootstrap-host my-kafka.example.com -type diagnose-connection
running diagnose-connection sub-command with bootstrap-host: my-kafka.example.com and bootstrap-port: 9092


Broker Details
---------------
  10.212.2.26:9092 (NodeID: 1195648645)
failed to communicate with Agent returned as part of Kafka Metadata response, err: <nil>, this usually means that the provided bootstrap host: my-kafka.example.com:9092 is accessible on the current network, but the URL that the Agent is advertising as its broker host/ip: 10.212.2.26:9092 is not accessible on this network. If this is occurring during local development whilst running the Agent in a docker container, consider adding the following flag to the docker run command: --env "WARPSTREAM_PRIVATE_IP_OVERRIDE=127.0.0.1" which will force the Agent to advertise its hostname/IP address as localhost for development purposes.
% kafka-topics --bootstrap-server my-kafka.example.com --list
[2025-02-03 15:08:19,631] WARN [AdminClient clientId=adminclient-1] Connection to node 1195648645 (10.212.2.26:9092) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient)

In these examples we are trying to connect to my-kafka.example.com. However, the WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE environment variable on the agent is not set to that hostname. As a result, the Kafka clients try to connect to 10.212.2.26:9092, which is the private IP of the agent. The clients cannot reach that IP, so they fail with connection errors.
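A quick way to spot this class of problem is to look at the broker addresses returned in the metadata response and flag any that are literal private IPs. The following is a minimal sketch with a hypothetical helper, advertisesPrivateIP, using only the Go standard library. It is a heuristic: a private IP is perfectly fine when the client is on the same network as the agents, and a public-looking hostname can still be unreachable.

```go
package main

import (
	"fmt"
	"net"
)

// advertisesPrivateIP reports whether a broker address ("host:port") is a
// literal private, loopback, or link-local IP address. Hostnames and
// malformed addresses return false.
func advertisesPrivateIP(broker string) bool {
	host, _, err := net.SplitHostPort(broker)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false // a hostname, not a literal IP
	}
	return ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast()
}

func main() {
	// Addresses taken from the example output above.
	brokers := []string{"10.212.2.26:9092", "my-kafka.example.com:9092"}
	for _, b := range brokers {
		fmt.Printf("%s private=%v\n", b, advertisesPrivateIP(b))
	}
	// Prints:
	// 10.212.2.26:9092 private=true
	// my-kafka.example.com:9092 private=false
}
```

If a client outside the agents' network sees a private address here, the hostname override on the agents is the first thing to check.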

When using AWS EKS it is recommended to use the AWS Load Balancer Controller; the old in-tree or out-of-tree cloud provider for EKS is considered Legacy by AWS. While it is still possible to use the legacy provider for load balancers, there is little public documentation available and the required annotations may be different.

WarpStream agents must be able to directly communicate with each other. They need to communicate to efficiently share files and data.