Kubernetes Known Issues
When Running in EKS, the Availability Zone is Unset or Wrong
Symptom
In the WarpStream UI for the cluster, you see warpstream-unset-az set as the availability zone of the agent, and/or errors in the agent logs similar to the following:
{"time":"2025-04-02T22:23:46.467567362Z","level":"ERROR","msg":"failed to determine availability zone","git_commit":"32d51900b2423718b692a0edd29b08b11b7dd74e","git_time":"2025-04-02T18:53:04Z","git_modified":false,"go_os":"linux","go_arch":"arm64","process_generation":"081c0596-25c3-4147-88d5-d4416cb6a998","hostname_fqdn":"warp-agent-default-67d9795854-wrwh8","hostname_short":"warp-agent-default-67d9795854-wrwh8","private_ips":["10.0.115.97"],"num_vcpus":3,"kafka_enabled":true,"virtual_cluster_id":"vci_bc62be92_d3ba_4b0c_90e8_4e7bc621a693","module":"agent_azloader","error":{"message":"awsECSErr: missing metadata uri in environment (ECS_CONTAINER_METADATA_URI_V4), likely not running in ECS\nawsEC2Err: error getting metadata: operation error ec2imds: GetMetadata, canceled, context deadline exceeded\ngcpErr: error getting availablity zone: \nazureErr: error getting location: \nk8sErr: unable to get node information: nodes \"i-025487767185742f1\" is forbidden: User \"system:serviceaccount:warpstream:warpstream0-agent\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"}}
Context
The WarpStream Agents use several methods to determine which availability zone they are running in. When an Agent can't determine its availability zone, it falls back to warpstream-unset-az and logs error messages.
Problem
By default, AWS prevents EKS pods from reaching the EC2 instance metadata service to avoid leaking instance metadata. While this is good security practice for normal instances, it also prevents services running within EKS from querying information about the instance they run on.
Solution
Option A
Use our Helm Chart to deploy WarpStream. With its default configuration, the Helm chart creates a Kubernetes ClusterRole and ClusterRoleBinding that allow the WarpStream pods to look up the node they are running on via the Kubernetes API and read the availability zone from the node's labels.
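For reference, the zone the Agent reports comes from the node's well-known topology label. A correctly labeled node looks roughly like the sketch below (the node name and zone values are illustrative, not taken from your cluster):

# Illustrative Node metadata: the Agent reads the zone from the
# well-known topology.kubernetes.io/zone label.
apiVersion: v1
kind: Node
metadata:
  name: i-025487767185742f1
  labels:
    topology.kubernetes.io/region: us-east-1   # example region
    topology.kubernetes.io/zone: us-east-1a    # value reported as the agent's AZ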
Option B
Create the appropriate ClusterRole, ClusterRoleBinding, and ServiceAccount so the WarpStream Agent can get availability zone information from the Kubernetes API:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: warpstream-agent
  namespace: ${your-namespace}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: warpstream-agent
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - watch
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: warpstream-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: warpstream-agent
subjects:
  - kind: ServiceAccount
    name: warpstream-agent
    namespace: ${your-namespace}
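To confirm the permissions took effect, you can ask the API server whether the service account may read nodes, either with kubectl auth can-i get nodes --as=system:serviceaccount:${your-namespace}:warpstream-agent, or by creating a SubjectAccessReview like the hedged sketch below and checking that the returned status shows allowed: true:

# Verification sketch; substitute ${your-namespace} as above.
# Create with: kubectl create -f sar.yaml -o yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:${your-namespace}:warpstream-agent
  resourceAttributes:
    group: ""        # core API group
    resource: nodes
    verb: get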
Then, on your WarpStream Deployment, set the pod's service account to warpstream-agent.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: warpstream-agent
  namespace: ${your-namespace}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/app: warpstream-agent
  template:
    metadata:
      labels:
        app.kubernetes.io/app: warpstream-agent
    spec:
      containers:
        - args:
            - agent
            ...
          image: public.ecr.aws/warpstream-labs/warpstream_agent:latest
          ...
      serviceAccountName: warpstream-agent
Option C
Modify your EKS node launch template configuration to set http-put-response-hop-limit to 2. This allows pods running on an EKS instance to reach the AWS instance metadata service and look up the availability zone.
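For example, if the node launch template is managed with CloudFormation, the hop limit lives under the launch template's MetadataOptions. The fragment below is a hedged sketch rather than a complete template; the resource name is hypothetical and the rest of LaunchTemplateData stands in for your existing configuration.

# Hypothetical CloudFormation fragment; only MetadataOptions is the relevant part.
Resources:
  WarpStreamNodeLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        MetadataOptions:
          HttpEndpoint: enabled
          HttpTokens: required            # keep IMDSv2 enforced
          HttpPutResponseHopLimit: 2      # allow one extra hop so pods can reach IMDS
        # ... the rest of your existing LaunchTemplateData ...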
When running in Kubernetes, WarpStream pods end up in the same zone or on the same node
Symptom
Some or all of your WarpStream pods end up running in the same availability zone or on the same Kubernetes node instead of being evenly spread out.
Context
When running workloads, Kubernetes tries its best to make sure pods from the same deployment are evenly spread across all nodes and availability zones; however, this isn't always possible.
Problem
Depending on your Kubernetes cluster configuration and the other workloads on the cluster, Kubernetes may not deploy WarpStream pods evenly across zones or nodes. Some Kubernetes deployments prioritize bin-packing rather than high availability of workloads. This varies by Kubernetes distribution and is not always configurable.
Solution
Use Kubernetes topologySpreadConstraints and podAntiAffinity to force Kubernetes to spread WarpStream pods evenly across zones and nodes. If you deploy WarpStream with our Helm Chart, you can set the following in your Helm values:
topologySpreadConstraints:
  # Try to spread pods across multiple zones
  - maxSkew: 1 # +/- one pod per zone
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    # minDomains is only available in Kubernetes 1.30+
    # Remove this field if you are on an older Kubernetes
    # version.
    # When possible set to the number of available
    # availability zones in your cluster.
    minDomains: 3
    # Label Selector to select the warpstream deployment
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: warpstream-agent
        app.kubernetes.io/instance: warpstream-agent # Set to your helm release name
affinity:
  # Make sure pods are not scheduled on the same node to prevent bin packing
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # Label Selector to select the warpstream deployment
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: warpstream-agent
            app.kubernetes.io/instance: warpstream-agent # Set to your helm release name
        topologyKey: kubernetes.io/hostname
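If you are not deploying with the Helm chart, the same constraints can be set directly on the pod template of your Deployment. The sketch below assumes the app.kubernetes.io/app label used in the Deployment example earlier on this page; adjust the selectors to match your own labels.

# Hedged sketch: equivalent settings under spec.template.spec of a plain Deployment.
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app.kubernetes.io/app: warpstream-agent
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/app: warpstream-agent
              topologyKey: kubernetes.io/hostname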
When an IP is reused by another agent's pod
Symptom
Kafka requests will either fail or target the wrong Kafka cluster. For instance, a produce request aimed at cluster A could end up being processed by cluster B, and the data will never be visible from cluster A, resulting in data loss.
Context
If you deploy multiple WarpStream Agent Kubernetes deployments in the same VPC, it is entirely possible that the IP of an agent that is going away (for instance during a scale-down) gets reused by another agent that is starting up. This new agent does not necessarily belong to the same Kubernetes deployment, nor is it connected to the same WarpStream cluster.
Let's consider the following scenario, with both clusters A and B deployed in the same VPC:
1. An agent with IP 10.0.104.73, belonging to a Kubernetes deployment connected to cluster A, shuts down.
2. Shortly after, a new pod starts in a Kubernetes deployment connected to cluster B, and Kubernetes allocates the same IP to it.
3. A Kafka application configured to connect to cluster A still has 10.0.104.73 in its DNS cache and opens a new connection to it to send a produce request.
4. The connection is established fine, but the agent receiving the request actually belongs to cluster B. Auto-topic creation is on, so it simply creates the topic, and the produce request is processed and acknowledged.
5. The Kafka application receives an ack and is happy; it keeps the connection open and sends more requests through it.
6. If the same application consumes data from that topic on cluster A, it will never see the produced records.
Solution
Agent-to-agent communication is already protected against this. This kind of communication only happens on the read path (more info in this blog post), and agents reject any request whose target virtual cluster, sent along with each internal HTTP request, does not match their own.
However, there is nothing built into the Kafka protocol for this, and it requires client participation to be totally safe. The most straightforward way to deal with it is to use SASL credentials: since those are unique across clusters, if a Kafka client tries to connect to an IP thinking it still belongs to cluster A, an agent belonging to cluster B will reject the connection, and the client will retry on another IP.
Alternatively, enabling TLS also completely mitigates the issue: the SSL certificate won't match, so the client will get an SSL error, which should cause it to retry.
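As an illustration, a Kafka client configured for SASL over TLS fails fast if it reaches an agent from the wrong cluster, because its credentials won't be accepted there. The snippet below is a hedged sketch using standard Java-client properties stored in a hypothetical ConfigMap; the bootstrap URL, SASL mechanism (PLAIN is assumed), and credentials are placeholders for your own values.

# Hypothetical ConfigMap holding standard Kafka client properties; all values are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-client-config
data:
  client.properties: |
    bootstrap.servers=REPLACE_WITH_BOOTSTRAP_URL:9092
    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    # Credentials are unique per cluster, so an agent from another cluster that
    # reuses the IP will reject the authentication and the client will retry.
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
      username="REPLACE_WITH_USERNAME" \
      password="REPLACE_WITH_PASSWORD";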