Recommended List of Alerts

On this page we list the key metrics you should create alerts on.


Last updated 1 year ago


WarpStream was designed to minimize operational burden as much as possible. Therefore, the Agents are completely stateless and only depend on the underlying object store and the WarpStream control plane. The cloud provider manages and monitors the object store, and the WarpStream team manages and monitors the control plane.

For this reason, the WarpStream Agents have very little to alert on. However, if you want to configure additional alerts, you can review our "Important Metrics and Logs" section for a list of key metrics/logs that are good candidates for monitors. In general, instrumentation is more useful for debugging than alerting.

That said, we do recommend configuring a few alerts for resource usage.

Resource Usage Metrics

It's important that the WarpStream Agents have sufficient capacity available to handle your workload and any potential spikes. For that reason, the most important thing to be alerted about with the WarpStream Agents is CPU and memory utilization.

We usually recommend keeping CPU and memory utilization under 50%. Because the WarpStream Agents are stateless, it is safe to auto-scale them based on CPU usage.

  • CPU Usage

    • Metric: container.cpu.usage

    • Alert condition: >50% utilization

  • Memory Usage

    • Metric: container.memory.usage

    • Alert condition: >50% utilization
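The thresholds above can be sketched as a simple check. This is only an illustration, not how any particular monitoring system evaluates alerts: the `AgentUsage` type and the `breached_alerts` function are hypothetical, and the sketch assumes the raw container metrics have already been converted to a percentage of each container's CPU and memory limits.

```python
from dataclasses import dataclass

THRESHOLD_PERCENT = 50.0  # recommended ceiling for CPU and memory utilization


@dataclass
class AgentUsage:
    """Hypothetical snapshot of one Agent container's utilization, in percent."""
    name: str
    cpu_percent: float
    memory_percent: float


def breached_alerts(usages, threshold=THRESHOLD_PERCENT):
    """Return (agent name, resource, value) for every reading above the threshold."""
    alerts = []
    for usage in usages:
        if usage.cpu_percent > threshold:
            alerts.append((usage.name, "cpu", usage.cpu_percent))
        if usage.memory_percent > threshold:
            alerts.append((usage.name, "memory", usage.memory_percent))
    return alerts


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    sample = [AgentUsage("agent-0", 35.0, 42.0), AgentUsage("agent-1", 72.5, 48.0)]
    for name, resource, value in breached_alerts(sample):
        print(f"ALERT {name}: {resource} at {value:.1f}% (threshold {THRESHOLD_PERCENT:.0f}%)")
```

In practice you would express the same condition as a monitor in your observability tool rather than in application code.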

Application Metrics

In addition to monitoring the resource utilization of the Agents, we also recommend monitoring your workload from your application. For example, track errors when producing and fetching data, and monitor your consumer group lag.

For convenience, the WarpStream Agents emit consumer group lag metrics once a minute. For details, see the "Monitoring Consumer Groups" section.
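If you prefer to compute lag from your application, the usual definition is the gap between a partition's latest offset and the consumer group's committed offset. The following is a minimal sketch of that arithmetic only. The `consumer_group_lag` function and its inputs are hypothetical, and fetching the offsets from your Kafka client is left out. Treating a partition with no committed offset as starting from 0 is a simplifying assumption.

```python
def consumer_group_lag(end_offsets, committed_offsets):
    """Compute per-partition lag for one consumer group.

    Both arguments map (topic, partition) -> offset. How you obtain them
    depends on your Kafka client and is not shown here.
    """
    lag = {}
    for topic_partition, end_offset in end_offsets.items():
        # Simplification: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(topic_partition, 0)
        # Clamp at zero in case the two offsets were sampled at different times.
        lag[topic_partition] = max(end_offset - committed, 0)
    return lag


if __name__ == "__main__":
    # Made-up offsets, for illustration only.
    end = {("orders", 0): 100, ("orders", 1): 50}
    committed = {("orders", 0): 90}
    per_partition = consumer_group_lag(end, committed)
    print(per_partition, "total:", sum(per_partition.values()))
```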

In addition, error rate and latencies for Kafka API operations can be monitored using these metrics emitted by the WarpStream Agents:

  • Error Rate on Kafka API

    • Metric: warpstream_agent_kafka_request_outcome

    • Filter by: outcome:error

    • Group by: kafka_key

  • Latency on Kafka API

    • Metric: warpstream_agent_kafka_request_latency

    • Group by: kafka_key
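As a rough illustration of the filter and group-by described above, the sketch below computes an error rate per `kafka_key` from request outcome counts. The `error_rate_by_kafka_key` function and the sample tuples are hypothetical. Only the `outcome:error` filter and the `kafka_key` grouping come from the list above. The `success` outcome value and the `Produce` and `Fetch` key values are assumptions made for the example.

```python
from collections import defaultdict


def error_rate_by_kafka_key(samples):
    """Return the error ratio per kafka_key.

    samples: iterable of (kafka_key, outcome, count) tuples, e.g. counts of
    warpstream_agent_kafka_request_outcome over one evaluation window.
    """
    totals = defaultdict(float)
    errors = defaultdict(float)
    for kafka_key, outcome, count in samples:
        totals[kafka_key] += count
        if outcome == "error":  # the "outcome:error" filter
            errors[kafka_key] += count
    # Skip keys with no traffic to avoid dividing by zero.
    return {key: errors[key] / total for key, total in totals.items() if total > 0}


if __name__ == "__main__":
    # Made-up counts; label values other than "error" are assumed.
    window = [
        ("Produce", "success", 98),
        ("Produce", "error", 2),
        ("Fetch", "success", 50),
    ]
    for key, rate in error_rate_by_kafka_key(window).items():
        print(f"{key}: {rate:.2%} errors")
```

A ratio like this is generally a better alert signal than a raw error count because it stays meaningful as traffic grows or shrinks.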

In general, though, it is better to monitor error rates and latencies from your application than from the Agents.
