Monitoring Consumer Groups

How to monitor your consumer groups.

Last updated 2 months ago

Most open source Kafka deployments use external tooling to monitor consumer group lag. Some of this tooling is compatible with WarpStream because it uses the public Kafka API. Other tools, like Burrow, are incompatible because they rely on internal implementation details of Kafka, such as reading the internal consumer group offset topics.

Luckily, WarpStream has support for monitoring consumer groups built in, so no external tooling is required. In addition, WarpStream reports consumer group lag measured in time as well as in offsets. See our blog post about measuring consumer lag in time for more details about why this is valuable.

Consumer group metadata and lag are available in a variety of locations with WarpStream.

UI

The WarpStream UI exposes consumer group metadata and lag. Click on an individual consumer group to see more details. This view is not useful for alerting purposes, but it can be helpful when debugging consumers.

API

Metrics

Some of the metrics, particularly the consumer group metrics, can become very high cardinality if the cluster contains a lot of topics or partitions. If the cardinality of the consumer group metrics becomes a problem, you can disable them entirely, either by using the disableConsumerGroupMetrics flag or by setting WARPSTREAM_DISABLE_CONSUMER_GROUP_METRICS=true as an environment variable.

The most important metrics are warpstream_consumer_group_lag (lag in offsets per tuple of <topic, consumer_group>) and warpstream_consumer_group_estimated_lag_very_coarse_do_not_use_to_measure_e2e_seconds. The latter is a mouthful, but it can be used to configure alerts based on time instead of offset count.

The partition tag is disabled by default to reduce cardinality. If you want to enable it, set the disableConsumerGroupsMetricsTags flag or the WARPSTREAM_DISABLE_CONSUMER_GROUP_METRICS_TAGS environment variable to an empty string (the default value is "partition"). When the partition tag is disabled, the warpstream_consumer_group_lag metric will be the sum of the consumer group lag across the topic's partitions. The warpstream_consumer_group_estimated_lag_very_coarse_do_not_use_to_measure_e2e_seconds metric will be the max of the estimated lag across the topic's partitions.
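The sum and max behavior described above can be illustrated with a minimal sketch. The per-partition values, topic name, and group name below are made up for illustration, and the helper is hypothetical; it is not how the Agents compute the metrics, only a picture of what the aggregated values represent.

```python
from collections import defaultdict

# Hypothetical per-partition samples keyed by (topic, consumer_group, partition).
offset_lag = {
    ("orders", "billing", 0): 120,
    ("orders", "billing", 1): 30,
    ("orders", "billing", 2): 0,
}
time_lag_seconds = {
    ("orders", "billing", 0): 4.5,
    ("orders", "billing", 1): 12.0,
    ("orders", "billing", 2): 0.0,
}


def aggregate(per_partition, combine):
    """Collapse per-partition values into one value per (topic, consumer_group)."""
    grouped = defaultdict(list)
    for (topic, group, _partition), value in per_partition.items():
        grouped[(topic, group)].append(value)
    return {key: combine(values) for key, values in grouped.items()}


# Offset lag is summed across partitions; estimated time lag takes the max.
total_offset_lag = aggregate(offset_lag, sum)
worst_time_lag = aggregate(time_lag_seconds, max)

print(total_offset_lag)
print(worst_time_lag)
```

Taking the max for the time-based metric means a single slow partition is enough to raise the reported lag for the whole topic, which is usually the behavior you want for alerting.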

| Name | Description | Tags |
| --- | --- | --- |
| warpstream_consumer_group_lag | Difference (in offsets) between the max offset and the committed offset for every active consumer group. | virtual_cluster_id, topic, consumer_group and partition |
| warpstream_consumer_group_estimated_lag_very_coarse_do_not_use_to_measure_e2e_seconds | Gives a rough estimate of how far behind (in seconds) a consumer group is from the latest messages. Note: This is NOT for precise measurement; it's a coarse estimate. | virtual_cluster_id, topic, consumer_group and partition |
| warpstream_consumer_group_generation_id | A unique identifier that increases with every consumer group rebalance. This allows you to easily track the number and frequency of rebalances. | virtual_cluster_id and consumer_group |
| warpstream_consumer_group_max_offset | Max offset of a given topic-partition for every topic-partition in every consumer group. | virtual_cluster_id, topic, consumer_group and partition |
| warpstream_consumer_group_state | State of each consumer group (stable, rebalancing, empty, etc.) | consumer_group, group_state |
| warpstream_consumer_group_num_members | Number of members in each consumer group. | consumer_group |
| warpstream_consumer_group_num_topics | Number of topics in each consumer group. | consumer_group |
| warpstream_consumer_group_num_partitions | Number of partitions in each consumer group. | consumer_group |
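As a rough illustration of how two of these metrics can be interpreted, the sketch below computes offset lag from a max offset and a committed offset, and counts rebalances from successive samples of the generation ID. Both helper functions and all sample values are hypothetical. The rebalance count assumes the generation ID increases by exactly one per rebalance, which this page does not state explicitly.

```python
def offset_lag(max_offset, committed_offset):
    """Lag in offsets: the difference between the max offset and the committed offset."""
    return max_offset - committed_offset


def count_rebalances(generation_ids):
    """Estimate the number of rebalances from successive generation ID samples.

    Assumption: the generation ID goes up by one per rebalance, so the total
    increase across samples equals the number of rebalances. Decreases (for
    example, if a group is deleted and recreated) are ignored.
    """
    total = 0
    for previous, current in zip(generation_ids, generation_ids[1:]):
        if current > previous:
            total += current - previous
    return total


# Hypothetical samples collected at a fixed scrape interval.
samples = [7, 7, 8, 8, 10, 10]
print(offset_lag(max_offset=1500, committed_offset=1350))
print(count_rebalances(samples))
```

Dividing the rebalance count by the length of the sampling window gives a rebalance frequency, which is one way to notice a consumer group that is rebalancing unusually often.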

Consumer group lag is available via our HTTP/JSON API.

The Agents expose built-in metrics that you can scrape within your own environment. These include all the metrics you need to monitor your applications for consumer group lag.

WarpStream also provides a hosted Prometheus endpoint that can be used to scrape consumer group metrics directly without going through the Agents. This can be helpful for workloads with a high number of topics / partitions, where the time series cardinality is already high and multiplying it by the unique Agent pod names would make it even higher.
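To make the scraping workflow more concrete, here is a minimal sketch that parses metrics in the Prometheus text exposition format and flags consumer groups whose estimated time lag exceeds a threshold. The sample payload, label values, threshold, and helper names are all made up for illustration. In practice you would more likely define an alert rule in your monitoring system than parse the output by hand.

```python
import re

TIME_LAG_METRIC = (
    "warpstream_consumer_group_estimated_lag_very_coarse"
    "_do_not_use_to_measure_e2e_seconds"
)

# Illustrative payload only; this is not real Agent output.
SAMPLE_SCRAPE = f"""\
# Illustrative sample only; all label values are made up.
warpstream_consumer_group_lag{{topic="orders",consumer_group="billing"}} 150
{TIME_LAG_METRIC}{{topic="orders",consumer_group="billing"}} 12
{TIME_LAG_METRIC}{{topic="orders",consumer_group="audit"}} 95.5
"""

# metric_name{label="value",...} value [optional timestamp]
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{(.*)\}\s+(\S+)(?:\s+\d+)?$')
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labeled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/HELP/TYPE lines
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL.findall(raw_labels)), float(value)


def groups_over_threshold(text, threshold_seconds):
    """Return (consumer_group, topic, seconds) tuples above the threshold."""
    lagging = []
    for name, labels, value in parse_samples(text):
        if name == TIME_LAG_METRIC and value > threshold_seconds:
            lagging.append((labels["consumer_group"], labels["topic"], value))
    return lagging


print(groups_over_threshold(SAMPLE_SCRAPE, threshold_seconds=60))
```

Because the time-based metric is a coarse estimate, a generous threshold is a safer starting point than a tight one.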
