Hosted Prometheus Endpoint

This page describes how to use WarpStream's hosted Prometheus endpoint for collecting control plane metrics.

Overview

Almost all WarpStream metrics are exposed directly in the Agents, including "control plane" metrics that describe control plane metadata rather than any particular Agent. This is accomplished via a background job that the control plane schedules to "push" control plane metrics to an individual Agent, which then emits them as regular metrics. Exposing control plane metrics this way is convenient, but it can be problematic because of the resulting cardinality.

For example, consumer group lag metrics are most useful when they're tagged by partition, but emitting consumer group metrics tagged by partition in the Agents makes the time series cardinality very high: O(m * n), where m is the number of topic-partitions and n is the number of Agents.
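To make the O(m * n) figure concrete, here is a rough back-of-the-envelope sketch. The function names and the example numbers are illustrative assumptions, and the counts are worst-case upper bounds for a single partition-tagged metric, not measurements from a real cluster.

```python
def agent_series_count(topic_partitions: int, agents: int) -> int:
    """Worst-case series for one partition-tagged metric when it is emitted
    through the Agents: every (topic-partition, Agent) pair can become a
    distinct time series because of the Agent host / pod tag."""
    return topic_partitions * agents


def untagged_series_count(topic_partitions: int) -> int:
    """Series for the same metric when no Agent host / pod tag is attached."""
    return topic_partitions


# Hypothetical cluster: 2,000 topic-partitions and 30 Agents.
m, n = 2_000, 30
print(agent_series_count(m, n))     # 60000
print(untagged_series_count(m))     # 2000
```

The gap grows linearly with the number of Agents, and it compounds further for metrics that carry additional tags such as the consumer group.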

As a result, WarpStream offers a hosted Prometheus endpoint that captures WarpStream's control plane metrics. This endpoint is authenticated and can be scraped by your monitoring system to collect some control plane metrics without incurring the additional cardinality of the Agent host / pod names.

Available Metrics

All Clusters

  • warpstream_control_plane_utilization

  • warpstream_diagnostic_failure

  • warpstream_files_count

  • warpstream_topics_count

  • warpstream_topics_count_limit

  • warpstream_partitions_count

  • warpstream_partitions_count_limit

  • warpstream_agent_heartbeat

  • warpstream_agent_cpu

  • warpstream_agent_num_vcpus

Kafka Clusters

  • warpstream_consumer_group_state

  • warpstream_consumer_group_generation_id

  • warpstream_consumer_group_num_members

  • warpstream_consumer_group_num_topics

  • warpstream_consumer_group_num_partitions

  • warpstream_consumer_group_max_offset

  • warpstream_consumer_group_lag

  • warpstream_consumer_group_estimated_lag_very_coarse_do_not_use_to_measure_e2e_seconds

  • warpstream_consumer_group_commit_ts

  • warpstream_produced_records

  • warpstream_max_offset

  • warpstream_min_offset

  • warpstream_num_records

  • warpstream_partition_size_uncompressed_bytes

  • warpstream_partition_size_estimated_compressed_bytes

Tableflow Clusters

  • warpstream_tableflow_ingestion_lag_seconds

  • warpstream_tableflow_partition_offset_lag

  • warpstream_tableflow_query_lag_seconds

  • warpstream_tableflow_tables_count

  • warpstream_tableflow_files_count

  • warpstream_tableflow_snapshots_count

  • warpstream_tableflow_partitions_count

  • warpstream_tableflow_tables_limit

  • warpstream_tableflow_files_limit

  • warpstream_tableflow_snapshots_limit

  • warpstream_tableflow_partitions_limit

Schema Registry Clusters

  • warpstream_schema_versions_count

  • warpstream_schema_versions_limit

Sample Prometheus Scraping Configuration

CURLing Manually

An API Key can be obtained from the "API Keys" tab in the WarpStream console. For more details, see our API Keys reference documentation.
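Once you have fetched a response from the endpoint manually, it can be handy to summarize it in a script. The sketch below assumes the response body uses the standard Prometheus text exposition format. The label names (`consumer_group`, `topic`, `partition`) and the sample values are hypothetical and chosen only for illustration; check the labels in your own scrape output before relying on them.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces. Escape sequences are kept as-is, not decoded.
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_LINE.match(line)
        if m is None:
            continue
        labels = dict(LABEL.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def lag_by_group(text, metric="warpstream_consumer_group_lag",
                 group_label="consumer_group"):  # group_label is an assumption
    """Sum a per-partition lag metric into one total per consumer group."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(group_label, "")] += value
    return dict(totals)


# Made-up sample input, not real endpoint output.
EXAMPLE = '''\
# TYPE warpstream_consumer_group_lag gauge
warpstream_consumer_group_lag{consumer_group="billing",topic="orders",partition="0"} 12
warpstream_consumer_group_lag{consumer_group="billing",topic="orders",partition="1"} 30
warpstream_consumer_group_lag{consumer_group="audit",topic="orders",partition="0"} 5
warpstream_topics_count 3
'''

print(lag_by_group(EXAMPLE))  # {'billing': 42.0, 'audit': 5.0}
```

This is a deliberately small parser for quick inspection. For anything long-lived, a real Prometheus server or client library is a better fit than hand-rolled parsing.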

Disabling Metrics Publishing Job in the Agents

If you're scraping our Hosted Prometheus Endpoint, you can disable the control plane metrics publishing job in the Agents. Otherwise the Agents will emit duplicates of the same metrics, at much higher cardinality because of the Agent / pod-level tags.

To disable this job, set the -disablePublishMetricsJob flag or WARPSTREAM_DISABLE_PUBLICS_METRICS_JOB=true environment variable on your Agent deployment.

Dedicated Metrics Agent

Configuring a Prometheus scraping endpoint is not always convenient, especially if you're using our Datadog integration. As a result, we also support running the Agents in a dedicated metrics mode where the Agent binary will do nothing but scrape the Hosted Prometheus Endpoint and publish those metrics itself so you can ingest them alongside all your other Agent metrics.

Agents running in metrics mode cannot process Kafka protocol messages; they emit control plane metrics and do nothing else.

Running an Agent in metrics mode is simple:

Optionally, you can enable the Datadog / statsd integration:

Like the regular Agent binary, you can also use environment variables instead:

If you deploy WarpStream on Kubernetes with our Helm chart, all you have to do to enable this feature is set the value of dedicatedMetricsPod.enabled in your values.yaml to true. This will deploy a single dedicated pod that scrapes the WarpStream Hosted Prometheus Endpoint and publishes those metrics itself. Note that the chart automatically disables the metrics publishing job in the Agents when you do this.
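As a minimal sketch, the setting described above would look roughly like this in values.yaml. Only the dedicatedMetricsPod.enabled key comes from this page; any other values your deployment needs are omitted here.

```yaml
# values.yaml (fragment)
dedicatedMetricsPod:
  enabled: true
```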

API vs Agent Keys

You can use an account-level or workspace-level API key to scrape Prometheus metrics for any cluster that the key is authorized for, or you can use an Agent Key to scrape Prometheus metrics for a single cluster.

In general, we recommend using a read-only Agent Key for scraping Prometheus metrics to provide the minimal required level of access to your monitoring tools.

For more details, see our Secrets Overview reference documentation.
