Hosted Prometheus Endpoint
This page describes how to use WarpStream's hosted Prometheus endpoint for collecting control plane metrics.
Overview
Almost all WarpStream metrics are exposed directly in the Agents, including "control plane" metrics that correspond to control plane metadata and not to any particular Agent. This is accomplished via a background job that the control plane schedules to "push" control plane metrics to an individual Agent, which then emits them as regular metrics. Exposing control plane metrics this way is convenient, but it can sometimes be problematic due to the resulting cardinality.
For example, consumer group lag metrics are most useful when they're tagged by partition, but emitting consumer group metrics tagged by partition in the Agents makes the time series cardinality very high: O(m * n), where m is the number of topic-partitions and n is the number of Agents.
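To make the O(m * n) estimate concrete, the following minimal sketch compares the worst-case series count for a single partition-tagged metric when it is emitted by the Agents with the count when it is collected once from the control plane. The function names and the numbers (1,000 topic-partitions, 20 Agents) are hypothetical and chosen purely for illustration.

```python
def agent_series_count(topic_partitions: int, agents: int) -> int:
    """Worst-case series for one partition-tagged metric emitted via the Agents.

    Any Agent may end up emitting any partition's series, and each series also
    carries that Agent's host / pod tag, so the product m * n is the upper bound.
    """
    return topic_partitions * agents


def hosted_series_count(topic_partitions: int) -> int:
    """Series for the same metric without Agent host / pod tags: one per partition."""
    return topic_partitions


# Hypothetical cluster size, for illustration only.
m, n = 1_000, 20
print(agent_series_count(m, n))  # 20000
print(hosted_series_count(m))    # 1000
```

Under these assumed numbers, dropping the Agent-level tags reduces the worst case by a factor equal to the number of Agents.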
As a result, WarpStream offers a hosted Prometheus endpoint that captures WarpStream's control plane metrics. This endpoint is authenticated and can be scraped by your monitoring system to collect some control plane metrics without incurring the additional cardinality of the Agent host / pod names.
Available Metrics
All Clusters
warpstream_control_plane_utilization
warpstream_diagnostic_failure
warpstream_files_count
warpstream_topics_count
warpstream_topics_count_limit
warpstream_partitions_count
warpstream_partitions_count_limit
warpstream_agent_heartbeat
warpstream_agent_cpu
warpstream_agent_num_vcpus
Kafka Clusters
warpstream_consumer_group_state
warpstream_consumer_group_generation_id
warpstream_consumer_group_num_members
warpstream_consumer_group_num_topics
warpstream_consumer_group_num_partitions
warpstream_consumer_group_max_offset
warpstream_consumer_group_lag
warpstream_consumer_group_estimated_lag_very_coarse_do_not_use_to_measure_e2e_seconds
warpstream_consumer_group_commit_ts
warpstream_produced_records
warpstream_max_offset
warpstream_min_offset
warpstream_num_records
warpstream_partition_size_uncompressed_bytes
warpstream_partition_size_estimated_compressed_bytes
Tableflow Clusters
warpstream_tableflow_ingestion_lag_seconds
warpstream_tableflow_partition_offset_lag
warpstream_tableflow_query_lag_seconds
warpstream_tableflow_tables_count
warpstream_tableflow_files_count
warpstream_tableflow_snapshots_count
warpstream_tableflow_partitions_count
warpstream_tableflow_tables_limit
warpstream_tableflow_files_limit
warpstream_tableflow_snapshots_limit
warpstream_tableflow_partitions_limit
Schema Registry Clusters
warpstream_schema_versions_count
warpstream_schema_versions_limit
Sample Prometheus Scraping Configuration
CURLing Manually
An API Key can be obtained from the "API Keys" tab in the WarpStream console. For more details, see our API Keys reference documentation.
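Once you have fetched the endpoint manually, it can be useful to sanity-check the response before wiring up a full monitoring stack. The sketch below assumes the response body is in the standard Prometheus text exposition format and sums warpstream_consumer_group_lag per consumer group. The label names (consumer_group, topic, partition) and the sample values are hypothetical, so check them against the labels your endpoint actually returns. The parser is deliberately minimal and is not a full implementation of the exposition format.

```python
import re
from collections import defaultdict

# Matches `name{labels} value` or `name value`; a trailing timestamp is ignored.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one `key="value"` pair, allowing backslash escapes inside the value.
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each sample line in the body."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def total_lag_by_group(text, group_label="consumer_group"):
    """Sum warpstream_consumer_group_lag across partitions for each group.

    `group_label` is an assumed label name, not one confirmed by this page.
    """
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "warpstream_consumer_group_lag":
            totals[labels.get(group_label, "")] += value
    return dict(totals)


# Made-up response body for illustration; not real output from the endpoint.
EXAMPLE_BODY = """\
# TYPE warpstream_consumer_group_lag gauge
warpstream_consumer_group_lag{consumer_group="billing",topic="orders",partition="0"} 12
warpstream_consumer_group_lag{consumer_group="billing",topic="orders",partition="1"} 30
warpstream_consumer_group_lag{consumer_group="audit",topic="orders",partition="0"} 5
warpstream_topics_count 3
"""

print(total_lag_by_group(EXAMPLE_BODY))
```

For anything beyond a quick check, an established Prometheus client library's parser is a safer choice than a hand-rolled regular expression.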
Disabling Metrics Publishing Job in the Agents
If you're scraping our Hosted Prometheus Endpoint, you can disable the control plane metrics publishing job in the Agents. With the endpoint in place, that job only emits duplicates of the same metrics, at much higher cardinality due to the Agent / pod level tags.
To disable this job, set the -disablePublishMetricsJob flag or WARPSTREAM_DISABLE_PUBLICS_METRICS_JOB=true environment variable on your Agent deployment.
Dedicated Metrics Agent
Configuring a Prometheus scraping endpoint is not always convenient, especially if you're using our Datadog integration. As a result, we also support running the Agents in a dedicated metrics mode where the Agent binary will do nothing but scrape the Hosted Prometheus Endpoint and publish those metrics itself so you can ingest them alongside all your other Agent metrics.
Agents running in metrics mode cannot process Kafka protocol messages; they emit control plane metrics and do nothing else.
Running an Agent in metrics mode is simple:
Optionally, you can enable the Datadog / statsd integration:
Like the regular Agent binary, you can also use environment variables instead:
If you deploy WarpStream on Kubernetes with our Helm chart, all you have to do to enable this feature is set the value of dedicatedMetricsPod.enabled in your values.yaml to true. This deploys a single dedicated pod that scrapes the WarpStream Hosted Prometheus Endpoint and publishes those metrics itself. Note that the chart automatically takes care of disabling the metrics publishing job in the Agents when you do this.
API vs Agent Keys
You can use an account-level or workspace-level API key to scrape Prometheus metrics for any cluster that the key is authorized for, or you can use an Agent Key to scrape Prometheus metrics for a single cluster.
In general, we recommend using a read-only Agent Key for scraping Prometheus metrics to provide the minimal required level of access to your monitoring tools.
For more details, see our Secrets Overview reference documentation.