# Hosted Prometheus Endpoint

## Overview

Almost all WarpStream metrics are exposed [directly in the Agents](https://docs.warpstream.com/warpstream/agent-setup/monitor-the-warpstream-agents), including "control plane" metrics that correspond to control plane metadata rather than to any particular Agent. This is accomplished via a background job that the control plane schedules to "push" control plane metrics to an individual Agent, which then emits them as regular metrics. Exposing control plane metrics this way is convenient, but it can sometimes be problematic due to the resulting cardinality.

For example, consumer group lag metrics are most useful when they're tagged by partition, but emitting consumer group metrics tagged by partition in the Agents makes the time series cardinality very high: `O(m * n)`, where `m` is the number of topic-partitions and `n` is the number of Agents.
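
As a rough illustration of the difference, consider the made-up sizing below (the partition and Agent counts are examples, not WarpStream defaults):

```bash
# Hypothetical sizing: 2,000 topic-partitions (m) scraped from 20 Agents (n).
partitions=2000
agents=20

# Per-partition series emitted by every Agent multiply together.
echo "Agent-emitted series:      $((partitions * agents))"  # O(m * n)

# The hosted endpoint emits one series per partition, with no Agent tag.
echo "Hosted-endpoint series:    $partitions"               # O(m)
```

With these numbers, the Agent-emitted approach produces 40,000 series versus 2,000 from the hosted endpoint.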

As a result, WarpStream offers a hosted Prometheus endpoint that exposes WarpStream's control plane metrics. This endpoint is authenticated and can be scraped by your monitoring system to collect control plane metrics without incurring the additional cardinality of the Agent host / pod names.

## Available Metrics

### All Clusters

* `warpstream_control_plane_utilization`
* `warpstream_diagnostic_failure`
* `warpstream_files_count`
* `warpstream_topics_count`
* `warpstream_topics_count_limit`
* `warpstream_partitions_count`
* `warpstream_partitions_count_limit`

### Kafka Clusters

* `warpstream_consumer_group_state`
* `warpstream_consumer_group_generation_id`
* `warpstream_consumer_group_num_members`
* `warpstream_consumer_group_num_topics`
* `warpstream_consumer_group_num_partitions`
* `warpstream_consumer_group_max_offset`
* `warpstream_consumer_group_lag`
* `warpstream_consumer_group_estimated_lag_very_coarse_do_not_use_to_measure_e2e_seconds`
* `warpstream_consumer_group_commit_ts`
* `warpstream_produced_records`
* `warpstream_max_offset`
* `warpstream_min_offset`
* `warpstream_num_records`
* `warpstream_partition_size_uncompressed_bytes`
* `warpstream_partition_size_estimated_compressed_bytes`
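
These metrics can drive standard Prometheus alerting rules. The sketch below alerts on sustained consumer group lag; the `consumer_group` label name and the threshold are assumptions for illustration, so check them against the labels your endpoint actually returns:

```yaml
groups:
  - name: warpstream-control-plane
    rules:
      - alert: WarpStreamConsumerGroupLagHigh
        # The "consumer_group" label name is assumed for illustration.
        expr: sum by (consumer_group) (warpstream_consumer_group_lag) > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Consumer group {{ $labels.consumer_group }} lag exceeds 10k records"
```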

### Tableflow Clusters

* `warpstream_tableflow_ingestion_lag_seconds`
* `warpstream_tableflow_partition_offset_lag`
* `warpstream_tableflow_query_lag_seconds`
* `warpstream_tableflow_tables_count`
* `warpstream_tableflow_files_count`
* `warpstream_tableflow_snapshots_count`
* `warpstream_tableflow_partitions_count`
* `warpstream_tableflow_tables_limit`
* `warpstream_tableflow_files_limit`
* `warpstream_tableflow_snapshots_limit`
* `warpstream_tableflow_partitions_limit`

### Schema Registry Clusters

* `warpstream_schema_versions_count`
* `warpstream_schema_versions_limit`

## Sample Prometheus Scraping Configuration

```yaml
scrape_configs:
  - job_name: "warpstream"
    static_configs:
      - targets: ["api.warpstream.com"]
    metrics_path: "/api/v1/monitoring/prometheus/virtual_clusters/$VIRTUAL_CLUSTER_ID"
    scheme: "https"
    basic_auth:
      username: prometheus
      password: $API_KEY
```
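
Note that Prometheus does not expand environment variables inside its configuration file, so `$VIRTUAL_CLUSTER_ID` and `$API_KEY` above are placeholders you must substitute yourself. One way to do that is to render the file from your shell (the values below are example placeholders):

```bash
# Fill in the placeholders; these example values are not real credentials.
VIRTUAL_CLUSTER_ID="vci_XXXXX"   # your virtual cluster ID
API_KEY="your_api_key"           # an API or Agent key

# The unquoted heredoc delimiter lets the shell expand the variables.
cat > prometheus.yml <<EOF
scrape_configs:
  - job_name: "warpstream"
    static_configs:
      - targets: ["api.warpstream.com"]
    metrics_path: "/api/v1/monitoring/prometheus/virtual_clusters/${VIRTUAL_CLUSTER_ID}"
    scheme: "https"
    basic_auth:
      username: prometheus
      password: ${API_KEY}
EOF
```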

## CURLing Manually

{% code overflow="wrap" %}

```bash
curl -u prometheus:$API_KEY "https://api.warpstream.com/api/v1/monitoring/prometheus/virtual_clusters/$VIRTUAL_CLUSTER_ID"
```

{% endcode %}

An API Key can be obtained from the "API Keys" tab in the WarpStream console. For more details, see our [API Keys reference documentation](https://docs.warpstream.com/warpstream/reference/api-reference/api-keys).
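
The endpoint returns standard Prometheus text exposition format, so the output is easy to inspect with ordinary shell tools. The snippet below simulates a scrape result (the sample values and label names are invented for illustration) and extracts one metric family; against the real endpoint you would pipe `curl` into `grep` instead:

```bash
# Simulated scrape output in Prometheus text exposition format.
cat > scrape.txt <<'EOF'
# TYPE warpstream_topics_count gauge
warpstream_topics_count 7
# TYPE warpstream_consumer_group_lag gauge
warpstream_consumer_group_lag{consumer_group="billing"} 42
EOF

# Keep only the consumer group lag samples.
grep '^warpstream_consumer_group_lag' scrape.txt
```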

## Disabling Metrics Publishing Job in the Agents

If you're scraping our Hosted Prometheus Endpoint, then you can disable the control plane metrics publishing job in the Agent as it will just be emitting duplicate metrics but with much higher cardinality due to the agent / pod level tags.

To disable this job, set the `-disablePublishMetricsJob` flag or the `WARPSTREAM_DISABLE_PUBLISH_METRICS_JOB=true` environment variable on your Agent deployment.

## Dedicated Metrics Agent

Configuring a Prometheus scraping endpoint is not always convenient, especially if you're using our Datadog integration. As a result, we also support running the Agents in a dedicated `metrics` mode where the Agent binary will do nothing but scrape the Hosted Prometheus Endpoint and publish those metrics itself so you can ingest them alongside all your other Agent metrics.

{% hint style="info" %}
Agents running in `metrics` mode cannot process Kafka protocol messages; they emit control plane metrics and do nothing else.
{% endhint %}

Running an Agent in metrics mode is as simple as running:

{% code overflow="wrap" %}

```bash
warpstream metrics -agentKey "$WARPSTREAM_AGENT_KEY" -defaultVirtualClusterID "$WARPSTREAM_DEFAULT_VIRTUAL_CLUSTER_ID"
```

{% endcode %}

Optionally, you can enable the Datadog / statsd integration:

{% code overflow="wrap" %}

```bash
warpstream metrics -agentKey "$WARPSTREAM_AGENT_KEY" -defaultVirtualClusterID "$WARPSTREAM_DEFAULT_VIRTUAL_CLUSTER_ID" -enableDatadogMetrics
```

{% endcode %}

Like the regular Agent binary, you can also use environment variables instead:

{% code overflow="wrap" %}

```bash
WARPSTREAM_AGENT_KEY=aks_XXXXX WARPSTREAM_DEFAULT_VIRTUAL_CLUSTER_ID=vci_XXXXX WARPSTREAM_ENABLE_DATADOG_METRICS=true warpstream metrics
```

{% endcode %}

If you deploy WarpStream on Kubernetes with our Helm chart, all you have to do to enable this feature is set the value of `dedicatedMetricsPod.enabled` in your `values.yaml` to `true`. This will deploy a single dedicated pod that scrapes the WarpStream Hosted Prometheus Endpoint and publishes those metrics itself. Note that the chart will automatically take care of [disabling the metrics publishing job](#disabling-metrics-publishing-job-in-the-agents) in the Agents when you do this.
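
In your `values.yaml`, that looks like:

```yaml
dedicatedMetricsPod:
  enabled: true
```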

## API vs Agent Keys

You can use an account-level or workspace-level API key to scrape Prometheus metrics for any cluster that the key is authorized for, or you can use an Agent Key to scrape Prometheus metrics for a single cluster.

In general, we recommend using a read-only Agent Key for scraping Prometheus metrics to provide the minimal required level of access to your monitoring tools.

For more details, see our [Secrets Overview](https://docs.warpstream.com/warpstream/reference/secrets-overview) reference documentation.

