Monitor the WarpStream Agent

Logging

By default, the WarpStream Agent runs with log level info. You can change this with the WARPSTREAM_LOG_LEVEL environment variable. For example, if the info-level logs are too noisy, set WARPSTREAM_LOG_LEVEL=warn.

The WarpStream Agents have an additional special log level called analytics, enabled by setting WARPSTREAM_LOG_LEVEL=analytics. It turns on extremely detailed JSON logging that can be loaded into a logging system that supports analytics, so you can slice and dice Agent log events and obtain a deep understanding of the workload. This level emits a large volume of logs, so account for that before enabling it.

Health Check

The Agent exposes an HTTP health check endpoint at $IP:8080/v1/status. A successful response is the string OK with a 200 status code.
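As a rough illustration of how a probe could consume this endpoint, here is a minimal Python sketch using only the standard library. The function names, the default timeout, and the choice to tolerate surrounding whitespace in the body are assumptions made for this example; only the /v1/status path, port 8080, the 200 status code, and the OK body come from the description above.

```python
import urllib.error
import urllib.request


def is_healthy(status_code: int, body: str) -> bool:
    """Return True for the documented success case: status 200 and body OK.

    Stripping whitespace around the body is an assumption of this sketch.
    """
    return status_code == 200 and body.strip() == "OK"


def check_agent(host: str, port: int = 8080, timeout: float = 2.0) -> bool:
    """Probe http://<host>:<port>/v1/status; any network error counts as unhealthy."""
    url = f"http://{host}:{port}/v1/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return is_healthy(resp.status, body)
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx HTTP responses.
        return False
```

Calling check_agent("10.0.0.12") (a placeholder address) returns True only when the Agent answers as described. Treating every failure as unhealthy keeps the probe simple, at the cost of not distinguishing an unreachable host from an unhealthy Agent.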

Metrics

All WarpStream Agent metrics begin with the warpstream_agent prefix.

WarpStream Agent metrics exposed via Prometheus also include the Prometheus namespace warpstream, so their full prefix is warpstream_warpstream_agent.

Full List of Important Metrics/Logs

Important Metrics and Logs contains a full list of the most important metrics and logs for monitoring the Agent.

Alerting on Metrics

Recommended List of Alerts contains the key metrics for which you should configure alerts to detect issues in your Agent effectively.

Datadog

We recommend following the Datadog instructions for scraping Prometheus/OpenTelemetry metrics using the Datadog Agent. Configuration will vary from environment to environment, but you should end up with something like the following configuration (Kubernetes example):

spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/warpstream-agent.checks: |
          {
            "openmetrics": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "metrics": [".*"],
                  "send_distribution_buckets": true,
                  "collect_counters_with_distributions": true
                }
              ]
            }
          }

This configuration tells the Datadog Agent to scrape the WarpStream Agent for metrics on port 8080 and to collect all of the custom metrics that the WarpStream Agent exposes.

We also provide a pre-made Datadog Dashboard that you can import directly using Datadog's import JSON feature.

Prometheus

The WarpStream Agents expose a traditional Prometheus metrics endpoint that is enabled by default on port 8080.

Prometheus metrics are automatically exposed on the Agent "internal port", which by default is the same as the Kinesis port, which in turn defaults to 8080.

If you set an explicit port override for the Kinesis port or the Agent "internal port", you must update the port in your Prometheus scrape configuration as well.

All Prometheus metrics are exposed under the warpstream namespace (see the Metrics section above for more details).
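To spot-check what the endpoint returns before wiring up a full Prometheus server, a small script can fetch the text exposition output and keep only the WarpStream series. The sketch below is a simplified line parser written for this example, not a full implementation of the exposition format. The function names are made up here, and the /metrics path is taken from the Datadog example earlier on this page.

```python
import urllib.request
from typing import Iterator, Tuple


def iter_samples(text: str, prefix: str = "warpstream_") -> Iterator[Tuple[str, float]]:
    """Yield (series, value) for sample lines whose metric name starts with prefix."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split at the last closing brace.
            head, rest = line.rsplit("}", 1)
            series = head + "}"
        else:
            series, rest = line.split(None, 1)
        if series.startswith(prefix):
            # The value is the first field after the series; a timestamp may follow.
            yield series, float(rest.split()[0])


def fetch_metrics(host: str, port: int = 8080, timeout: float = 5.0) -> str:
    """Download the raw exposition text; adjust port if you overrode the default."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")
```

For example, dict(iter_samples(fetch_metrics("10.0.0.12"))) (placeholder address) gives a quick mapping of series to values. For anything beyond ad hoc inspection, a real Prometheus scrape job or a maintained parsing library is the safer choice.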

Observability

The WarpStream Agent publishes the following metrics every minute. You can use them to gain insight into what is happening under the hood:

| Name | Description | Tags |
|---|---|---|
| warpstream_consumer_group_lag | Difference (in offsets) between the max offset and the committed offset for every active consumer group. | virtual_cluster_id, topic, consumer_group, partition |
| warpstream_consumer_group_max_offset | Max offset of a given topic/partition/consumer group for every active consumer group. | virtual_cluster_id, topic, consumer_group, partition |
| warpstream_files_count | Number of files at each compaction level, so that users can monitor whether they are experiencing compaction lag. | compaction_level (0, 1 or 2 for now) |
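To make the lag definition concrete, the following Python sketch applies it to a few made-up rows and totals the result per consumer group. The row layout, the function name, and all group, topic, and offset values are hypothetical and exist only for illustration. In practice you would read warpstream_consumer_group_lag directly instead of recomputing it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical row layout for this example:
# (consumer_group, topic, partition, max_offset, committed_offset)
Row = Tuple[str, str, int, int, int]


def lag_per_group(rows: Iterable[Row]) -> Dict[str, int]:
    """Sum per-partition lag (max offset minus committed offset) for each group."""
    totals: Dict[str, int] = defaultdict(int)
    for group, _topic, _partition, max_offset, committed_offset in rows:
        totals[group] += max_offset - committed_offset
    return dict(totals)


rows = [
    ("billing", "orders", 0, 120, 100),  # 20 offsets behind
    ("billing", "orders", 1, 50, 50),    # fully caught up
    ("search", "clicks", 0, 10, 4),      # 6 offsets behind
]
print(lag_per_group(rows))  # {'billing': 20, 'search': 6}
```

Summing across partitions is one possible aggregation. Because the metric is tagged by partition, you can just as well alert on the maximum lag of any single partition.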

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation. Kinesis is a trademark of Amazon Web Services.