Monitor the WarpStream Agent
Logging
By default, the WarpStream Agent is configured to run with log level info. However, this can be changed with the WARPSTREAM_LOG_LEVEL environment variable. For example, if the info level logs are too noisy for you, you can set WARPSTREAM_LOG_LEVEL=warn.
The WarpStream Agents have an additional special log level called analytics that can be enabled by setting WARPSTREAM_LOG_LEVEL=analytics. This enables extremely detailed JSON logging that can be loaded into a logging system that supports analytics, so you can slice and dice Agent log events and obtain a deep understanding of the workload. However, this level emits a large volume of logs, so account for the extra ingestion and storage load before enabling it.
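As a rough illustration of what slicing JSON log events can look like outside a dedicated logging system, the sketch below counts events by the value of one field. The field names used here are hypothetical; the actual schema of the analytics logs is not described on this page, so inspect a few real log lines before choosing a key.

```python
import json
from collections import Counter
from typing import Iterable


def count_by(log_lines: Iterable[str], key: str) -> Counter:
    """Count JSON log events grouped by the value stored under `key`."""
    counts: Counter = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON (for example, startup banners).
            continue
        if isinstance(event, dict) and key in event:
            counts[str(event[key])] += 1
    return counts


if __name__ == "__main__":
    # "kind" is a made-up field name used only for this example.
    sample = ['{"kind": "fetch"}', "not json", '{"kind": "produce"}', '{"kind": "fetch"}']
    print(count_by(sample, "kind").most_common())
```

In practice you would pass an open log file instead of the in-memory list, since a file object iterates line by line.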
Health Check
The Agent exposes an HTTP health check endpoint at $IP:8080/v1/status. A successful response is the string OK with a 200 status code.
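A minimal sketch of a health probe against this endpoint, using only the Python standard library. The function names are illustrative, and trimming whitespace around the body is an assumption made so that a trailing newline, if any, does not cause a false negative.

```python
import urllib.error
import urllib.request


def is_healthy(status_code: int, body: str) -> bool:
    """Return True when the response matches the documented success case."""
    return status_code == 200 and body.strip() == "OK"


def check_agent(host: str, port: int = 8080, timeout: float = 2.0) -> bool:
    """Probe http://host:port/v1/status and report whether the Agent is healthy."""
    url = f"http://{host}:{port}/v1/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return is_healthy(resp.status, body)
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and non-2xx responses count as unhealthy.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; substitute the address of your Agent.
    print("healthy" if check_agent("127.0.0.1") else "unhealthy")
```

Keeping the response check in its own function makes the success criteria easy to test without a running Agent.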
Metrics
All WarpStream Agent metrics begin with the warpstream_agent prefix.
WarpStream Agent metrics exposed via Prometheus also include the Prometheus namespace warpstream, so those metrics carry the combined prefix warpstream_warpstream_agent.
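The naming rule can be expressed as a small helper, which may be handy when translating a metric name from the documentation into the name to use in a Prometheus query. This is only a sketch of the prefixing rule described here; the metric name in the example is hypothetical.

```python
PROMETHEUS_NAMESPACE = "warpstream"
AGENT_PREFIX = "warpstream_agent"


def prometheus_metric_name(agent_metric: str) -> str:
    """Prepend the Prometheus namespace to an Agent metric name."""
    if not agent_metric.startswith(AGENT_PREFIX):
        raise ValueError(f"expected a name starting with {AGENT_PREFIX!r}")
    return f"{PROMETHEUS_NAMESPACE}_{agent_metric}"


if __name__ == "__main__":
    # "warpstream_agent_example_total" is a made-up metric name.
    print(prometheus_metric_name("warpstream_agent_example_total"))
```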
Full List of Important Metrics/Logs
In Important Metrics and Logs you can find a full list of the most important metrics and logs for monitoring the agent.
Alerting on Metrics
In Recommended List of Alerts you will find a list of key metrics for which you should configure alerts to detect issues in your agent effectively.
Datadog
We recommend following the Datadog instructions for scraping Prometheus/OpenTelemetry metrics using the Datadog Agent. Configuration will vary from environment to environment. In a Kubernetes deployment, the resulting configuration should specify that the Datadog Agent scrapes the WarpStream Agent on port 8080 for metrics, and that it collects all the custom metrics that the WarpStream Agent exposes.
We also have a pre-made Datadog Dashboard that you can just import directly using the import JSON feature.
Prometheus
The WarpStream Agents expose a traditional Prometheus metrics endpoint that is enabled by default on port 8080.
Prometheus metrics are automatically exposed on the Agent "internal port", which by default is the same as the Kinesis port, which in turn defaults to 8080.
If you set an explicit port override for the Kinesis port or the Agent "internal" port, then you'll need to update your Prometheus scrape configuration port as well.
All Prometheus metrics are exposed under the warpstream namespace (see the Metrics section above for more details).
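To check what the endpoint exposes before wiring up a scrape job, you can fetch it and list the metric names in the warpstream namespace. The sketch below assumes the conventional /metrics path, which this page does not state explicitly, and parses only the sample lines of the Prometheus text exposition format.

```python
import re
import urllib.request
from typing import List

# A metric name is the leading identifier on each sample line.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def warpstream_metric_names(exposition_text: str, namespace: str = "warpstream") -> List[str]:
    """Return the sorted, de-duplicated metric names under `namespace`."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Skip blank lines and "# HELP" / "# TYPE" comment lines.
            continue
        match = _METRIC_NAME.match(line)
        if match and match.group(1).startswith(namespace + "_"):
            names.add(match.group(1))
    return sorted(names)


def fetch_metrics(host: str, port: int = 8080, path: str = "/metrics", timeout: float = 5.0) -> str:
    """Download the raw exposition text. The default path is an assumption."""
    with urllib.request.urlopen(f"http://{host}:{port}{path}", timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; substitute the address of your Agent.
    for name in warpstream_metric_names(fetch_metrics("127.0.0.1")):
        print(name)
```

If you override the Agent "internal" port, pass the new value as the port argument.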
Observability
The WarpStream Agent publishes the following metrics every minute, which you can use to gain insight into what is happening under the hood:
| Name | Description | Tags |
|---|---|---|
| | Difference (in offsets) between the max offset and the committed offset for every active consumer group. | Tagged by |
| | Max offset of a given topic/partition/consumer group for every active consumer group. | Tagged by |
| | Number of files at each compaction level, so that users can monitor whether they are experiencing compaction lag. | Tagged by |
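The first metric in the table is defined as a difference between two offsets. As a worked illustration of that arithmetic only (not of how the Agent computes it internally), the sketch below derives per-partition lag from a max-offset map and a committed-offset map. The topic name and offset values are made up.

```python
from typing import Dict, Tuple

# A partition is identified by (topic, partition number).
Partition = Tuple[str, int]


def lag_per_partition(
    max_offsets: Dict[Partition, int],
    committed_offsets: Dict[Partition, int],
) -> Dict[Partition, int]:
    """Lag is the max offset minus the committed offset, per partition."""
    return {
        tp: max_offsets[tp] - committed
        for tp, committed in committed_offsets.items()
        if tp in max_offsets
    }


if __name__ == "__main__":
    max_offsets = {("orders", 0): 120, ("orders", 1): 75}
    committed = {("orders", 0): 100, ("orders", 1): 75}
    lag = lag_per_partition(max_offsets, committed)
    print(lag, "total:", sum(lag.values()))
```

A lag that keeps growing over time suggests the consumer group is not keeping up with producers.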