Recommended List of Alerts
On this page we list the key metrics you should create alerts on.
WarpStream was designed to minimize operational burden as much as possible. Therefore, the Agents are completely stateless and only depend on the underlying object store and the WarpStream control plane. The cloud provider manages and monitors the object store, and the WarpStream team manages and monitors the control plane.
For this reason, the WarpStream Agents have very little to alert on. However, if you want to configure additional alerts, you can review our "Important Metrics and Logs" section for a list of key metrics/logs that are good candidates for monitors. In general, instrumentation is more useful for debugging than alerting.
That said, we do recommend configuring a few alerts for resource usage.
It's important that the WarpStream Agents have sufficient capacity available to handle your workload and any potential spikes. For that reason, the most important thing to be alerted about with the WarpStream Agents is CPU and memory utilization.
We usually recommend keeping both CPU and memory utilization under 50%. Because the WarpStream Agents are stateless, it's safe to auto-scale them based on CPU usage.
CPU Usage
Metric: container.cpu.usage
Alert condition: > 50%
Memory Usage
Metric: container.memory.usage
Alert condition: > 50%
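As a rough illustration of the alert conditions above, the sketch below averages utilization readings over a window and compares the result to the 50% guideline. It assumes the readings are already expressed as percentages (0 to 100) of the container's allocated resources; the function name, the window length, and the sample values are hypothetical and not part of WarpStream or any monitoring product.

```python
from statistics import mean

# Threshold taken from the "under 50%" guidance above.
UTILIZATION_THRESHOLD_PERCENT = 50.0


def should_alert(samples_percent, threshold=UTILIZATION_THRESHOLD_PERCENT):
    """Return True when average utilization over the window exceeds the threshold.

    samples_percent: utilization readings (0-100) for one Agent over the
    evaluation window, for example one reading per minute for five minutes.
    """
    if not samples_percent:
        # Missing data is not treated as a breach in this sketch; a real
        # monitor may want a separate "no data" alert.
        return False
    return mean(samples_percent) > threshold


cpu_window = [42.0, 55.0, 61.0, 58.0, 63.0]     # hypothetical CPU readings
memory_window = [30.0, 31.5, 29.0, 33.0, 32.0]  # hypothetical memory readings

print(should_alert(cpu_window))     # True  (average is 55.8)
print(should_alert(memory_window))  # False (average is 31.1)
```

Averaging over a window rather than alerting on a single reading is a design choice made here to reduce noise from short spikes; most monitoring systems offer an equivalent setting.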
In addition to monitoring the resource utilization of the Agents, we also recommend monitoring your workload from your application. For example, track produce and fetch errors and monitor your consumer group lag.
For convenience, the WarpStream Agents emit consumer group lag metrics once a minute. The metric can be found at: .
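For reference, consumer group lag is conventionally the gap between a partition's latest offset and the group's committed offset. The sketch below computes it from two plain dictionaries. It is not tied to any Kafka client library, and the topic name, offsets, and function names are hypothetical.

```python
def partition_lag(end_offset, committed_offset):
    """Lag for one partition: records written but not yet committed as consumed."""
    # Clamp at zero because the two offsets are usually read at slightly
    # different times and can briefly appear out of order.
    return max(end_offset - committed_offset, 0)


def group_lag(end_offsets, committed_offsets):
    """Return per-partition lag for a consumer group.

    Both arguments map (topic, partition) -> offset. A partition with no
    committed offset is treated as having consumed nothing (offset 0), which
    is a simplifying assumption of this sketch.
    """
    return {
        tp: partition_lag(end, committed_offsets.get(tp, 0))
        for tp, end in end_offsets.items()
    }


# Hypothetical offsets for a topic named "orders".
end_offsets = {("orders", 0): 1500, ("orders", 1): 980, ("orders", 2): 40}
committed = {("orders", 0): 1450, ("orders", 1): 980}

lag = group_lag(end_offsets, committed)
print(lag)                # {('orders', 0): 50, ('orders', 1): 0, ('orders', 2): 40}
print(max(lag.values()))  # 50
```

Alerting on the maximum lag across partitions, as in the last line, is one common choice because a single stalled partition can otherwise be hidden by an average.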
In addition, error rate and latencies for Kafka API operations can be monitored using these metrics emitted by the WarpStream Agents:
Error Rate on Kafka API
Metric: warpstream_agent_kafka_request_outcome
Filter by: outcome:error
Group by: kafka_key
Latency on Kafka API
Metric: warpstream_agent_kafka_request_latency
Group by: kafka_key
In general, though, it's better to monitor error rates and latencies from your application than from the Agents.
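If you do alert on the Agent-side error metric, one plausible approach is to turn the counts into an error rate per kafka_key. The sketch below shows the arithmetic only. It assumes request counts for warpstream_agent_kafka_request_outcome have already been fetched from your metrics backend for some window; the "success" outcome value, the kafka_key values, the counts, and the 0.5% threshold are all hypothetical.

```python
from collections import defaultdict


def error_rate_by_key(samples):
    """Compute the fraction of requests with outcome "error" per kafka_key.

    samples: iterable of (kafka_key, outcome, count) tuples, where count is
    the number of requests observed in the evaluation window.
    """
    totals = defaultdict(float)
    errors = defaultdict(float)
    for kafka_key, outcome, count in samples:
        totals[kafka_key] += count
        if outcome == "error":
            errors[kafka_key] += count
    # Skip keys with no traffic to avoid dividing by zero.
    return {key: errors[key] / total for key, total in totals.items() if total > 0}


# Hypothetical counts over a five-minute window.
samples = [
    ("Produce", "success", 990),
    ("Produce", "error", 10),
    ("Fetch", "success", 500),
]

rates = error_rate_by_key(samples)
ERROR_RATE_THRESHOLD = 0.005  # hypothetical: alert above 0.5% errors

alerting_keys = [key for key, rate in rates.items() if rate > ERROR_RATE_THRESHOLD]
print(alerting_keys)  # ['Produce']
```

A ratio is used instead of a raw error count so that the same threshold remains meaningful as traffic grows or shrinks.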