Important Metrics and Logs
On this page, we include a sample list of the most important logs and metrics emitted by the Agents.
Before reading this documentation page, please familiarize yourself with how logs and metrics are emitted from the Agents.
A note on metrics vs. logs
The WarpStream Agents emit a wide variety of metrics that can be used to understand their performance, debug issues, and configure alerts. In general, the Agents can be monitored using metrics alone.
However, in some cases it is more practical to emit analytical "wide events" (structured logs) instead of metrics. For that reason, some of the "metrics" described in this document are actually emitted as logs.
By default, the Agents omit these logs because they are verbose and could incur high costs with your observability provider. In addition, because they are highly structured, they can only practically be interpreted by a logging product with good support for aggregations and analytics. You can enable these "analytics" logs by setting this environment variable on the Agents: WARPSTREAM_LOG_LEVEL=analytics
Enable High-Cardinality Tags
Some metrics support the use of tags with potentially high cardinality, such as tags based on topics. This feature is disabled by default.
To enable high-cardinality tags:
Use the command-line flag -kafkaHighCardinalityMetrics.
Alternatively, set the environment variable WARPSTREAM_KAFKA_HIGH_CARDINALITY_METRICS=true.
Tags that require enabling are clearly marked with "(requires enabling high-cardinality tags)" next to their name.
Overview
System performance metrics and logs, focusing on error detection and resource consumption.
[logs] Error Logs
query:
status:error
metric: *
[metrics] Memory usage by host
metric:
container.memory.usage
group_by:
host
[metrics] Used Cores
metric:
container.cpu.usage
group_by:
[metrics] Used Cores By Host
metric:
container.cpu.usage
group_by:
host
[metrics] Circuit Breaker (multiple metrics)
metric 1:
warpstream_circuit_breaker_open
Records the number of times the circuit breaker is opened, tagged by the circuit breaker name to identify the associated operations.
tags:
name
metric 2:
warpstream_circuit_breaker_close
Records the number of times the circuit breaker is closed, tagged by the circuit breaker name to identify the associated operations.
tags:
name
metric 3:
warpstream_circuit_breaker_halfopen
Records the number of times the circuit breaker transitions to the half-open state, tagged by the circuit breaker name to identify the associated operations.
tags:
name
metric 4:
warpstream_circuit_breaker_error
Records the number of times the circuit breaker blocks an operation, tagged by the circuit breaker name to identify the associated operations.
tags:
name
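The four counters above track transitions of a circuit breaker. The sketch below is a generic Python illustration of the classic closed/open/half-open pattern, not WarpStream's implementation: the `CircuitBreaker` class, its failure threshold, its cooldown, and the breaker name `object_store_get` are all hypothetical, and the counter keys simply mirror the metric name suffixes (`open`, `close`, `halfopen`, `error`).

```python
import time
from collections import Counter


class CircuitBreaker:
    """Generic circuit breaker sketch; hypothetical, not WarpStream's code."""

    def __init__(self, name, failure_threshold=3, cooldown_s=5.0, clock=time.monotonic):
        self.name = name                            # the Agents tag their counters with a breaker name
        self.failure_threshold = failure_threshold  # assumed: consecutive failures before opening
        self.cooldown_s = cooldown_s                # assumed: time spent open before a trial call
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.counters = Counter()  # keys mirror the metric suffixes

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_s:
                self.counters["error"] += 1  # operation blocked while open
                raise RuntimeError(f"circuit breaker {self.name} is open")
            self.state = "halfopen"  # cooldown elapsed: allow one trial call
            self.counters["halfopen"] += 1
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures while closed, opens the breaker.
        if self.state == "halfopen" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
            self.counters["open"] += 1

    def _record_success(self):
        if self.state == "halfopen":
            self.counters["close"] += 1  # trial call succeeded
        self.state = "closed"
        self.failures = 0


# Usage with a fake clock so the example is deterministic.
now = [0.0]
breaker = CircuitBreaker("object_store_get", failure_threshold=2, cooldown_s=10, clock=lambda: now[0])


def failing_call():
    raise OSError("simulated object storage failure")


for _ in range(2):
    try:
        breaker.call(failing_call)
    except OSError:
        pass  # the second failure opens the breaker

try:
    breaker.call(lambda: "ok")
except RuntimeError:
    pass  # blocked while open: counted as "error"

now[0] = 11.0  # past the cooldown
breaker.call(lambda: "ok")  # trial call succeeds: "halfopen", then "close"
print(dict(breaker.counters))
```

In this model, a steadily rising `error` count with few `close` events would suggest the protected operation is not recovering.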
Kafka
Metrics and logs associated with the Kafka protocol provide insights into message handling, latency, and throughput.
[metrics] Kafka Inflight Connections
metric:
warpstream_agent_kafka_inflight_conn
group_by:
[metrics] Kafka Inflight Requests (per Connection)
metric:
warpstream_agent_kafka_inflight_request_per_connection
group_by:
[metrics] Kafka Latency
metric:
warpstream_agent_kafka_request_latency
group_by:
kafka_key
[metrics] Kafka Requests
metric:
warpstream_agent_kafka_request_outcome
group_by:
kafka_key, outcome
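Grouping the Kafka Requests metric by kafka_key and outcome makes it possible to derive a per-request-type error rate. A minimal sketch, assuming the series has been exported as `(kafka_key, outcome, request_count)` tuples over one time window; the tag values `"Produce"`, `"Fetch"`, `"success"`, and `"error"` are hypothetical, so check the actual values your Agents emit before relying on the `"success"` comparison.

```python
from collections import defaultdict


def error_rate_by_kafka_key(series):
    """Fraction of non-successful requests per kafka_key.

    `series` is a hypothetical export of warpstream_agent_kafka_request_outcome
    grouped by kafka_key and outcome: (kafka_key, outcome, request_count) tuples.
    Assumes successful requests carry outcome == "success".
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for kafka_key, outcome, count in series:
        totals[kafka_key] += count
        if outcome != "success":
            failures[kafka_key] += count
    return {key: failures[key] / total for key, total in totals.items() if total > 0}


rates = error_rate_by_kafka_key([
    ("Produce", "success", 990),
    ("Produce", "error", 10),
    ("Fetch", "success", 500),
])
print(rates)
```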
[metrics] Fetch Max Pointers in a Single Request
metric:
warpstream_agent_kafka_fetch_num_pointers_distribution
group_by:
[metrics] Fetch Partial Bytes Due to Errors
metric:
warpstream_agent_kafka_fetch_partial_response_error_scenario_num_bytes_distribution
group_by:
source
[metrics] Fetch Throughput (uncompressed bytes) (agent <v550)
metric:
warpstream_agent_kafka_fetch_bytes_sent
group_by:
[metrics] Fetch Throughput (uncompressed bytes) (agent >=v550)
metric:
warpstream_agent_kafka_fetch_uncompressed_bytes
group_by:
topic (requires enabling high-cardinality tags)
[metrics] Fetch Throughput (compressed bytes) (agent >=v557)
metric:
warpstream_agent_kafka_fetch_compressed_bytes
group_by:
topic (requires enabling high-cardinality tags)
[metrics] Fetch Throughput Count (compressed bytes) (agent >=v557)
metric:
warpstream_agent_kafka_fetch_compressed_bytes_counter
group_by:
topic (requires enabling high-cardinality tags)
[metrics] Produce Throughput (uncompressed bytes) (agent <v550)
metric:
warpstream_agent_segment_batcher_flush_file_size
group_by:
[metrics] Produce Throughput (uncompressed bytes) (agent >=v550)
metric:
warpstream_agent_kafka_produce_uncompressed_bytes
group_by:
topic (requires enabling high-cardinality tags)
[metrics] Produce Throughput (compressed bytes) (agent >=v557)
metric:
warpstream_agent_kafka_produce_compressed_bytes
group_by:
topic (requires enabling high-cardinality tags)
[metrics] Produce Throughput Count (compressed bytes) (agent >=v557)
metric:
warpstream_agent_kafka_produce_compressed_bytes_counter
group_by:
topic (requires enabling high-cardinality tags)
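With Agents at v557 or later and high-cardinality tags enabled, the uncompressed and compressed throughput metrics can be combined into a per-topic compression ratio. A minimal sketch, assuming both metrics have already been summed per topic over the same time window; the topic names and byte totals are hypothetical. The same calculation applies to the fetch-side metrics.

```python
def compression_ratio_by_topic(uncompressed_bytes, compressed_bytes):
    """Uncompressed-to-compressed byte ratio per topic.

    Both arguments map topic -> bytes summed over the same time window,
    e.g. from warpstream_agent_kafka_produce_uncompressed_bytes and
    warpstream_agent_kafka_produce_compressed_bytes grouped by topic.
    """
    return {
        topic: uncompressed_bytes[topic] / compressed
        for topic, compressed in compressed_bytes.items()
        if compressed > 0 and topic in uncompressed_bytes
    }


# Hypothetical topic names and byte totals.
ratios = compression_ratio_by_topic(
    {"orders": 4_000_000, "clicks": 900_000},
    {"orders": 1_000_000, "clicks": 300_000},
)
print(ratios)
```

A ratio close to 1.0 for a topic may indicate that its producers are not compressing records.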
[metrics] Produce Throughput (records)
metric:
warpstream_agent_segment_batcher_flush_num_records_counter
group_by:
[metrics] Consumer Groups Lag
metric:
warpstream_consumer_group_lag
group_by:
virtual_cluster_id, topic, consumer_group, partition
[metrics] Consumer Groups Max Offset
metric:
warpstream_consumer_group_max_offset
group_by:
virtual_cluster_id, topic, consumer_group, partition
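Because consumer group lag is reported per partition, alerting usually needs a roll-up per consumer group and topic. A minimal sketch, assuming the lag series has been exported as one dict per partition with the tags listed above plus a `lag` value (assumed to be measured in offsets); the cluster, topic, and group names are hypothetical.

```python
from collections import defaultdict


def summarize_lag(rows):
    """Roll per-partition lag up to one entry per (virtual cluster, group, topic).

    `rows` is a hypothetical export of warpstream_consumer_group_lag: dicts with
    the virtual_cluster_id, topic, consumer_group, and partition tags plus a
    "lag" value.
    """
    total = defaultdict(int)
    worst = defaultdict(int)
    for row in rows:
        key = (row["virtual_cluster_id"], row["consumer_group"], row["topic"])
        total[key] += row["lag"]
        worst[key] = max(worst[key], row["lag"])
    return {key: {"total_lag": total[key], "max_partition_lag": worst[key]} for key in total}


rows = [
    {"virtual_cluster_id": "vci_example", "topic": "orders", "consumer_group": "billing", "partition": 0, "lag": 5},
    {"virtual_cluster_id": "vci_example", "topic": "orders", "consumer_group": "billing", "partition": 1, "lag": 120},
]
summary = summarize_lag(rows)
print(summary)
```

Tracking both values is deliberate: the total shows the overall backlog, while the per-partition maximum surfaces a single stuck partition that a sum could hide.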
Background Jobs
Metrics and logs on the efficiency and status of background operations, with a focus on compaction and the scanning of obsolete files.
[logs] Compaction File Output Size
query:
status:info
metric:
@stream_job_output.compaction.file_metadatas.index_offset
group_by:
source, @stream_job_input.compaction.compaction_level
[logs] Compactions by Status and Level
query:
service:warp-agent @stream_job_input.type:COMPACTION_JOB_TYPE status:info
metric: *
group_by:
status, @stream_job_input.compaction.compaction_level
[metrics] Compaction Files per Level (Indicator of Compaction Lag)
metric:
warpstream_files_count
group_by:
compaction_level
[logs] P99 Compaction Duration by Level
query:
service:warp-agent @stream_job_input.type:COMPACTION_JOB_TYPE status:info
metric:
@duration_ms
group_by:
status, @stream_job_input.compaction.compaction_level
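The compaction queries above rely on the analytics logs (enabled with WARPSTREAM_LOG_LEVEL=analytics) and on a logging product that aggregates them. To show what those aggregations compute, here is a minimal sketch that counts compactions and takes a nearest-rank P99 of `duration_ms` per status and compaction level. It assumes each log line is a JSON object whose nested fields match the attributes used in the queries; the exact log shape is an assumption, and your logging backend may use an interpolated percentile instead.

```python
import json
from collections import defaultdict


def compaction_stats(lines):
    """Count compactions and compute nearest-rank P99 duration per (status, level).

    Assumes each analytics log line is a JSON object whose nested fields match
    the attributes used in the queries above (status, @duration_ms,
    @stream_job_input.type, @stream_job_input.compaction.compaction_level).
    """
    durations = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        job = event.get("stream_job_input", {})
        if job.get("type") != "COMPACTION_JOB_TYPE":
            continue  # not a compaction job
        level = job.get("compaction", {}).get("compaction_level")
        durations[(event.get("status"), level)].append(event["duration_ms"])

    stats = {}
    for key, values in durations.items():
        values.sort()
        rank = (99 * len(values) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
        stats[key] = {"count": len(values), "p99_ms": values[rank - 1]}
    return stats


# Hypothetical log lines: 100 level-1 compactions taking 1..100 ms.
lines = [
    json.dumps({
        "status": "info",
        "duration_ms": ms,
        "stream_job_input": {"type": "COMPACTION_JOB_TYPE", "compaction": {"compaction_level": 1}},
    })
    for ms in range(1, 101)
]
stats = compaction_stats(lines)
print(stats)
```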
[metrics] Dead Files Scanner: Checked vs Deleted Files
metric:
warpstream_agent_deadscanner_outcomes
group_by:
outcome
[metrics] Executed Jobs
metric:
warpstream_agent_run_and_ack_job_outcome
group_by:
job_type
Object Storage
Metrics and logs on the performance and usage patterns of object storage operations, offering insight into data retrieval, storage efficiency, and caching.
[metrics] S3 Operations (GET)
metric:
warpstream_blob_store_operation_latency
filter_tag:
operation:get_stream, operation:get_stream_range
group_by:
Note: this metric is a histogram, so although it records latency, counting its samples also gives the number of operations.
[metrics] S3 Operations (PUT)
metric:
warpstream_blob_store_operation_latency
filter_tag:
operation:put_bytes, operation:put_stream
group_by:
Note: this metric is a histogram, so although it records latency, counting its samples also gives the number of operations.
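Counting samples of the latency histogram, filtered by the `operation` tag values listed above, yields GET and PUT operation counts. If your backend stores the metric as a Prometheus-style histogram, its `_count` series already holds this number; the sketch below shows the same idea on raw samples, assuming a hypothetical `(operation, latency_ms)` export. The `hypothetical_other_op` value is invented to show the filter excluding unrelated operations.

```python
from collections import Counter

# Operation tag values taken from the filters above.
GET_OPERATIONS = {"get_stream", "get_stream_range"}
PUT_OPERATIONS = {"put_bytes", "put_stream"}


def count_operations(observations):
    """Count GET and PUT calls from latency observations.

    `observations` is a hypothetical list of (operation, latency_ms) samples of
    warpstream_blob_store_operation_latency; each sample is one operation,
    so the latency value itself is ignored.
    """
    counts = Counter()
    for operation, _latency_ms in observations:
        if operation in GET_OPERATIONS:
            counts["GET"] += 1
        elif operation in PUT_OPERATIONS:
            counts["PUT"] += 1
    return counts


counts = count_operations([
    ("get_stream", 12.0),
    ("get_stream_range", 8.5),
    ("put_bytes", 40.0),
    ("put_stream", 55.0),
    ("hypothetical_other_op", 3.0),  # outside both filters, so not counted
])
print(counts)
```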
[metrics] Bytes copied into cache
metric:
warpstream_agent_file_cache_server_get_range_copy_chunk_num_bytes_copied
group_by:
host
[metrics] Cache Size (Bytes)
metric:
warpstream_agent_file_cache_server_chunk_cache_curr_size_bytes
group_by:
host
[metrics] Cache Size (Entries)
metric:
warpstream_agent_file_cache_server_chunk_cache_num_entries
group_by:
host
[metrics] Direct or Remote Loads
metric:
warpstream_agent_file_cache_client_fetch_local_or_remote_counter
group_by:
source
[metrics] Fetch Pointers
metric:
warpstream_agent_kafka_fetch_num_pointers_counter
group_by:
[metrics] File Cache Bytes Transferred (Server)
metric:
warpstream_agent_file_cache_server_get_stream_range_num_bytes_count
group_by:
outcome, host
[metrics] File Cache Latency (Client)
metric:
warpstream_agent_file_cache_client_get_stream_range_latency
group_by:
outcome
[metrics] File Cache Latency (Server)
metric:
warpstream_agent_file_cache_server_get_stream_range_latency
group_by:
outcome, host
[metrics] File Cache Outcomes (Client)
metric:
warpstream_agent_file_cache_client_get_stream_range_outcome
group_by:
outcome
[metrics] File Cache Outcomes (Server)
metric:
warpstream_agent_file_cache_server_get_stream_range_outcome
group_by:
outcome, host
[metrics] File Cache Per-Request Bytes Read Average (Server)
metric:
warpstream_agent_file_cache_server_get_stream_range_num_bytes_distribution
group_by:
outcome
[metrics] Num Bytes Fetched by Size
metric:
warpstream_agent_file_cache_server_fetch_size_num_bytes_counter
group_by:
fetch_size
[metrics] Num Fetches by Size
metric:
warpstream_agent_file_cache_server_fetch_size_counter
group_by:
fetch_size
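Dividing the byte counter by the fetch counter, both grouped by fetch_size, gives the average number of bytes per fetch in each size bucket. A minimal sketch, assuming both counters have been summed over the same time window into dicts keyed by the `fetch_size` tag value; the bucket names `"small"` and `"large"` and the totals are hypothetical.

```python
def average_bytes_per_fetch(bytes_by_size, fetches_by_size):
    """Average bytes per fetch for each fetch_size bucket.

    Both arguments map a fetch_size tag value to a total over the same window:
    bytes from warpstream_agent_file_cache_server_fetch_size_num_bytes_counter,
    fetch counts from warpstream_agent_file_cache_server_fetch_size_counter.
    """
    return {
        bucket: bytes_by_size.get(bucket, 0) / count
        for bucket, count in fetches_by_size.items()
        if count > 0
    }


# Hypothetical bucket names and totals.
averages = average_bytes_per_fetch(
    {"small": 2_000_000, "large": 64_000_000},
    {"small": 1_000, "large": 16},
)
print(averages)
```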