Important Metrics and Logs

This page lists a sample of the most important logs and metrics emitted by the Agents.

Before reading this documentation page, please familiarize yourself with how logs and metrics are emitted from the Agents.

A note on metrics vs. logs

The WarpStream Agents emit a wide variety of metrics that can be used for performance analysis, debugging, and alerting. In general, the Agents can be monitored solely using metrics.

However, in some cases it is more practical to emit analytical "wide events" (structured logs) instead of metrics. For that reason, some of the "metrics" described in this document are actually emitted as logs.

By default, the Agents do not emit these logs because they are verbose and could incur high costs with your observability provider. In addition, because they are highly structured, they can only practically be interpreted by a logging product with good support for aggregations and analytics. You can enable these "analytics" logs by setting this environment variable on the Agents: WARPSTREAM_LOG_LEVEL=analytics

Enable High-Cardinality Tags

Some metrics support the use of tags with potentially high cardinality, such as tags based on topics. This feature is disabled by default.

To enable high-cardinality tags:

  • Use the command-line flag -kafkaHighCardinalityMetrics.

  • Alternatively, set the environment variable WARPSTREAM_KAFKA_HIGH_CARDINALITY_METRICS=true.

Tags that require enabling are clearly marked with "(requires enabling high-cardinality tags)" next to their name.
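
As a minimal sketch of the two opt-in settings above, the following Python helper builds a process environment with them applied. The function name agent_env and its parameters are hypothetical; only the two environment variable names and values come from this page.

```python
import os


def agent_env(base=None, analytics_logs=False, high_cardinality=False):
    """Return a copy of `base` (default: the current environment) with the
    observability settings described on this page applied.

    `agent_env` is a hypothetical helper, not part of any WarpStream tooling.
    """
    env = dict(os.environ if base is None else base)
    if analytics_logs:
        # Opt in to the verbose "analytics" wide-event logs.
        env["WARPSTREAM_LOG_LEVEL"] = "analytics"
    if high_cardinality:
        # Opt in to high-cardinality tags such as topic.
        env["WARPSTREAM_KAFKA_HIGH_CARDINALITY_METRICS"] = "true"
    return env


if __name__ == "__main__":
    env = agent_env(base={}, analytics_logs=True, high_cardinality=True)
    for key in sorted(env):
        print(f"{key}={env[key]}")
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching an Agent process. The Agent command line itself is not covered on this page, so it is left out here.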

Overview

System performance metrics and logs, focusing on error detection and resource consumption.

  • [logs] Error Logs

    • query: status:error

    • metric: *

  • [metrics] Memory usage by host

    • metric: container.memory.usage

    • group_by: host

  • [metrics] Used Cores

    • metric: container.cpu.usage

    • group_by:

  • [metrics] Used Cores By Host

    • metric: container.cpu.usage

    • group_by: host

  • [metrics] Circuit Breaker (multiple metrics)

    • metric 1:

      • warpstream_circuit_breaker_open: Records the number of times the circuit breaker is opened, tagged by the circuit breaker name to identify the associated operations.

      • tags: name

    • metric 2:

      • warpstream_circuit_breaker_close: Records the number of times the circuit breaker is closed, tagged by the circuit breaker name to identify the associated operations.

      • tags: name

    • metric 3:

      • warpstream_circuit_breaker_halfopen: Records the number of times the circuit breaker is half-opened, tagged by the circuit breaker name to identify the associated operations.

      • tags: name

    • metric 4:

      • warpstream_circuit_breaker_error: Records the number of times the circuit breaker blocks an operation, tagged by the circuit breaker name to identify the associated operations.

      • tags: name
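
One possible way to use the circuit breaker counters for alerting is to compare two scrapes of warpstream_circuit_breaker_open and report the breakers that opened in between. The sketch below assumes the counter values have already been collected into dictionaries keyed by the name tag; the function name and the breaker names in the example are hypothetical.

```python
def newly_opened(previous, current):
    """Return {breaker_name: increase} for every breaker whose
    warpstream_circuit_breaker_open counter grew between two scrapes.

    `previous` and `current` map the `name` tag to the counter value.
    A decrease (for example after an Agent restart resets the counter)
    is ignored rather than reported as a negative increase.
    """
    increases = {}
    for name, value in current.items():
        delta = value - previous.get(name, 0)
        if delta > 0:
            increases[name] = delta
    return increases


if __name__ == "__main__":
    # Hypothetical breaker names and counter values, for illustration only.
    before = {"breaker_a": 1, "breaker_b": 2}
    after = {"breaker_a": 3, "breaker_b": 2, "breaker_c": 1}
    print(newly_opened(before, after))
```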

Kafka

Metrics and logs associated with the Kafka protocol provide insights into message handling, latency, and throughput.

  • [metrics] Kafka Inflight Connections

    • metric: warpstream_agent_kafka_inflight_conn

    • group_by:

  • [metrics] Kafka Inflight Requests (per Connection)

    • metric: warpstream_agent_kafka_inflight_request_per_connection

    • group_by:

  • [metrics] Kafka Latency

    • metric: warpstream_agent_kafka_request_latency

    • group_by: kafka_key

  • [metrics] Kafka Requests

    • metric: warpstream_agent_kafka_request_outcome

    • group_by: kafka_key,outcome

  • [metrics] Fetch Max Pointers in a Single Request

    • metric: warpstream_agent_kafka_fetch_num_pointers_distribution

    • group_by:

  • [metrics] Fetch Partial Bytes Due to Errors

    • metric: warpstream_agent_kafka_fetch_partial_response_error_scenario_num_bytes_distribution

    • group_by: source

  • [metrics] Fetch Throughput (uncompressed bytes) (agent <v550)

    • metric: warpstream_agent_kafka_fetch_bytes_sent

    • group_by:

  • [metrics] Fetch Throughput (uncompressed bytes) (agent >=v550)

    • metric: warpstream_agent_kafka_fetch_uncompressed_bytes

    • group_by: topic (requires enabling high-cardinality tags)

  • [metrics] Fetch Throughput (compressed bytes) (agent >=v557)

    • metric: warpstream_agent_kafka_fetch_compressed_bytes

    • group_by: topic (requires enabling high-cardinality tags)

  • [metrics] Fetch Throughput Count (compressed bytes) (agent >=v557)

    • metric: warpstream_agent_kafka_fetch_compressed_bytes_counter

    • group_by: topic (requires enabling high-cardinality tags)

  • [metrics] Produce Throughput (uncompressed bytes) (agent <v550)

    • metric: warpstream_agent_segment_batcher_flush_file_size

    • group_by:

  • [metrics] Produce Throughput (uncompressed bytes) (agent >=v550)

    • metric: warpstream_agent_kafka_produce_uncompressed_bytes

    • group_by: topic (requires enabling high-cardinality tags)

  • [metrics] Produce Throughput (compressed bytes) (agent >=v557)

    • metric: warpstream_agent_kafka_produce_compressed_bytes

    • group_by: topic (requires enabling high-cardinality tags)

  • [metrics] Produce Throughput Count (compressed bytes) (agent >=v557)

    • metric: warpstream_agent_kafka_produce_compressed_bytes_counter

    • group_by: topic (requires enabling high-cardinality tags)

  • [metrics] Produce Throughput (records)

    • metric: warpstream_agent_segment_batcher_flush_num_records_counter

    • group_by:

  • [metrics] Consumer Groups Lag

    • metric: warpstream_consumer_group_lag

    • group_by: virtual_cluster_id, topic, consumer_group, partition

  • [metrics] Consumer Groups Max Offset

    • metric: warpstream_consumer_group_max_offset

    • group_by: virtual_cluster_id, topic, consumer_group,partition
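
Because warpstream_consumer_group_lag is tagged per partition, a common follow-up is to sum it into one value per consumer group and topic. The sketch below assumes the samples have already been fetched from your metrics backend as dictionaries carrying the tag names listed above plus a lag value; that input shape and the function name are assumptions, not a WarpStream API.

```python
from collections import defaultdict


def total_lag_by_group(samples):
    """Sum per-partition lag into one total per (consumer_group, topic).

    Each sample is a dict with the keys "consumer_group", "topic",
    "partition" and "lag" (a hypothetical shape chosen for this sketch).
    """
    totals = defaultdict(int)
    for sample in samples:
        totals[(sample["consumer_group"], sample["topic"])] += sample["lag"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    samples = [
        {"consumer_group": "billing", "topic": "orders", "partition": 0, "lag": 12},
        {"consumer_group": "billing", "topic": "orders", "partition": 1, "lag": 30},
        {"consumer_group": "audit", "topic": "orders", "partition": 0, "lag": 4},
    ]
    for key, lag in sorted(total_lag_by_group(samples).items()):
        print(key, lag)
```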

Background Jobs

Metrics and logs on the efficiency and status of background operations, with a focus on compaction processes and the scanning of obsolete files.

  • [logs] Compaction File Output Size

    • query: status:info

    • metric: @stream_job_output.compaction.file_metadatas.index_offset

    • group_by: source,@stream_job_input.compaction.compaction_level

  • [logs] Compactions by Status and Level

    • query: service:warp-agent @stream_job_input.type:COMPACTION_JOB_TYPE status:info

    • metric: *

    • group_by: status,@stream_job_input.compaction.compaction_level

  • [metrics] Compaction Files per Level (Indicator of Compaction Lag)

    • metric: warpstream_files_count

    • group_by: compaction_level

  • [logs] P99 Compaction Duration by Level

    • query: service:warp-agent @stream_job_input.type:COMPACTION_JOB_TYPE status:info

    • metric: @duration_ms

    • group_by: status,@stream_job_input.compaction.compaction_level

  • [metrics] Dead Files Scanner: Checked vs Deleted Files

    • metric: warpstream_agent_deadscanner_outcomes

    • group_by: outcome

  • [metrics] Executed Jobs

    • metric: warpstream_agent_run_and_ack_job_outcome

    • group_by: job_type
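
If your logging product cannot aggregate the compaction logs directly, the "P99 Compaction Duration by Level" entry above can be approximated offline. The sketch below assumes each log line is a JSON object whose nested fields mirror the attribute paths used in the queries (stream_job_input.type, stream_job_input.compaction.compaction_level, duration_ms). The actual on-disk log shape may differ, so treat the field access as an assumption to adapt.

```python
import json
import math
from collections import defaultdict


def p99_duration_by_level(lines):
    """Return {compaction_level: p99 duration_ms} for compaction job logs.

    Uses the nearest-rank percentile: the value at rank ceil(0.99 * n)
    in the sorted list of durations for each level.
    """
    durations = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        job = event.get("stream_job_input", {})
        if job.get("type") != "COMPACTION_JOB_TYPE":
            continue  # Skip logs that are not compaction jobs.
        level = job.get("compaction", {}).get("compaction_level")
        durations[level].append(event["duration_ms"])

    result = {}
    for level, values in durations.items():
        values.sort()
        rank = math.ceil(99 * len(values) / 100)  # 1-based nearest rank
        result[level] = values[rank - 1]
    return result
```

Nearest-rank is only one of several percentile definitions, and observability backends often interpolate instead, so small differences from dashboard values are expected.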

Object Storage

Metrics and logs on object storage operations' performance and usage patterns, offering insights into data retrieval, storage efficiency, and caching mechanisms.

  • [metrics] S3 Operations (GET)

    • warpstream_blob_store_operation_latency

      • filter_tag: operation:get_stream, operation:get_stream_range

      • group_by:

      Note: this metric is a histogram. Although it records latency, counting the observations it emits gives the number of operations.

  • [metrics] S3 Operations (PUT)

    • warpstream_blob_store_operation_latency

      • filter_tag: operation:put_bytes, operation:put_stream

      • group_by:

      Note: this metric is a histogram. Although it records latency, counting the observations it emits gives the number of operations.

  • [metrics] Bytes copied into cache

    • metric: warpstream_agent_file_cache_server_get_range_copy_chunk_num_bytes_copied

    • group_by: host

  • [metrics] Cache Size (Bytes)

    • metric: warpstream_agent_file_cache_server_chunk_cache_curr_size_bytes

    • group_by: host

  • [metrics] Cache Size (Entries)

    • metric: warpstream_agent_file_cache_server_chunk_cache_num_entries

    • group_by: host

  • [metrics] Direct or Remote Loads

    • metric: warpstream_agent_file_cache_client_fetch_local_or_remote_counter

    • group_by: source

  • [metrics] Fetch Pointers

    • metric: warpstream_agent_kafka_fetch_num_pointers_counter

    • group_by:

  • [metrics] File Cache Bytes Transferred (Server)

    • metric: warpstream_agent_file_cache_server_get_stream_range_num_bytes_count

    • group_by: outcome,host

  • [metrics] File Cache Latency (Client)

    • metric: warpstream_agent_file_cache_client_get_stream_range_latency

    • group_by: outcome

  • [metrics] File Cache Latency (Server)

    • metric: warpstream_agent_file_cache_server_get_stream_range_latency

    • group_by: outcome, host

  • [metrics] File Cache Outcomes (Client)

    • metric: warpstream_agent_file_cache_client_get_stream_range_outcome

    • group_by: outcome

  • [metrics] File Cache Outcomes (Server)

    • metric: warpstream_agent_file_cache_server_get_stream_range_outcome

    • group_by: outcome, host

  • [metrics] File Cache Per Request Bytes Read Average (Server)

    • metric: warpstream_agent_file_cache_server_get_stream_range_num_bytes_distribution

    • group_by: outcome

  • [metrics] Num Bytes Fetched by Size

    • metric: warpstream_agent_file_cache_server_fetch_size_num_bytes_counter

    • group_by: fetch_size

  • [metrics] Num Fetches by Size

    • metric: warpstream_agent_file_cache_server_fetch_size_counter

    • group_by: fetch_size
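
The notes on the S3 Operations entries above say that the observation count of the warpstream_blob_store_operation_latency histogram can stand in for an operation count. A minimal sketch of that idea follows. It assumes you have already read the histogram's observation count per operation tag at two points in time (in Prometheus-style backends this is typically exposed as a _count series); the function name and input shape are hypothetical.

```python
GET_OPERATIONS = ("get_stream", "get_stream_range")
PUT_OPERATIONS = ("put_bytes", "put_stream")


def operations_per_second(before, after, interval_seconds, operations):
    """Estimate the operation rate from histogram observation counts.

    `before` and `after` map the `operation` tag to the cumulative number
    of latency observations at the start and end of the interval.
    """
    total = 0
    for operation in operations:
        total += after.get(operation, 0) - before.get(operation, 0)
    return total / interval_seconds


if __name__ == "__main__":
    # Hypothetical counts taken 60 seconds apart, for illustration only.
    before = {"get_stream": 100, "get_stream_range": 50, "put_bytes": 10}
    after = {"get_stream": 160, "get_stream_range": 110, "put_bytes": 40}
    print(operations_per_second(before, after, 60, GET_OPERATIONS))
    print(operations_per_second(before, after, 60, PUT_OPERATIONS))
```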

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation. Kinesis is a trademark of Amazon Web Services.