Change Log
Contains a history of changes made to the Agent.
Release v612
Jan 16th, 2025
Added flags to support Java Keystores and Truststores to make migration from Kafka easier
tlsJavaKeystoreFile
,tlsJavaKeystorePasword
,tlsJavaKeystoreKeyPasword
,tlsJavaTruststoreFile
,tlsJavaTruststorePasword
.Java Keystore passwords are not used to encrypt the keystore, only to confirm it's integrity. In most Java implementations the keystore password and private key password must be the same which negates any encryption advantages. We recommend using the same security controls when using Java Keystores that you would use with unencrypted certificate keys.
Added support for using IMDSv2 to configure
DD_AGENT_HOST
for Agents running in AWS. If v2 is not available we automatically fall back to v1.Improved schema validation error message to return more detailed reason for why invalid record is rejected.
Added the
-schemaValidationURL
flag for schema registry to replace the deprecated-schemaRegistryURL
flag (still supported).
Release v611
Jan 10th, 2025
Add
tlsServerPrivateKeyPasswordFile
flag to enable using encrypted TLS private keys.The encrypted key given via
tlsServerPrivateKeyFile
must be inPKCS#8
format.The password file must contain a single line which is the private key's password.
Honor
DD_AGENT_HOST
environment variable if its already set.Allow TLS to be used with Orbit even if SASL is not configured.
Add support for Apache Kafka transactions. Read our announcement at https://www.warpstream.com/blog/kafka-transactions-explained-twice.
Release v610
Jan 8th, 2025
Basic Authentication Support for BYOC Schema Registry:
Add
schemaRegistryBasicAuth
flag to enable basic authentication for schema registry.
Support the
GET /subjects/(string: subject)/versions
endpoint in schema registry to allow getting a list of versions registered under the specified subject.Bump golang.org/x/net dependency for CVE-2024-45338 vulnerability.
Release v609
Dec 19th, 2024
Add
-requireSASLAuthentication
and-requireMTLSAuthentication
to playground mode. They where missed when these flags where added to the agent inv549
.
Release v608
Dec 18th, 2024
Add support for using DynamoDB as the agent's backing store alongside S3 and S3 Express One Zone.
Release v607
Dec 18th, 2024
Fix an issue where tombstones could be incorrectly handled, and showed as messages with empty values.
Fix a rare issue where the agent would fail to connect to a S3 bucket on startup in some configurations.
Release v606
Dec 11th, 2024
Make Orbit much more efficient and able to achieve higher throughput more easily.
Double the ratelimit for the maximum inflight fetch compressed bytes per core (now that fetch is significantly more efficient).
Make bento logging less noisy by limiting each managed pipeline to one bento log per second or less.
Release v605
Dec 10th, 2024
Avoid noisy error logs when instance type discovery fails.
Fix macOS Agent binaries to not segfault about a missing library
Release v604
Dec 9nd, 2024
Add the ability to load the agent key from a file using the
-agentKeyPath
flag orWARPSTREAM_AGENT_KEY_PATH
environment variable. The agent will reload the file when the file is updated so the key can be changed without restarting the agent.Upgrade to latest Bento which adds ability to ratelimit pipelines based on bytes instead of messages, and improving GCP bigquery output logging.
Switch to C++ library for lz4 compression instead of pure Go.
Release v603
Dec 2nd, 2024
Preserves the original DN from the TLS cert to be used as the ACL principal. Previously, while the DN was semantically correct, the order of its elements wasn't necessarily preserved. This leads to the principal set during auth to not match the principal set when the ACL is created.
Fix a log in the managed data pipelines feature that was logging a cluster-specific credential.
Add support for SASL SCRAM 256/512 as an authentication mechanism for source clusters in Orbit.
Release v602
Nov 25th, 2024
Retry object storage permissions check up to 3 times on startup to avoid false positives related to dial timeouts
Added BYOC Schema Registry Support:
The agent now supports hosting a BYOC Schema Registry.
Use the
-schemaRegistryPort
flag (or theWARPSTREAM_SCHEMA_REGISTRY_PORT
env variable) to specify the port (default 9094) to run the schema registry server on.Use the
-schemaRegistryTLS
flag (or theWARPSTREAM_SCHEMA_REGISTRY_TLS_ENABLED
env variable) to enable tls over the schema registry server.New metrics:
warpstream_agent_schema_registry_inflight_connections
: number of currently inflight / active connections to schema registry server.tags:
schema_registry_operation
,outcome
warpstream_agent_schema_registry_request_latency
: latency (seconds) for processing each Schema registry request.tags:
schema_registry_operation
,outcome
warpstream_agent_schema_registry_outcome
: outcome (success, error, etc) for each Schema Registry request.tags:
schema_registry_operation
,outcome
warpstream_agent_schema_registry_request_bytes_counter
: number of bytes of incoming requests.tags:
schema_registry_operation
warpstream_agent_schema_registry_response_bytes_counter
: number of bytes of outgoing response.tags:
schema_registry_operation
,outcome
warpstream_schema_versions_count
: Total number of schema versions in the schema registry cluster.warpstream_schema_versions_limit
: Maximum number of schema versions allowed in the cluster.
Release v601
Nov 21st, 2024
Limit maximum number of inflight bytes for fetch as compressed instead of uncompressed, and do it before issuing the fetch requests. This prevents excessive read amplifications when the Agents are highly loaded.
Switch from offheap to onheap cache for fileache for better eviction policy.
Fix a bug that was preventing the Agents from loading large IOs directly from object storage leading to excessive cache churn.
Remove some prefetcher restrictions to improve the performance of large fetch requests with many topic-partitions.
Release v600
Nov 20th, 2024
Fixed DescribeCluster API to correctly override hostname
Fixed a bug where the demo/playground agent was incorrectly using the environment variable.
Release v599
Nov 20th, 2024
Only restart Bento pipelines when their deployed configurations are modified. Previously, modifying one configuration restarted all deployed pipelines.
Increase default timeout for loading data into the file cache from 5s to 15s. This has no impact on tail latencies for healthy clusters due to the fast retry mechanism, but dramatically improves the ability of overloaded clusters to make progress.
Cap number of speculative/fast retries to 2% at most. This prevents excessive read amplification in some scenarios where the Agents are overloaded.
Upgrade Agents to go 1.23.3 to eliminate contention in HTTP client.
Release v598
Nov 6th, 2024
Removed noisy error log triggered when producing with ACKS=0. Previously, clients and agents generated a fake invariant error: 'Previous sent message is not current sequence -1' due to an unordered sequence number.
Allow Agents to create ingestion files with up to 64MiB of uncompressed data, increased from 16 MiB.
Orbit encrypts communication with source clusters using TLS.
Release v597
Nov 6th, 2024
Default Agent Group: When the
-agentGroup
parameter is not explicitly set, the agent will now default to a group named 'default'. Previously, if a Kafka client connects to an agent without a group specified, it could end up connecting to agents in other groups. With this change, clients connected to an agent with no group specified will now only see agents in the 'default' group.Note: This change is backwards compatible; during rollout, agents with no group set and those in the 'default' group will be treated as a single group.
Add support for SASL handshake v0.
Release v596
Nov 1st, 2024
Add delay + jitter + randomization to the order/timing in which Bento pipelines are started to avoid pipelines synchronizing with each other and doing all of their work at the same time.
Add support for Orbit to the Agents. See: https://docs.warpstream.com/warpstream/byoc/orbit
Release v595
October 30th, 2024
Return success + offsets/timestamps instead of DuplicateSequenceError + offsets/timestamps when a duplicate batch is detected via the idempotent producer functionality to mirror Kafka's behavior.
Upgrade to latest version of Bento with more improvements for parquet encoder (bug fixes + support for int8/int16).
Release v594
October 28th, 2024
Upgrade to latest version of Bento with improved handling of decimals and floats in parquet encoder.
Release v593
October 27th, 2024
Fixes metrics called "warpstream_agent_segment_batcher_flush_xxx" who were reporting their successes with "flush_cause:buffer_full" as errors.
Emit Bento metrics in Agents when managed data pipelines is enabled. All Bento metrics will be prefixed with:
warpstream_bento
.Upgrade to latest version of Bento with ability to encode maps/lists in the parquet encoder.
Release v592
October 23nd, 2024
Upgrade to latest Bento version, which improves error logging for the GCP BigQuery output.
Release v591
October 22nd, 2024
Add support for isolating managed data pipelines to different groups of pipelines Agents using the
managedPipelinesGroupName
flag andWARPSTREAM_MANAGED_PIPELINES_GROUP_NAME
environment variable. Read the docsMake manage data pipelines product work automatically, even when the WarpStream Agents are advertising the hostname of a load balancer to the Kafka protocol by making the Kafka metadata handler always return the Agents actual IP addresses to the managed data pipelines product.
Upgrade to latest Bento version which adds support for interpolating table name in GCP BigQuery output, as well as support for GCP's new Storage Write API for BigQuery. Also adds support for writing to Redshift clusters in AWS.
Release v590
October 17th, 2024
Make the Agents use the pure GO DNS resolver to work around a concurrency issue with
setenv
andgetenv
.Agents will now automatically tune their GC and backpressure settings automatically based on which roles are configured.
proxy-consume
Agents will have larger file caches,proxy-produce
Agents will buffer more data before backpressuring,pipelines
Agents will GC less aggressively, etc
Release v589
October 16th, 2024
Add fetch limits auto-tuning to the Agents, which automatically adjusts fetch request size limits when the Agents deem necessary. To enable this feature, use the flag
-autoTuneFetchLimits
or the env variableWARPSTREAM_AUTO_TUNE_FETCH_LIMITS
. By default this is disabled.
Release v588
October 11th, 2024
Agents configured with specific roles will now reject requests that they're not supposed to handle. For example, proxy-consume Agents will reject Fetch requests and proxy-produce Agents will reject Produce requests.
WarpStream Agents will now automatically avoid communicating with each other if multiple clusters are deployed in the same VPC, even if I.P addresses are quickly recycled between the two clusters (due to high container churn).
Increased default batching interval for Metadata requests to 250ms.
Upgrade managed data pipelines to latest version of Bento, and automatically tune franz_kafka blocks to fetch data from source Kafka/WarpStream clusters faster.
Release v587
September 27th, 2024
Increase the default value of GOMEMLIMIT from 1GiB/vCPU to 1.5GiB/vCPU. This should result in less GC overhead for highly loaded Agents.
Automatically disable profile forwarding if Datadog profiling is turned on.
Release v586
September 25th, 2024
Change the default value of
disableProfileForwarding
/WARPSTREAM_DISABLE_PROFILE_FORWARDING
to false, which means that the Agents will start forwarding profiles to WarpStream's control plane.
Release v585
September 24th, 2024
Add support for scheduling blocks in managed data pipelines
Release v584
September 23rd, 2024
Add support for AWS Glue Schema Registry schema validation
Users can now use schemas stored in their AWS Glue Schema Registry to validate records.
Add new topic-level configuration
warpstream.schema.registry.type
to specify what type of schema registry the agent should fetch the remote schemas from. Supported values include"STANDARD"
and"AWS_GLUE"
. Defaults to"STANDARD"
Add
-zonedCIDRBlocks
flag andWARPSTREAM_ZONED_CIDR_BLOCKS
env variable as an alternative way to provide information on the Kafka client's availability zone. The value is a mapping of availability zones to Kafka client IPs and should be a <> delimited list of AZ to CIDR range pairs, where each pair starts with an AZ, a @, and a comma separated list of CIDR blocks for that given AZ. For example,us-east-1a@10.0.0.0/19,10.0.32.0/19<>us-east-1b@10.0.64.0/19<>us-east-1c@10.0.96.0/19
.
Release v583
September 18th, 2024
Bump the otel dependency to fix spurious error logs introduced in v582
Release v582
September 18th, 2024
Increase default clean shutdown interval from 60s to 80s.
Increase the max number of connections per CPU (from 2048 to 4096)
Release v581
September 12th, 2024
Bump build to go
1.22.7
to fix some stdlib vulnerabilities
Release v580
September 11th, 2024
Add agent profile forwarding to WarpStream's control plane. Note that this feature cannot be enabled together with Datadog profiling and is not supported for self-hosted control planes. The following flags are added for this feature.
disableProfileForwarding
/WARPSTREAM_DISABLE_PROFILE_FORWARDING
: this flag can be used to disable profile forwarding. By default this is set to true and no profiles will be forwarded.maxProfileSize
/WARPSTREAM_MAX_PROFILE_SIZE
: max number of bytes allowed for buffering profiles in memory.
Bump our Alpine base image to
3.20.3
to fix some vulnerabilities
Release v579
September 9th, 2024
Add
-maxProduceRecordSizeBytes
flag andWARPSTREAM_MAX_PRODUCE_RECORD_SIZE_BYTES
env variable to override the maximum uncompressed size of a record that can be produced.
Release v578
August 30th, 2024
Add "ratelimited" outcome for S3-backed blob storage metrics
warpstream_blob_store_*
Increased maximum allowed ingestion file size from 8MiB to 16MiB
Improve buckets for distribution metrics for
warpstream_agent_segment_batcher_flush_file_size_uncompressed_bytes
andwarpstream_agent_segment_batcher_flush_file_size_compressed_bytes
Fixed a bug where we returned the wrong error code when a produced record was larger than the maximum allowed size (32MiB)
Release v577
August 25th, 2024
Fix bug in backpressure system that would cause Agents to get stuck in backpressure state.
Release v576
August 22nd, 2024
Fix bug affecting librdkafka library's cooperative rebalance, when adding more clients to a consumer group.
Add new startup flags to enable mutex and block profiling:
-enableSetMutexProfileFraction
/WARPSTREAM_ENABLE_SET_MUTEX_PROFILE_FRACTION
: enable this flag to call 'runtime.SetMutexProfileFraction' with the value passed along -mutexProfileFraction.-mutexProfileFraction
/WARPSTREAM_MUTEX_PROFILE_FRACTION
: tune the value passed to call 'runtime.SetMutexProfileFraction-enableSetBlockProfileRate
/WARPSTREAM_ENABLE_SET_BLOCK_PROFILE_RATE
: enable this flag to call 'runtime.SetBlockProfileRate' with the value passed along -blockProfileRate-blockProfileRate
/WARPSTREAM_BLOCK_PROFILE_RATE
: tune the value passed to call 'runtime.SetBlockProfileRate'
Release v575
August 18th, 2024
Report CPU usage using a moving average instead of point-in-time to improve accuracy.
Improve performance when a cluster has many clients polling partitions that are written to infrequently.
Release v574
August 15th, 2024
Increase default limits for backpressuring produce requests by 400%
Release v573
August 14th, 2024
Disable automatic retries in underlying blob storage clients (AWS / GCP) so we can control blob storage retries at the application layer.
Switch to C++ library for ZSTD instead of pure Go (2-3x faster).
Release v572
August 6th, 2024
Improved the Agents ability to backpressure Produce requests.
Added back-pressure support for Fetch requests in addition to Produce requests.
Release v571
August 2nd, 2024
Improved Agent backpressuring for Produce requests so that Agents will begin throttling Produce request and individual connections when they have too much producer data buffered in memory instead of OOMing.
Agents now return
KafkaStorageError
instead ofRequestTimedOut
for some timeout-related errors in the Fetch path. This improves behavior with Java consumer clients which treatKafkaStorageError
as retriable, but notRequestedTimedOut
.Fixed a bug in
demo
/playground
mode where the requested hostname strategy was not being used.Tagged all Agent metrics with the
agent_roles
,agent_group
, andvirtual_cluster_id
tags to make debugging advanced deloyments easier.
Release v570
July 17th, 2024
Fixed a bug that would cause idempotent producer out of sequence errors when idempotent producer was enabled on some clients. Data was never committed out of order, but the previous implementation would force the client to retry more than necessary, resulting in high latency or reduced throughput
Added Schema Validation Support:
Check out documentation for more details
Release v569
July 5th, 2024
Add a new
warpstream_max_offset
metric tagged by topic/partition (that can be controlled with the existingdisableConsumerGroupsMetricsTags
flag).The existing
warpstream_consumer_group_max_offset
metric is deprecated as it shares the same value across consumer groups and it can be replaced with the new metric mentioned above.Add the
topic
label/tag towarpstream_agent_kafka_produce_uncompressed_bytes_counter
that was missing it.
Release v568
July 3rd, 2024
Various small performance improvements (batching, allocations, etc).
Batch Metadata and FindCoordinator request RPCs between the Agents and the control plane. Helpful for workloads with a large number of consumer/producer clients.
Release v567
June 19th, 2024
Allow individual fetch requests to fetch up to 128MiB of uncompressed data per topic-partition, per fetch request, increased from 32 MiB.
Add
-kafkaMaxFetchRequestBytesUncompressedOverride
flag andWARPSTREAM_KAFKA_MAX_FETCH_REQUEST_BYTES_UNCOMPRESSED_OVERRIDE
env variable to override maximum number of uncompressed bytes that can be fetched in a single fetch request.Add
-kafkaMaxFetchPartitionBytesUncompressedOverride
andWARPSTREAM_KAFKA_MAX_FETCH_PARTITION_BYTES_UNCOMPRESSED_OVERRIDE
env variable to override maximum number of uncompressed bytes that can be fetched for a single topic-partition in a single fetch request.Introduced the metric
warpstream_consumer_group_generation_id
with the tagsconsumer_group
. This metric indicates the generation number of the consumer group, incrementing by one with each rebalance. It serves as an effective indicator for detecting occurrences of rebalances.Added metric
warpstream_consumer_group_estimated_lag_very_coarse_do_not_use_to_measure_e2e_seconds
(tags:consumer_group
,topic
,partition
). Provides an estimated lag in seconds (equivalent towarpstream_consumer_group_lag
but in time units instead of offsets), calculated via interpolation/extrapolation. Caution: Very coarse; unsuitable for precise end-to-end latency measurement.Renamed Flags and Environment Variables:
benthosBucketURL
(WARPSTREAM_BENTHOS_BUCKET_URL
) is nowbentoBucketURL
(WARPSTREAM_BENTO_BUCKET_URL
): Bucket URL to use when fetching the Bento configuration.benthosConfigPath
(WARPSTREAM_BENTHOS_CONFIG_PATH
) is nowbentoConfigPath
(WARPSTREAM_BENTO_CONFIG_PATH
): Path in the bucket to fetch the Bento configuration.
Release v566
June 5th, 2024
Fix broken formatting of logs that were sometimes
logfmt
when they should've been JSON.
Release v565
June 5th, 2024
Improve the Agent's ability to handle certain pathological compactions where an input stream is not read from for so long that the object store closes the connection. The Agent will now detect this scenario and issue a new GET request to resume the compaction instead of failing it entirely and never making progress.
Release v564
May 26th, 2024
Improve performance of file cache for high throughput workloads by improving how batching is handled.
Add support for more Benthos components:
[awk, jsonpath, lang, msgpack, parquet, protobuf, xml, zstd]
.Fix rare memory leak.
Fix rare bug that would cause circuit breakeres to get stuck in open / half-open state forever.
Release v563
May 22nd, 2024
Fixed a bug in the Agent that prevented users from starting playgrounds or demos.
Release v562
May 19th, 2024
Agent
Make Agent batch size configurable using
-batchMaxSizeBytes
flag andWARPSTREAM_BATCH_MAX_SIZE_BYTES
environment variable.Metrics
Add new Agent metric (gauge) that indicates state of consumer group:
warpstream_consumer_group_state
. Metric has two tags:consumer_group
andgroup_state
.Add new Agent metric (gauge) that indicates the number of members in each consumer group:
warpstream_consumer_group_num_members
. Metric has one primary tag:consumer_group
.Add new Agent metric (gauge) that indicates the number of topics in each consumer group:
warpstream_consumer_group_num_topics
. Metric has one primary tag:consumer_group
.Add new Agent metric (gauge) that indicates the number of partitions in each consumer group:
warpstream_consumer_group_num_partitions
. Metric has one primary tag:consumer_group
.Add new Agent metric (gauge) that indicates the total number of topics in the cluster:
warpstream_topic_count
.Add new Agent metric (gauge) that indicates the limit for the total number of topics in the cluster:
warpstream_topic_count_limit
.Add new Agent metric (gauge) that indicates the total number of partitions in the cluster:
warpstream_partition_count
.Add new Agent metric (gauge) that indicates the limit for the total number of partitions in the cluster:
warpstream_partition_count_limit
.
Release v559
May 13th, 2024
Agent
Add support for WarpStream Managed Data Pipelines
Change the default number of uncompressed bytes that will be be buffered before flushing a file to object storage from 8MiB to 4MiB
Release v558
May 10th, 2024
Agent
Support adopting AWS roles for providing access to S3 via environment variables instead of a query argument in the bucket URL.
Tuned circuit breakers to be less aggressive by default, and configured the retry mechanisms to avoid retrying circuit breaker errors.
New Metrics
warpstream_circuit_breaker_open
: Records the number of times the circuit breaker is opened, tagged by the circuit breaker name to identify the associated operations. - tags:name
warpstream_circuit_breaker_close
: Records the number of times the circuit breaker is closed, tagged by the circuit breaker name to identify the associated operations.tags:
name
warpstream_circuit_breaker_halfopen
: Records the number of times the circuit breaker is half-opened, tagged by the circuit breaker name to identify the associated operations.tags:
name
warpstream_circuit_breaker_error
: Records the number of times the circuit breaker blocks an operation, tagged by the circuit breaker name to identify the associated operations.tags:
name
Release v557
May 9th, 2024
Agent
Enhanced Buckets: Refined the buckets for
warpstream_agent_kafka_fetch_uncompressed_bytes
,warpstream_agent_kafka_fetch_compressed_bytes
,warpstream_agent_kafka_produce_compressed_bytes
andwarpstream_agent_kafka_produce_uncompressed_bytes
to ensure they are more precise. The previous bucket configuration was too sparse.Fixed Prometheus Counters: Resolved an issue with Prometheus counters previously were not appropriately incrementing the value.
Release v556
May 9th, 2024
Agent
Tune circuit breakers to be less aggressive, and don't retry requests that failed due to a circuit breaker error.
New metrics:
warpstream_circuit_breaker_open
: Records the number of times the circuit breaker is opened, tagged by the circuit breaker name to identify the associated operations.tags:
name
warpstream_circuit_breaker_close
: Records the number of times the circuit breaker is closed, tagged by the circuit breaker name to identify the associated operations.tags:
name
warpstream_circuit_breaker_halfopen
: Records the number of times the circuit breaker is half-opened, tagged by the circuit breaker name to identify the associated operations.tags:
name
warpstream_circuit_breaker_error
: Records the number of times the circuit breaker blocks an operation, tagged by the circuit breaker name to identify the associated operations.tags:
name
Release v555
May 8th, 2024
Agent
New metrics:
warpstream_agent_kafka_fetch_uncompressed_bytes_counter
: Tracks the count of uncompressed bytes fetched. Although a histogram version (warpstream_agent_kafka_fetch_uncompressed_bytes
) already exists, it may not provide an accurate count in some metrics systems. That's why we've made this data directly available as a counter.tags:
topic
(requires enabling high-cardinality metrics)
agent_kafka_produce_uncompressed_bytes_counter
: Tracks the count of uncompressed bytes produced. While the histogram version (warpstream_agent_kafka_produce_uncompressed_bytes
) is available, some metrics systems might not accurately count it. So, we're also offering this data directly as a counter.tags:
topic
(requires enabling high-cardinality metrics)
Release v554
April 30th, 2024
Agent
Replaces all instances of Ristretto cache with a simple LRU cache, dramatically reduces the number of caches misses in topic metadata related caches.
Release v553
April 30th, 2024
Agent
Rename metrics:
warpstream_agent_kafka_inflight_conn
has been renamed towarpstream_agent_kafka_inflight_connections
warpstream_agent_kafka_inflight_request
has been renamed towarpstream_agent_kafka_inflight_requests
Release v552
April 29th, 2024
Agent
Dramatically improve the performance of fetch requests that query 100s or 1000s of partitions in a single fetch request.
Add speculative retries for blob storage reads in the fetch path (in addition to existing speculative retries for blob storage writes in the write path).
Release v551
April 26th, 2024
Agent
Convert an error log (that did not actually represent an error) to an info log to reduce error spam.
Release v550
April 23nd, 2024
Agent
High-Cardinality Metrics: Introduced a new agent configuration for enabling high-cardinality metrics. To activate this feature, use the command-line flag
-kafkaHighCardinalityMetrics
or set the environment variableWARPSTREAM_KAFKA_HIGH_CARDINALITY_METRICS=true
.Deprecated metrics:
warpstream_agent_kafka_fetch_bytes_sent
warpstream_agent_segment_batcher_flush_file_size_counter_type
warpstream_agent_segment_batcher_flush_file_size
blob_store_list_latency
blob_store_list_count
blob_store_delete_latency
blob_store_delete_count
blob_store_put_bytes_latency
blob_store_put_bytes_count
blob_store_put_stream_latency
blob_store_put_stream_count
blob_store_get_bytes_latency
blob_store_get_bytes_count
blob_store_get_stream_latency
blob_store_get_stream_count
blob_store_get_bytes_range_latency
blob_store_get_bytes_range_count
blob_store_get_stream_range_latency
blob_store_get_stream_range_count
New metrics:
warpstream_agent_kafka_fetch_uncompressed_bytes
: Tracks the total uncompressed bytes fetched, replacingwarpstream_agent_kafka_fetch_bytes_sent
.tags:
topic
(requires enabling high-cardinality metrics)
warpstream_agent_kafka_produce_uncompressed_bytes
: Tracks the number of uncompressed bytes produced.tags:
topic
(requires enabling high-cardinality metrics)
warpstream_agent_segment_batcher_flush_file_size_uncompressed_bytes
: Tracks the uncompressed size of files stored after batching, serving as a replacement forwarpstream_agent_segment_batcher_flush_file_size
.warpstream_blob_store_operation_latency
: Tracks the latency and count of individual object storage operations. It's a replacement for all deprecatedblob_store_
metrics. Now all the different operations are tagged within the same metric.tags:
operation,outcome
Release v549
April 22nd, 2024
Agent
Adds support for authenticating Kafka clients using mTLS. A
-requireMTLSAuthentication
flag is added, and the previous-tlsVerifyClientCert
flag has been deprecated. A new-requireSASLAuthentication
flag is added, and the previous-requireAuthentication
flag is deprecated. When authenticating using mTLS, the Distinguished Name(DN) from the client certificate is used as the Kafka Principal. The-tlsPrincipalMappingRule
flag can be used to specify a Regex to extract a principal from the DN. For example, the ruleCN=([^,]+)
will extract the Common Name(CN) from the DN, and use that as the ACL principal.
Release v548
April 17th, 2024
Agent
Fix a rare bug where some compactions could fail if you had tombstone expiration enabled and very little data to compact.
Release v547
April 11nd, 2024
Agent
Fix playground mode against latest control plane signup constraints
Release v545
April 4nd, 2024
Agent
Added support to new regions. There is a new "-region" (or
WARPSTREAM_REGION
env variable) that you can leverage to pick among the supported regions - you can use the console to see the current list.
Release v544
April 2nd, 2024
Agent
Added support for TLS/mTLS for Warpstream Agent and kafka client connections. The
-kafkaTLS
(envWARPSTREAM_TLS_ENABLED=true
),-tlsServerCertFile
(envWARPSTREAM_TLS_SERVER_CERT_FILE=<filepath>
),tlsServerPrivateKeyFile
(envWARPSTREAM_TLS_SERVER_PRIVATE_KEY_FILE=<filepath>
),-tlsVerifyClientCert
(envWARPSTREAM_TLS_VERIFY_CLIENT_CERT=true
),-tlsClientCACertFile
(envWARPSTREAM_TLS_CLIENT_CA_CERT_FILE=<filepath>
) flags were added which respectively are used to enable tls, pass the TLS server certificate to the Agent, pass the TLS server private key to the Agent, optionally enable client certificate verification, and to optionally pass the TLS root certificate authority certificate file to the server.
Release v543
March 29th, 2024
Agent
Adding a new "availabilityZoneRequired" (or env variable "WARPSTREAM_AVAILABILITY_ZONE_REQUIRED") flag. When enabled, the agent will synchronously try to resolve its availability zone during startup for 1 min, and will not start serving its /v1/status health check until it succeeds. The process will exit early if it did not manage to resolve the availability zone.
Release v542
March 26th, 2024
Agent
Fixed a race condition during startup that could make some agents advertise their availability zone as "WARPSTREAM_UNSET_AZ"
Release v541
March 25th, 2024
Agent
Added beta support for benthos in the WarpStream Agents
Release v539
March 15th, 2024
Agent
Adding a new "agentGroup" parameter to define the name of the 'group' that the Agent belongs to. This feature is used to isolate groups of Agents that belong to the same logical cluster, but should not communicate with each other because they're deployed in separate cloud accounts, vpcs, or regions. By default the agent belongs to the default group.
Release v538
March 6th, 2024
Agent
Fixed a rare panic in the Agents caught by Antithesis
Release v537
March 1st, 2024
Agent
Kafka "compacted topics" are generally available.
Release v536
February 28th, 2024
Agent
Fix a bug with the
warpstream playground
command, introduced in v535.
Release v535
February 26th, 2024
Agent
Make agent pool name argument completely optional, even when using non-default clusters.
Fixed a memory leak that would cause some workloads to use an excessive amount of memory over time.
Add support for S3 express and separating the "ingestion" bucket from the "compaction" bucket so data can be landed into low-latency storage and then immediately compacted into lower cost storage.
Add speculative retries to file flushing which dramatically reduces outlier latency for Produce requests.
Release v534
Agent
Bug fixes
Fixed another bug in the fetch logic that would result in the Agent returning empty batches in some scenarios when transient network errors occurred. This did not cause any correctness issues, but would make librdkafka refuse to proceed in some scenarios and block consumption of some partitions.
Release v533
Agent
Bug fixes
Fixed a bug in the fetch logic that would result in the Agent returning empty batches in some scenarios depending on the client configuration. This did not cause any correctness issues, but would make librdkafka refuse to proceed in some scenarios and block consumption of some partitions.
Release v532
Agent
New Features
Improves roles reporting to Warpstream control plane so that they can be properly rendered in our console.
Release v531
Agent
New Features
Supports automatic availability zone detection in kubernetes reading the node zone label (requires version
0.10.0
of our helm charts).
Release v530
Agent
New Features
Compatibility with CreatePartitions Kafka API for updating partitions count.
Release v529
Agent
New Features
Treat grade nat as private IP addresses, and allow advertising it.
Performance improvements
Improve caching in the agent.
Improve batching of requests to the warpstream metadata backend.
Other general performance improvements.
Release v526
Agent
New Features
Full-support for ACLs using SASL credentials.
Release v525
Agent
Bug Fixes and performance Improvements
Improved object pooling in RPC layer to reduce memory usage.
Add transparent batching to the file cache RPCs to dramatically reduce the number of inter-agent RPCs for high partition workloads.
Enhanced Apache Kafka compatibility by refining the handling of watermarks in Fetch responses.
Release v524
Agent
Dynamic Consumer Group Rebalance Timeout
Partially handle consumer group requests (JoinGroup and SyncGroup) in the agent, to use the clients' rebalance timeout, instead of the previous 10s default. This enhancement will minimize unnecessary rebalance attempts in larger consumer groups with numerous members, due to short timeouts.
Release v523
Agent
Bug Fixes and performance Improvements
Fixed a bug in the Fetch() code that was not setting the correct topic ID in error responses which made some Kafka clients emit warning logs when this happened.
Fixed a bug in the "roles" feature that was causing Agents with the "produce" role to still participate in the distributed file cache. Now only Agents with the "consume" role will participate in the file cache, as expected.
Release v522
Agent
Bug Fixes and performance Improvements
Circuit breakers will now return example errors for clarity.
Fetch() code path will now handle failures more gracefully by returning incremental results in more scenarios which improves the system's ability to recover under load.
Fix a memory leak in the in-memory file cache implementation.
Release v521
Agent
New Features
Docker images are now multi-arch, our documentation and official kubernetes charts has been updated accordingly.
Introduced circuit breakers around object storage access.
Finer control over agent roles: it is now possible to split between the
proxy-consume
andproxy-produce
roles, our documentation has been updated as well.
Release v520
Agent
This release is the first phase of a two-phase upgrade to WarpStream's internal file format. This release adds support for reading the upgraded file format. You MUST upgrade all Agents to this version before moving from any version < v520 to any version > than v520.
Release v518
Agent
New Features
Support kafka Headers: if you produce messages containing Kafka headers, they will now be automatically persisted to your cloud object storage, and will be read when fetching.
Revisit the flags and configuration knobs to choose how the agents advertise themselves in Warpstream service discovery. Our documentation has been updated accordingly.
Agent nodes can now be configured to run dedicated roles - see splitting roles documentation.
Control Plane
New Features
Fully support kafka
ListOffsets
protocol: you can now look for partition offsets based on timestamps.
Release v517
Agent
Bug Fixes and performance Improvements
Fixed a bug related to the handling of empty (but not null) values in records in the Fetch implementation.
Release v516
Agent
New Features
The agent will now report a sample of its error logs back to Warpstream control plane. It should ease troubleshooting and help us identify issues earlier. This can be disabled with the flag
disableLogsCollection
or the environment variableWARPSTREAM_DISABLE_LOGS_COLLECTION
.
Bug Fixes and performance Improvements
Added batching in the metadata calls made during Kafka
Fetch
, improving memory usage along the way.
Control Plane
New Features
Added support for Kafka's
InitProducerID
protocol message, and the idempotent producer functionality in general. Requires upgrading to a version of the Agents that is >=v515
.Added support for Kafka
ListOffsets
with positive timestamps value (until now only negative values for special cases were supported)
Release v515
Agent
New Features
The Agents will now report the lag / max offsets for every active consumer group as standard metrics. The metrics can be found as
warpstream_consumer_group_lag
andwarpstream_consumer_group_max_offset
.The Agents will now report the number of files at each compaction level so that user's can monitor whether they are experiencing compaction lag. These metrics can be found as
warpstream_files_count
and the level is tagged with the namecompaction_level
.
Bug Fixes and performance Improvements
File cache is now partitioned by
<file_id, 16MiB extent>
instead of just<file_id>
. This spreads the load for fetching data for large files more evenly amongst all the Agents.Added some logic in the file cache to detect when certain parts of the cache are experiencing high churn and reduce the default IO size for paging in data from object storage. This helps avoid filling the cache with data that won't be read.
Fixed a bug in the file cache that was causing it to significantly *over* fetch data in some scenarios. This did not cause any correctness problems, but it wasted network bandwidth and CPU cycles.
Modified the implementation of the Kafka
Fetch
method to return incremental results when it experiences a retryable error mid-fetch. This makes the Agents much better at recovering from disruption and catching consumer lag incrementally.Added some pre-fetching logic into the Kafka
Fetch
method so that when data for a single partition is spread amongst many files the Agent doesn't get bottlenecked making many single-threaded RPCs. This mostly helps increase the speed at which individual partitions can be "caught up" when lagging.Increased the default maximum file size created at ingestion time from 4MiB to 8MiB. This improves performance for extremely high volume workloads.
Added replication to the Agent file cache so that if an error is experienced trying to load data from the file cache on the Agent node that is "responsible" for a chunk of data, the client can retry on a different node. This helps minimize disruption when Agents shutdown ungracefully.
Agents now report their CPU utilization to the control plane. We will use this information in the future to improve load balancing decisions. CPU utilization can be view in the WarpStream Admin console now as well.
Improved the performance of deserializing file footers.
Standardized prometheus metric names prefixes.
Added a lot more metrics and instrumentation, especially around the blob storage library and file cache.
Control Plane
New Features
Added support for the Kafka protocol message
DeleteTopics
.
Bug Fixes and Performance Improvements
We added intelligent throttling / scheduling to the deadscanner scheduler. This scheduler is responsible for scheduling jobs that run in the Agent to scan for "dead files" in object storage and delete them. Previously these jobs could run with high frequency and rates which would interfere with live workloads. In addition, they could also result in very high object storage API requests costs due to excessive amounts of
LIST
requests. The new implementation is much more intelligent and automatically tunes the frequency to avoid disrupting the live workload and incurring high API request fees.
Last updated