Security and Privacy Considerations
This page describes the various security and privacy considerations for WarpStream's BYOC deployment model.
Overview
WarpStream's BYOC products address security and privacy considerations by ensuring that raw data written to your data plane clusters never leaves your VPC or object storage buckets. WarpStream does not need access to the raw data: with regard to any personal information that WarpStream processes in your data plane, you receive a sovereign solution.
WarpStream receives metadata about your workloads (described in more detail in the sections below). WarpStream ensures that metadata for your workloads is stored only in the control plane region that you select when your WarpStream cluster is created, and is never replicated or stored in any other region unless you explicitly opt in to a multi-region control plane or transfer your cluster to another control plane.
In the interest of transparency, WarpStream maintains a compliance portal that includes information about our security and compliance practices, including certification reports and detailed information regarding the controls that we have implemented.
In addition to following the best practices and controls documented on our compliance portal, WarpStream supports Kafka ACLs, as well as SASL/PLAIN and SASL/SCRAM-SHA-512 authentication.
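For example, a client might authenticate with SASL/SCRAM-SHA-512 as in the minimal sketch below, which uses the third-party franz-go Kafka client; the Agent address, topic name, and credentials are placeholders, not values WarpStream prescribes.

```go
// Minimal sketch: connecting a Kafka client to a WarpStream Agent using
// SASL/SCRAM-SHA-512. The broker address and credentials are placeholders.
package main

import (
	"context"

	"github.com/twmb/franz-go/pkg/kgo"
	"github.com/twmb/franz-go/pkg/sasl/scram"
)

func main() {
	client, err := kgo.NewClient(
		kgo.SeedBrokers("agent.internal:9092"), // placeholder Agent address
		kgo.SASL(scram.Auth{
			User: "example-user", // placeholder credentials
			Pass: "example-pass",
		}.AsSha512Mechanism()),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Produce one record. Only metadata (topic name, offsets, timestamps)
	// ever reaches the WarpStream control plane, never the record itself.
	rec := &kgo.Record{Topic: "example-topic", Value: []byte("hello")}
	if err := client.ProduceSync(context.Background(), rec).FirstErr(); err != nil {
		panic(err)
	}
}
```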
Bring Your Own Cloud (BYOC) clusters
Raw data written to WarpStream clusters never leaves your VPC or object storage buckets. The only data that ever leaves your VPC is metadata about your Kafka workloads that is required for the correct functioning of your clusters, which includes the following:
Topic names
Topic metadata (partition counts, configuration, etc.)
File metadata (object store bucket name, compressed size, uncompressed size, etc.)
Record timestamps and offsets (but never record keys or record contents)
Consumer group names, configuration, and offsets
Kafka client IDs
Producer IDs, epochs, and sequence numbers
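To make the scope of this metadata concrete, the sketch below models it as Go structs. The type and field names are hypothetical illustrations, not WarpStream's actual wire format; note that nothing here carries record keys or record contents.

```go
// Hypothetical sketch of the workload metadata a BYOC cluster shares with the
// control plane. Names and shapes are illustrative only.
package metadata

type TopicMetadata struct {
	Name           string
	PartitionCount int
	Config         map[string]string
}

type FileMetadata struct {
	Bucket           string
	CompressedSize   int64
	UncompressedSize int64
}

type ConsumerGroupMetadata struct {
	Name    string
	Config  map[string]string
	Offsets map[int32]int64 // partition -> committed offset
}

type ProducerMetadata struct {
	ClientID       string
	ProducerID     int64
	Epoch          int16
	SequenceNumber int32
}
```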
WarpStream Schema Registry clusters
Raw schemas registered in the BYOC schema registry never leave your VPC or object storage buckets. The only data that leaves your VPC is metadata about your schemas that is necessary for the correct functioning of your schema registry clusters, which includes the following:
Schema metadata: schema data format, schema ID
Schema subject names
Schema subject metadata: schema context name, compatibility rule, subject version, schema ID, soft-deleted status
File metadata: object store bucket name, schema size
Schema reference metadata: subject, subject version
Schema context name
Global configuration: default compatibility rule
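As a usage illustration, the sketch below registers an Avro schema with a schema registry cluster, assuming the Confluent-style REST API; the endpoint URL and port are placeholders. The raw schema body is persisted in your object storage bucket, and only the metadata categories listed above reach the control plane.

```go
// Minimal sketch: registering an Avro schema with a BYOC Schema Registry
// cluster via a Confluent-style REST endpoint. The URL is a placeholder.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]string{
		"schema":     `{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}`,
		"schemaType": "AVRO",
	})
	resp, err := http.Post(
		"http://agent.internal:9094/subjects/user-value/versions", // placeholder
		"application/vnd.schemaregistry.v1+json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response carries schema metadata (the assigned ID), not the schema.
	var out struct {
		ID int `json:"id"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println("registered schema ID:", out.ID)
}
```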
WarpStream Tableflow
WarpStream Tableflow stores each table's schema, along with metadata for every file in your data lake, in the control plane. This metadata includes:
Bucket name where files are stored
Fully-qualified path to each file
File size, number of rows, and similar numeric metrics.
On an opt-in basis for specific columns, the minimum and maximum value for that column in each file. These statistics are an optional part of the Iceberg and Delta Lake table formats, but they accelerate many kinds of queries against your table; tables are commonly sorted by timestamp or by some opaque internal identifier, depending on your query patterns. (See the pruning sketch after this list.)
For partitioned tables, on an opt-in basis, the partition values. Most tables are partitioned by date, so this is usually not sensitive information.
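To show why these per-file statistics accelerate queries, here is an illustrative Go sketch of pruning files by a timestamp predicate; the types and function are hypothetical, not Tableflow or Iceberg internals.

```go
// Illustrative sketch: per-file min/max column statistics let a query engine
// skip files whose range cannot match the predicate. Names are hypothetical.
package main

import "fmt"

type DataFile struct {
	Path         string
	MinTimestamp int64 // opt-in per-column statistics
	MaxTimestamp int64
}

// pruneByTimestamp keeps only files whose [min, max] range overlaps the
// query's [from, to) predicate; all other files are never read.
func pruneByTimestamp(files []DataFile, from, to int64) []DataFile {
	var keep []DataFile
	for _, f := range files {
		if f.MaxTimestamp >= from && f.MinTimestamp < to {
			keep = append(keep, f)
		}
	}
	return keep
}

func main() {
	files := []DataFile{
		{Path: "s3://bucket/a.parquet", MinTimestamp: 100, MaxTimestamp: 199},
		{Path: "s3://bucket/b.parquet", MinTimestamp: 200, MaxTimestamp: 299},
	}
	// A query over timestamps [220, 260) only needs to read b.parquet.
	fmt.Println(pruneByTimestamp(files, 220, 260))
}
```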
Tableflow also stores some metadata about the source Kafka cluster:
Topic names
Minimum and maximum offset for each partition
Source data format (Avro, JSON, etc.)
Additionally, Tableflow depends on and stores the Agent Cluster Metadata described below.
Agent Cluster Metadata
Agent Metadata (stored ephemerally in memory, never persisted to disk)
Number of connections (for load balancing)
Number of vCPUs (for determining how many concurrent jobs each Agent can run) and CPU utilization.
Internal / Private IP addresses. These addresses are not routable from the internet, and are required so that the Agents can cluster with each other within a single availability zone.
Availability zone.
A small sample of the Agent's logs so that we can help diagnose and debug issues remotely. This can be disabled by setting the -disableLogsCollection flag or the WARPSTREAM_DISABLE_LOGS_COLLECTION=true environment variable. These logs never contain raw data, and only contain things like error messages or high-level statistics.
The Agent's profiling data so that we can investigate performance degradations remotely. This can be disabled with the -disableProfileForwarding flag or the WARPSTREAM_DISABLE_PROFILE_FORWARDING environment variable. These profiles only contain information about program execution.
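For operators who want both collection features off, here is a minimal sketch of launching an Agent with those settings from a Go supervisor process. The two disable flags and environment variables come from the documentation above; the warpstream agent invocation shape is an assumption, and any other flags your deployment requires (bucket, credentials, etc.) are omitted for brevity.

```go
// Minimal sketch: launching a WarpStream Agent with remote log collection and
// profile forwarding disabled. The invocation shape is an assumption; the
// disable flags and environment variables are from the documentation above.
package main

import (
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("warpstream", "agent",
		"-disableLogsCollection",    // no log samples leave the VPC
		"-disableProfileForwarding", // no profiling data leaves the VPC
	)
	// Equivalent environment-variable form:
	cmd.Env = append(os.Environ(),
		"WARPSTREAM_DISABLE_LOGS_COLLECTION=true",
		"WARPSTREAM_DISABLE_PROFILE_FORWARDING=true",
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```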