Configuring Object Storage Buckets and Permissions

This page describes how to configure object storage buckets and permissions for the WarpStream Agents.

We strongly recommend running the WarpStream Agent with a dedicated bucket for isolation; that said, the WarpStream Agent only reads and writes data under the warpstream prefix.
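For example, the dedicated bucket is typically supplied to the Agent at startup via a bucket URL. The sketch below is illustrative only: the exact flag names, key, and cluster ID are assumptions and will depend on your deployment and Agent version, so check the Agent's own documentation for the authoritative invocation.

warpstream agent \
  -bucketURL "s3://my-warpstream-bucket-123" \
  -apiKey "$WARPSTREAM_API_KEY" \
  -defaultVirtualClusterID "$WARPSTREAM_VIRTUAL_CLUSTER_ID"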

The WarpStream Agent manages all data in the object storage warpstream directory. It is extremely important that the Agent alone manages this directory: never delete files from the warpstream directory manually. Manually deleting files in the warpstream directory will effectively "brick" a virtual cluster and require that it be recreated from scratch.

Bucket Configuration

The WarpStream bucket should not have a configured object retention policy. WarpStream will manage the lifecycle of the objects, including deleting objects that have been compacted or have expired due to retention. If you must configure a retention policy on your bucket, make sure it is significantly longer than the longest retention of any topic/stream in any of your Virtual Clusters to avoid data loss.

We recommend configuring a lifecycle policy that cleans up aborted multi-part uploads. This prevents failed file uploads from the WarpStream Agent from accumulating in the bucket indefinitely and increasing your storage costs. Below is a sample Terraform configuration for a WarpStream S3 storage bucket:

resource "aws_s3_bucket" "warpstream_bucket" {
  bucket = "my-warpstream-bucket-123"

  tags = {
    Name        = "my-warpstream-bucket-123"
    Environment = "staging"
  }
}

resource "aws_s3_bucket_metric" "warpstream_bucket_metrics" {
  bucket = aws_s3_bucket.warpstream_bucket.id
  name   = "EntireBucket"
}

resource "aws_s3_bucket_lifecycle_configuration" "warpstream_bucket_lifecycle" {
  bucket = aws_s3_bucket.warpstream_bucket.id

  # Automatically cancel all multi-part uploads after 7d so we don't accumulate an infinite
  # number of partial uploads.
  rule {
    id     = "7d multi-part"
    status = "Enabled"
    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }
  
  # No other lifecycle policy. The WarpStream Agent will automatically clean up and
  # delete expired files.
}
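If your buckets live in GCS rather than S3, an equivalent incomplete-multipart-upload cleanup rule can be expressed with the hashicorp/google provider. This is a sketch under assumptions: the bucket name and location are placeholders, and you should adapt it to your project's conventions.

resource "google_storage_bucket" "warpstream_bucket" {
  # Placeholder name and location; substitute your own values.
  name     = "my-warpstream-bucket-123"
  location = "US"

  # Automatically abort multi-part uploads that have been incomplete for 7 days
  # so partial uploads don't accumulate forever.
  lifecycle_rule {
    action {
      type = "AbortIncompleteMultipartUpload"
    }
    condition {
      age = 7
    }
  }

  # No other lifecycle rules, for the same reason as the S3 example above.
}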

Bucket Permissions

In addition to configuring the WarpStream buckets, you'll also need to make sure the Agent containers have the appropriate permissions to interact with the bucket. Specifically, the Agents need permission to perform the following operations:

  • PutObject

    • To create new files.

  • GetObject

    • To read existing files.

  • DeleteObject

    • So the Agents can enforce retention and cleanup of pre-compaction files.

  • ListBucket

    • So the Agents can list existing files when enforcing retention and cleaning up pre-compaction files.

Below is an example Terraform configuration for an AWS IAM policy document that provides WarpStream with the appropriate permissions to access a dedicated S3 bucket:

data "aws_iam_policy_document" "warpstream_s3_policy_document" {
  statement {
    sid     = "AllowS3"
    effect  = "Allow"
    actions = [
      "s3:PutObject",
      "s3:GetObject",
      "s3:DeleteObject",
      "s3:ListBucket"
    ]
    resources = [
      "arn:aws:s3:::my-warpstream-bucket-123",
      "arn:aws:s3:::my-warpstream-bucket-123/*"
    ]
  }
}
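The policy document above still needs to be turned into an IAM policy and attached to whatever identity the Agents run as. A minimal sketch, assuming the Agents assume a role declared elsewhere as aws_iam_role.warpstream_agent_role (the role name and resource are hypothetical):

resource "aws_iam_policy" "warpstream_s3_policy" {
  name   = "warpstream-s3-policy"
  policy = data.aws_iam_policy_document.warpstream_s3_policy_document.json
}

resource "aws_iam_role_policy_attachment" "warpstream_s3_policy_attachment" {
  # Assumes an aws_iam_role.warpstream_agent_role defined elsewhere in your config.
  role       = aws_iam_role.warpstream_agent_role.name
  policy_arn = aws_iam_policy.warpstream_s3_policy.arn
}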

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation. Kinesis is a trademark of Amazon Web Services.