Object Storage Configuration
Bucket Configuration
Tableflow will manage the lifecycle of the objects in object storage, such as deleting compacted or expired data files and unneeded snapshots. As such, do not configure a retention policy on your bucket, and make sure that object versioning and object soft deletion are disabled. Removing files that Tableflow still considers live would make the Iceberg table unqueryable.
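For example, on AWS, a bucket that satisfies these constraints can be sketched in Terraform roughly as follows. The resource names and bucket name are illustrative; the key point is that no lifecycle (retention) configuration is attached and versioning is left disabled:

```hcl
# Illustrative bucket definition. No aws_s3_bucket_lifecycle_configuration
# is attached, so no retention policy will expire objects out from under
# Tableflow.
resource "aws_s3_bucket" "tableflow_bucket" {
  bucket = "<my-bucket>"
}

# Explicitly keep versioning disabled. "Disabled" is only valid for buckets
# that have never had versioning enabled; omitting this resource entirely
# has the same effect for a new bucket.
resource "aws_s3_bucket_versioning" "tableflow_bucket_versioning" {
  bucket = aws_s3_bucket.tableflow_bucket.id

  versioning_configuration {
    status = "Disabled"
  }
}
```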
Bucket Permissions
The Tableflow Agent needs the appropriate permissions to interact with the bucket.
Specifically, the Agent needs permission to perform the following operations:
- PutObject: to create data files and snapshots.
- GetObject: to read existing data files during compaction.
- DeleteObject: to enforce retention, clean up compacted files, and prune old snapshots.
- ListBucket: to enforce retention, clean up compacted files, and prune old snapshots.
Below is an example Terraform configuration for an AWS IAM policy document that provides Tableflow with the appropriate permissions to access a S3 bucket:
```hcl
data "aws_iam_policy_document" "warpstream_s3_policy_document" {
  statement {
    sid    = "AllowS3"
    effect = "Allow"
    actions = [
      "s3:PutObject",
      "s3:GetObject",
      "s3:DeleteObject",
      "s3:ListBucket"
    ]
    resources = [
      # s3:ListBucket is evaluated against the bucket ARN itself,
      # while the object-level actions apply to the object key prefix.
      "arn:aws:s3:::<my-bucket>",
      "arn:aws:s3:::<my-bucket>/warpstream/_tableflow/*",
    ]
  }
}
```

The easiest way to configure bucket access in GCP is with the roles/storage.objectUser and roles/storage.bucketViewer roles like so:
```hcl
resource "google_storage_bucket_iam_member" "warpstream_bucket_object_user" {
  bucket = "<my-bucket>"
  role   = "roles/storage.objectUser"
  member = "$PRINCIPAL"
}

resource "google_storage_bucket_iam_member" "warpstream_bucket_viewer" {
  bucket = "<my-bucket>"
  role   = "roles/storage.bucketViewer"
  member = "$PRINCIPAL"
}
```

However, if you need a more granular permission set, Tableflow requires the following:
- storage.objects.create: to create data files and snapshots.
- storage.objects.delete: to enforce retention, clean up compacted files, and prune old snapshots.
- storage.objects.get: to read existing data files during compaction.
- storage.objects.list: to enforce retention, clean up compacted files, and prune old snapshots.
- storage.multipartUploads.*: to create data files and snapshots.
- storage.buckets.get: to check if gRPC direct connectivity is available.
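If you manage GCP IAM with Terraform, one way to grant exactly this permission set is a custom role bound on the bucket. The role_id, title, and resource names below are illustrative; note that IAM custom roles do not accept wildcards, so storage.multipartUploads.* is expanded into its concrete permissions:

```hcl
# Illustrative custom role containing only the permissions Tableflow needs.
resource "google_project_iam_custom_role" "warpstream_tableflow_role" {
  role_id = "warpstreamTableflow"
  title   = "WarpStream Tableflow"
  permissions = [
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
    "storage.multipartUploads.create",
    "storage.multipartUploads.abort",
    "storage.multipartUploads.list",
    "storage.multipartUploads.listParts",
    "storage.buckets.get",
  ]
}

# Bind the custom role on the bucket for the Agent's principal.
resource "google_storage_bucket_iam_member" "warpstream_tableflow_member" {
  bucket = "<my-bucket>"
  role   = google_project_iam_custom_role.warpstream_tableflow_role.id
  member = "$PRINCIPAL"
}
```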