BigQuery

This page describes how to integrate Tableflow with Google BigQuery so that you can query Iceberg tables created by WarpStream directly in BigQuery.

Tableflow can automatically register tables in BigQuery and update a table's metadata location to point to the latest snapshot.

Prerequisites

The WarpStream Agents must be running version v737 or later for this integration to work.

1. Create the BigQuery Dataset

Create a BigQuery dataset to hold your Tableflow tables. The dataset must exist before enabling the integration.

bq mk --dataset --location=<gcs_bucket_region> <project_id>:<dataset_id>

Important: The dataset location must match your GCS bucket region; a mismatch will cause errors.
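For example, with hypothetical values (substitute your own project ID, dataset ID, and GCS bucket region):

```shell
# Hypothetical values: replace my-gcp-project, tableflow_tables,
# and us-central1 with your own project, dataset, and bucket region.
bq mk --dataset --location=us-central1 my-gcp-project:tableflow_tables
```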

2. Grant IAM Permissions

The Tableflow agent service account requires the following roles:

| Role | Purpose |
| --- | --- |
| roles/bigquery.dataEditor | Create and update external tables |
| roles/storage.objectViewer | Read Iceberg metadata from GCS |

Grant them via:
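A sketch of the grants using the gcloud CLI, assuming a hypothetical project ID and service account name for the Agents (substitute your own):

```shell
# Hypothetical project ID and service account; substitute your own.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:tableflow-agent@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:tableflow-agent@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```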

3. Add Table Configuration

Add the following BigQuery configuration to your table config:

Top-Level Defaults (bigquery_defaults)

These defaults apply to all tables unless overridden per-table.

| Field | Description |
| --- | --- |
| project_id | The GCP project ID containing the BigQuery dataset |
| dataset_id | The BigQuery dataset ID where tables will be created |

Per-Table Configuration (bigquery_table_config)

| Field | Description |
| --- | --- |
| enabled | Set to true to enable BigQuery sync for this table |
| table_id | The BigQuery table name to create/update |
| project_id | Override the default project_id for this table |
| dataset_id | Override the default dataset_id for this table |
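Putting the fields together, a sketch of what the configuration might look like. The field names follow the tables above; the values and the surrounding `tables`/`name` structure are hypothetical, so adapt them to your deployment's config layout:

```yaml
bigquery_defaults:
  project_id: my-gcp-project     # hypothetical project ID
  dataset_id: tableflow_tables   # hypothetical dataset ID

tables:
  - name: clickstream            # hypothetical Tableflow table
    bigquery_table_config:
      enabled: true
      table_id: clickstream
      # Optional per-table overrides of the defaults above:
      # project_id: other-project
      # dataset_id: other_dataset
```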

4. Query the Data

Once enabled, your tables will appear in the BigQuery console. Query them using standard SQL:
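A minimal example, assuming hypothetical project, dataset, and table IDs (substitute your own):

```sql
-- Hypothetical identifiers; replace with your project, dataset, and table.
SELECT *
FROM `my-gcp-project.tableflow_tables.clickstream`
LIMIT 10;
```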
