# BigQuery

{% hint style="warning" %}
**For new GCP deployments, we recommend using** [**BigLake**](https://docs.warpstream.com/warpstream/tableflow/catalogs-and-query-engines/biglake) **instead.** BigLake registers tables in the BigLake Metastore catalog, making them automatically available in BigQuery and in any other query engine that supports the Iceberg REST Catalog protocol.
{% endhint %}

Tableflow can automatically register tables in BigQuery and update a table's metadata location to point to the latest snapshot.

## Prerequisites

The WarpStream Agents must be running at least **v737** for this integration to work.

## 1. Create the BigQuery Dataset

Create a BigQuery dataset to hold your Tableflow tables. The dataset must exist before enabling the integration.

```bash
bq mk --dataset --location=<gcs_bucket_region> <project_id>:<dataset_id>
```

{% hint style="warning" %}
**Critical Requirement:** The BigQuery dataset location must match your GCS bucket region.

* If your bucket is in `us-east1`, your dataset must be in `us-east1`.
* If they do not match, BigQuery will be unable to read the data files.
{% endhint %}
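If you are unsure which region your bucket is in, you can look it up before creating the dataset. This is a sketch assuming the `gcloud` CLI is installed and authenticated; `<gcs_bucket>` is a placeholder for your bucket name:

```bash
# Print the bucket's region so the dataset location can be made to match it.
gcloud storage buckets describe gs://<gcs_bucket> --format="value(location)"
```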

## 2. Grant IAM Permissions

The service account used by the WarpStream Agents requires the following IAM roles:

| Role                         | Purpose                           |
| ---------------------------- | --------------------------------- |
| `roles/bigquery.dataEditor`  | Create and update external tables |
| `roles/storage.objectViewer` | Read Iceberg metadata from GCS    |

Grant them via:

```bash
# 1. Grant BigQuery access
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.dataEditor"

# 2. Grant GCS access (required for schema detection)
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/storage.objectViewer"
```
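To confirm the bindings took effect, you can list the roles granted to the service account. This sketch assumes the same `$PROJECT_ID` and `$SERVICE_ACCOUNT_EMAIL` variables as above:

```bash
# List every role bound to the service account in this project.
gcloud projects get-iam-policy $PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --format="table(bindings.role)"
```

Both `roles/bigquery.dataEditor` and `roles/storage.objectViewer` should appear in the output.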

## 3. Add Table Configuration

Add the following BigQuery configuration to your table config:

```yaml
# Global defaults for all BigQuery tables
bigquery_defaults:
    project_id: <project_id>
    dataset_id: <dataset_id>
tables:
    # Example Table: 'events'
    - source_topic: events
      # ... other table settings ...
      bigquery_table_config:
        enabled: true
        table_id: "events"
```

#### Top-Level Defaults (`bigquery_defaults`)

These defaults apply to all tables unless overridden per-table.

| Field        | Description                                          |
| ------------ | ---------------------------------------------------- |
| `project_id` | The GCP project ID containing the BigQuery dataset   |
| `dataset_id` | The BigQuery dataset ID where tables will be created |

#### Per-Table Configuration (`bigquery_table_config`)

| Field        | Description                                          |
| ------------ | ---------------------------------------------------- |
| `enabled`    | Set to `true` to enable BigQuery sync for this table |
| `table_id`   | The BigQuery table name to create/update             |
| `project_id` | Override the default `project_id` for this table     |
| `dataset_id` | Override the default `dataset_id` for this table     |
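For example, a single table can override the global defaults to land in a different project or dataset. The project and dataset names below are illustrative:

```yaml
tables:
    - source_topic: payments
      bigquery_table_config:
        enabled: true
        table_id: "payments"
        # Overrides bigquery_defaults for this table only (example values)
        project_id: analytics-project
        dataset_id: payments_dataset
```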

## 4. Query the Data

Once enabled, your tables will appear in the BigQuery console. Query them using standard SQL:

```sql
SELECT * FROM `<project_id>.<dataset_id>.<table_id>` LIMIT 100;
```

{% hint style="warning" %}
To write efficient queries against partitioned tables, see <https://docs.cloud.google.com/bigquery/docs/querying-partitioned-tables>.
{% endhint %}
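As a sketch, if your table is partitioned on a date column, filtering on that column lets BigQuery prune partitions instead of scanning the whole table. The `event_date` column name below is illustrative:

```sql
-- Filtering on the partition column allows BigQuery to prune partitions.
SELECT *
FROM `<project_id>.<dataset_id>.<table_id>`
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-07'
LIMIT 100;
```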
