# BigQuery

{% hint style="warning" %}
**For new GCP deployments, we recommend using** [**BigLake**](https://docs.warpstream.com/warpstream/tableflow/catalogs-and-query-engines/biglake) **instead.** BigLake registers tables in the BigLake Metastore catalog, making them automatically available in BigQuery and in any other query engine that supports the Iceberg REST Catalog protocol.
{% endhint %}

Tableflow can automatically register tables in BigQuery and update a table's metadata location to point to the latest snapshot.

## Prerequisites

The WarpStream Agents must be running at least **v737** for this integration to work.

## 1. Create the BigQuery Dataset

Create a BigQuery dataset to hold your Tableflow tables. The dataset must exist before enabling the integration.

```bash
bq mk --dataset --location=<gcs_bucket_region> <project_id>:<dataset_id>
```

{% hint style="warning" %}
**Critical Requirement:** The BigQuery dataset location must match your GCS bucket region.

* If your bucket is in `us-east1`, your dataset must be in `us-east1`.
* If they do not match, BigQuery will be unable to read the data files.
{% endhint %}
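If you are unsure which region your bucket is in, you can look it up before creating the dataset. This is a sketch assuming the `gcloud` CLI is installed and authenticated; `<gcs_bucket>` is a placeholder for your bucket name:

```bash
# Print the bucket's region so the dataset location can be made to match it.
gcloud storage buckets describe gs://<gcs_bucket> --format="value(location)"
```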

## 2. Grant IAM Permissions

The service account used by the WarpStream Agents requires the following IAM roles:

| Role                         | Purpose                           |
| ---------------------------- | --------------------------------- |
| `roles/bigquery.dataEditor`  | Create and update external tables |
| `roles/storage.objectViewer` | Read Iceberg metadata from GCS    |

Grant them via:

```bash
# 1. Grant BigQuery access
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.dataEditor"

# 2. Grant GCS access (required for schema detection)
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/storage.objectViewer"
```
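To confirm the bindings took effect, you can list the roles granted to the service account. This sketch assumes the same `$PROJECT_ID` and `$SERVICE_ACCOUNT_EMAIL` variables as above:

```bash
# List every role bound to the service account in this project.
gcloud projects get-iam-policy $PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --format="table(bindings.role)"
```

Both `roles/bigquery.dataEditor` and `roles/storage.objectViewer` should appear in the output.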

## 3. Add Table Configuration

Add the following BigQuery configuration to your table config:

```yaml
# Global defaults for all BigQuery tables
bigquery_defaults:
    project_id: <project_id>
    dataset_id: <dataset_id>
tables:
    # Example Table: 'events'
    - source_topic: events
      # ... other table settings ...
      bigquery_table_config:
        enabled: true
        table_id: "events"
```

#### Top-Level Defaults (`bigquery_defaults`)

These defaults apply to all tables unless overridden per-table.

| Field        | Description                                          |
| ------------ | ---------------------------------------------------- |
| `project_id` | The GCP project ID containing the BigQuery dataset   |
| `dataset_id` | The BigQuery dataset ID where tables will be created |

#### Per-Table Configuration (`bigquery_table_config`)

| Field        | Description                                          |
| ------------ | ---------------------------------------------------- |
| `enabled`    | Set to `true` to enable BigQuery sync for this table |
| `table_id`   | The BigQuery table name to create/update             |
| `project_id` | Override the default `project_id` for this table     |
| `dataset_id` | Override the default `dataset_id` for this table     |
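For example, a single table can override the global defaults to land in a different project or dataset. The project and dataset names below are illustrative:

```yaml
tables:
    - source_topic: payments
      bigquery_table_config:
        enabled: true
        table_id: "payments"
        # Overrides bigquery_defaults for this table only (example values)
        project_id: analytics-project
        dataset_id: payments_dataset
```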

## 4. Query the Data

Once enabled, your tables will appear in the BigQuery console. Query them using standard SQL:

```sql
SELECT * FROM `<project_id>.<dataset_id>.<table_id>` LIMIT 100;
```

{% hint style="warning" %}
To write efficient queries against partitioned tables, see <https://docs.cloud.google.com/bigquery/docs/querying-partitioned-tables>.
{% endhint %}
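As a sketch, if your table is partitioned on a date column, filtering on that column lets BigQuery prune partitions instead of scanning the whole table. The `event_date` column name below is illustrative:

```sql
-- Filtering on the partition column allows BigQuery to prune partitions.
SELECT *
FROM `<project_id>.<dataset_id>.<table_id>`
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-07'
LIMIT 100;
```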
