# BigQuery

{% hint style="warning" %}
**For new GCP deployments, we recommend using** [**BigLake**](/warpstream/tableflow/catalogs-and-query-engines/biglake.md) **instead.** BigLake registers tables in the BigLake Metastore catalog, making them automatically available in BigQuery and in any other query engine that supports the Iceberg REST Catalog protocol.
{% endhint %}

Tableflow can automatically register tables in BigQuery and update a table's metadata location to point to the latest snapshot.

### Prerequisites

In order for this to work, the WarpStream Agents need to be upgraded to at least **v737**.

## 1. Create the BigQuery Dataset

Create a BigQuery dataset to hold your Tableflow tables. The dataset must exist before enabling the integration.

```bash
bq mk --dataset --location=<gcs_bucket_region> <project_id>:<dataset_id>
```

{% hint style="warning" %}
**Critical Requirement:** The BigQuery Dataset location must match your GCS bucket region.

* If your bucket is in `us-east1`, your dataset must be in `us-east1`.
* If they do not match, BigQuery will be unable to read the data files.
  {% endhint %}

## 2. Grant IAM Permissions

The Tableflow agent service account requires the following roles:

| Role                         | Purpose                           |
| ---------------------------- | --------------------------------- |
| `roles/bigquery.dataEditor`  | Create and update external tables |
| `roles/storage.objectViewer` | Read Iceberg metadata from GCS    |

Grant them via:

```bash
# 1. Grant BigQuery access
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.dataEditor"

# 2. Grant GCS access (required for schema detection)
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/storage.objectViewer"
```

## 3. Add Table Configuration

Add the following BigQuery configuration to your table config:

```yaml
# Global defaults for all BigQuery tables
bigquery_defaults: 
    project_id: <project_id>
    dataset_id: <dataset_id>
tables:
    # Example Table: 'events'
    - source_topic: events
    # ... other table settings ...
      bigquery_table_config: 
        enabled: true 
        table_id: "events"
```

#### Top-Level Defaults (`bigquery_defaults`)

These defaults apply to all tables unless overridden per-table.

| Field        | Description                                          |
| ------------ | ---------------------------------------------------- |
| `project_id` | The GCP project ID containing the BigQuery dataset   |
| `dataset_id` | The BigQuery dataset ID where tables will be created |

#### Per-Table Configuration (`bigquery_table_config`)

| Field        | Description                                          |
| ------------ | ---------------------------------------------------- |
| `enabled`    | Set to `true` to enable BigQuery sync for this table |
| `table_id`   | The BigQuery table name to create/update             |
| `project_id` | Override the default `project_id` for this table     |
| `dataset_id` | Override the default `dataset_id` for this table     |

## 4. Query the Data

Once enabled, your tables will appear in the BigQuery console. Query them using standard SQL:

```sql
SELECT * FROM <project_id>.<dataset_id>.<table_id> LIMIT 100;
```

{% hint style="warning" %}
To write efficient queries on partitioned tables please read <https://docs.cloud.google.com/bigquery/docs/querying-partitioned-tables>.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.warpstream.com/warpstream/tableflow/catalogs-and-query-engines/bigquery.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
