# BigLake

Integrating with BigLake Metastore is the recommended way to make Tableflow Iceberg tables queryable in BigQuery. This approach uses the [BigLake Metastore Iceberg REST Catalog](https://docs.cloud.google.com/biglake/docs/blms-rest-catalog) API to handle table registration, giving you native Iceberg support and keeping your catalog automatically in sync as new snapshots are committed.

{% hint style="success" %}
**BigLake vs BigQuery integration:** BigLake is preferred over the [BigQuery integration](https://docs.warpstream.com/warpstream/tableflow/catalogs-and-query-engines/bigquery) because tables are registered in BigLake Metastore, which exposes the standard Iceberg REST Catalog protocol. This means the tables are automatically queryable from BigQuery *and* from any engine that can connect to an Iceberg REST Catalog, such as Spark, Trino, and Presto. The legacy BigQuery integration only creates BigQuery external tables, so the tables are only visible to BigQuery.
{% endhint %}

## Prerequisites

1. Your WarpStream Agents must be running version **v769** or later.
2. You need a [BigLake Metastore Iceberg REST catalog](https://docs.cloud.google.com/biglake/docs/blms-rest-catalog) created in the same region as your GCS bucket.

### Create a BigLake Metastore Catalog

If you don't already have one, create a BigLake Metastore Iceberg REST catalog:

```bash
gcloud beta biglake iceberg catalogs create \
  <GCS_BUCKET_NAME> \
  --project <PROJECT_ID> \
  --catalog-type gcs-bucket
```

{% hint style="info" %}
The catalog name must match the GCS bucket name (e.g., if your bucket is `gs://my-bucket`, use `my-bucket` as the catalog name). This is a BigLake requirement for `gcs-bucket` type catalogs. See the [Google Cloud documentation](https://docs.cloud.google.com/biglake/docs/blms-rest-catalog#create_a_catalog) for details.
{% endhint %}

{% hint style="info" %}
You do not need to manually create a BigQuery dataset. Tableflow automatically creates the namespace via the BigLake REST Catalog API.
{% endhint %}
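To verify the catalog exists and is reachable, you can query its config endpoint directly. This is a quick sanity check, not part of the setup: the `/v1/config` path follows the standard Iceberg REST Catalog specification, and the example assumes you have Application Default Credentials configured locally (`gcloud auth application-default login`).

```shell
# Query the BigLake Iceberg REST Catalog's config endpoint.
# A successful response returns JSON catalog configuration; an error
# usually means the catalog doesn't exist or IAM permissions are missing.
curl -s \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "x-goog-user-project: <PROJECT_ID>" \
  "https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/config?warehouse=gs://<GCS_BUCKET_NAME>"
```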

## 1. Authentication and IAM Permissions

The BigLake integration uses [Google Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials). The agent authenticates to both the BigLake REST Catalog API and GCS using the service account it runs as — no additional credential configuration is needed.

The agent service account requires the following role:

| Role                   | Purpose                                           |
| ---------------------- | ------------------------------------------------- |
| `roles/biglake.editor` | Create and update tables in the BigLake Metastore |

Grant it via:

```bash
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/biglake.editor"
```

{% hint style="info" %}
The agent already has GCS access for reading and writing Iceberg data. The only additional permission needed for BigLake is `roles/biglake.editor`.
{% endhint %}
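To confirm the role was granted, you can filter the project's IAM policy for the agent's service account:

```shell
# List the roles bound to the agent's service account;
# roles/biglake.editor should appear in the output.
gcloud projects get-iam-policy <PROJECT_ID> \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:<SERVICE_ACCOUNT_EMAIL>" \
    --format="table(bindings.role)"
```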

## 2. Add Table Configuration

Add the following `biglake_table_config` block to each table you would like to sync:

```yaml
tables:
    - source_topic: events
      # ... other table settings ...
      biglake_table_config:
        enabled: true
        project_id: "<PROJECT_ID>"
        namespace: "<NAMESPACE>"
        table_name: "<TABLE_NAME>"
```

### Configuration Fields

| Field        | Required | Description                                                                                          |
| ------------ | -------- | ---------------------------------------------------------------------------------------------------- |
| `enabled`    | Yes      | Set to `true` to enable BigLake sync for this table                                                  |
| `project_id` | Yes      | The GCP project ID (used for billing/quota attribution via `x-goog-user-project`)                    |
| `namespace`  | Yes      | The BigLake namespace where the table will be registered. Created automatically if it doesn't exist. |
| `table_name` | Yes      | The table name to create/update in the BigLake catalog                                               |

{% hint style="info" %}
The agent automatically discovers the BigLake catalog using the destination GCS bucket. No catalog resource path is needed in the configuration.
{% endhint %}

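Putting it together, a filled-in table entry (using hypothetical names: project `my-project`, namespace `my_ns`, topic `events`) might look like:

```yaml
tables:
    - source_topic: events
      # ... other table settings ...
      biglake_table_config:
        enabled: true
        project_id: "my-project"
        namespace: "my_ns"
        table_name: "my_table"
```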
## 3. Query the Data

Once enabled, Tableflow will automatically register the table in BigLake and keep the metadata location up to date.

### From BigQuery

BigLake tables are queryable from BigQuery using the 4-part `Project.Catalog.Namespace.Table` syntax. The catalog name is the GCS bucket name (without the `gs://` prefix):

```sql
SELECT * FROM `<PROJECT_ID>.<GCS_BUCKET_NAME>.<NAMESPACE>.<TABLE_NAME>` LIMIT 100;
```

For example, if your project is `my-project`, your bucket is `gs://my-bucket`, and you configured namespace `my_ns` with table `my_table`:

```sql
SELECT * FROM `my-project.my-bucket.my_ns.my_table` LIMIT 100;
```
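The same query can also be run from the command line with the `bq` tool, assuming it is installed and authenticated:

```shell
# Run the query with GoogleSQL syntax rather than legacy SQL.
bq query --use_legacy_sql=false \
    'SELECT * FROM `my-project.my-bucket.my_ns.my_table` LIMIT 100;'
```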

### From Other Query Engines

Any query engine that supports the Iceberg REST Catalog protocol (e.g. Spark, Trino, Presto) can connect directly to the BigLake Metastore REST Catalog endpoint and query the tables.
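For example, a Spark SQL session can be pointed at the catalog using Iceberg's standard REST catalog properties. The snippet below is a sketch, not a verified configuration: the runtime package version, the `token` authentication approach (a short-lived access token from Application Default Credentials), and the endpoint URI should be checked against the current BigLake and Iceberg documentation.

```shell
# Start Spark SQL with an Iceberg REST catalog named "blms" backed by
# BigLake Metastore. Replace <PROJECT_ID> and <GCS_BUCKET_NAME> as above.
spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1 \
    --conf spark.sql.catalog.blms=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.blms.type=rest \
    --conf spark.sql.catalog.blms.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog \
    --conf spark.sql.catalog.blms.warehouse=gs://<GCS_BUCKET_NAME> \
    --conf spark.sql.catalog.blms.header.x-goog-user-project=<PROJECT_ID> \
    --conf spark.sql.catalog.blms.token="$(gcloud auth application-default print-access-token)"
```

Once connected, the table is addressable as `blms.<NAMESPACE>.<TABLE_NAME>`.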
