Databricks

This page describes how to integrate Tableflow with Databricks so that you can query Iceberg tables created by WarpStream in Databricks.

Integration Context & Limitations

Databricks Unity Catalog currently supports only a limited set of external Iceberg REST catalogs, and these are the only way to access external Iceberg tables. While several catalog providers are supported, this guide focuses exclusively on AWS Glue as the integration path.

To enable access to your data, we use AWS Glue as an intermediate catalog. Your table metadata is synced to AWS Glue, which is then mounted into Databricks as a Foreign Catalog.

If your architecture requires connecting via a different supported catalog (e.g., syncing Tableflow to Snowflake and mounting that in Databricks), please reach out to us for assistance.

Schema Limitations & Workarounds

Unity Catalog has a known limitation regarding Iceberg schemas: it does not support NOT NULL constraints nested within arrays or maps.

If your schema contains these fields, queries may fail with the error:

[DELTA_NESTED_NOT_NULL_CONSTRAINT] Delta does not support NOT NULL constraints nested within arrays or maps.

Workarounds:

  • Option A: Modify Schema

    Update your Iceberg schema to make nested fields optional (nullable). This allows the table to be queried using standard Databricks "SQL Warehouse" compute.

  • Option B: Use Cluster Compute

    If you cannot modify the schema, you must use "Cluster Compute" (SQL Warehouses do not support this configuration) and enable the following Spark configuration to suppress the error (see the example after this list):

spark.databricks.delta.constraints.allowUnenforcedNotNull.enabled = true
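
One minimal way to enable this, assuming your notebook is attached to cluster compute, is to run a SQL SET statement at the start of the session; the statement below only applies the configuration key shown above, and you can alternatively add the same key and value to the cluster's Spark configuration so it applies to every session:

-- Session-level override on cluster compute (not supported on SQL Warehouses)
SET spark.databricks.delta.constraints.allowUnenforcedNotNull.enabled = true;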

Integrate via AWS Glue

The following steps explain how to query WarpStream Tableflow tables in Databricks using AWS Glue.

0. Set Up AWS Glue Integration

Before setting up the Databricks integration, you must follow the steps in AWS Glue to make your WarpStream Tableflow tables available in AWS Glue.

1. Set Up AWS Authentication

First, establish the required AWS authentication configuration (an IAM role) and grant Databricks access to it (a Storage Credential).

2. Create the Glue Connection

Next, create a connection object within Databricks to link to your AWS Glue environment.

3. Create External Location for the Bucket

To authorize read access, you must create an external location for the specific S3 bucket where your Tableflow data resides.
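
As a sketch, the external location can be created in SQL (or equivalently through Catalog Explorer); the location name (tableflow_data), bucket path, and credential name (tableflow_credential, i.e. the Storage Credential from step 1) are placeholders:

-- External location granting Unity Catalog read access to the Tableflow bucket
CREATE EXTERNAL LOCATION IF NOT EXISTS tableflow_data
URL 's3://<your-tableflow-bucket>/'
WITH (STORAGE CREDENTIAL tableflow_credential);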

4. Create the Foreign Catalog

Finally, create the foreign catalog itself to mount the Glue database.

  • Important: When creating the catalog, you must set the storage location option to your S3 bucket path (see the sketch below). If this is omitted, Databricks will fail to read the Iceberg data.
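
A rough SQL sketch follows; the catalog name (tableflow), connection name (glue_connection, i.e. the connection from step 2), and bucket path are placeholders, and the option key used here for the storage location (authorized_paths) is an assumption — use whichever storage location / authorized path field your Databricks workspace exposes when creating the foreign catalog:

-- Mount the Glue database as a foreign catalog; the authorized_paths option key is an assumption
CREATE FOREIGN CATALOG IF NOT EXISTS tableflow
USING CONNECTION glue_connection
OPTIONS (authorized_paths 's3://<your-tableflow-bucket>/');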

5. Query the Data

Once the catalog is created, your WarpStream tables will automatically appear in the Databricks UI (Catalog Explorer). You can now query them using standard SQL, Notebooks, or BI tools just like any other native table.

To reference a table in your queries, use the full three-level namespace (catalog.schema.table), where the catalog is the foreign catalog you created and the schema corresponds to the Glue database:
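
For example, assuming a foreign catalog named tableflow, a Glue database named warpstream, and a table named events (all placeholder names):

-- Query a Tableflow table through the foreign catalog
SELECT * FROM tableflow.warpstream.events LIMIT 10;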
