Databricks
This page describes how to integrate Tableflow with Databricks so that you can query the Iceberg tables created by WarpStream from within Databricks.
Integration Context & Limitations
Databricks Unity Catalog currently supports only a limited set of external Iceberg REST catalogs, and these are the only way to access external Iceberg tables. While several catalog providers are supported, this guide focuses exclusively on AWS Glue as the integration path.
To enable access to your data, we use AWS Glue as an intermediate catalog. Your table metadata is synced to AWS Glue, which is then mounted into Databricks as a Foreign Catalog.
Schema Limitations & Workarounds
Unity Catalog has a known limitation regarding Iceberg schemas: it does not support NOT NULL constraints nested within arrays or maps.
If your schema contains these fields, queries may fail with the error:
[DELTA_NESTED_NOT_NULL_CONSTRAINT] Delta does not support NOT NULL constraints nested within arrays or maps.
Workarounds:
Option A: Modify Schema
Update your Iceberg schema to make nested fields optional (nullable). This allows the table to be queried using standard Databricks "SQL Warehouse" compute.
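If the schema is under your control, the durable fix is to mark the affected nested fields as optional wherever the source schema is defined (the schema Tableflow materializes the table from). Purely as an illustration, and using hypothetical table and column names, the Iceberg Spark DDL for dropping a NOT NULL constraint on a field nested inside an array looks roughly like the sketch below; it would need to run from a Spark engine with write access to the Iceberg table in Glue, not through Databricks.

```sql
-- Illustrative only; table and column names are hypothetical. Drops the NOT NULL
-- constraint on a struct field nested inside an array column using Iceberg's
-- Spark SQL DDL. Tableflow may re-derive the schema from its source, so prefer
-- changing the source schema definition where possible.
ALTER TABLE glue_catalog.tableflow_db.orders
  ALTER COLUMN line_items.element.quantity DROP NOT NULL;
```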
Option B: Use Cluster Compute
If you cannot modify the schema, you must use "Cluster Compute" (SQL Warehouses are not supported for this config) and enable the following Spark configuration to suppress the error:
spark.databricks.delta.constraints.allowUnenforcedNotNull.enabled = true
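This flag is typically added to the cluster's Spark config (under the cluster's advanced options). As a minimal sketch, it may also be possible to enable it for the current session from a notebook attached to the cluster:

```sql
-- Sketch: enable the flag for the current session from a notebook attached to
-- cluster compute. If the session-level setting is not honored, add the same
-- key/value pair to the cluster's Spark config instead.
SET spark.databricks.delta.constraints.allowUnenforcedNotNull.enabled = true;
```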
Integrate via AWS Glue
This guide explains how to query WarpStream Tableflow tables in Databricks.
0. Set Up AWS Glue Integration
Before setting up the Databricks integration, you must follow the steps at AWS Glue so that your WarpStream Tableflow tables are available in AWS Glue.
1. Set Up AWS Authentication
First, establish the required AWS authentication configuration (an IAM role) and grant Databricks access to it (a storage credential).
Instructions: Follow the Databricks Guide: Create IAM role and credential.
2. Create the Glue Connection
Next, create a connection object within Databricks to link to your AWS Glue environment.
Instructions: Follow the Databricks Guide: Create the connection.
3. Create External Location for the Bucket
To authorize read access, you must create an external location for the specific S3 bucket where your Tableflow data resides.
Instructions: Follow the Databricks Guide: Create external locations.
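As a sketch (the location name, bucket path, and credential name are hypothetical, with the credential coming from step 1), the equivalent SQL is roughly:

```sql
-- Sketch with hypothetical names: authorize Unity Catalog to read the S3 bucket
-- that holds the Tableflow Iceberg data, using the storage credential from step 1.
CREATE EXTERNAL LOCATION IF NOT EXISTS tableflow_data
URL 's3://my-tableflow-bucket/'
WITH (STORAGE CREDENTIAL tableflow_credential)
COMMENT 'WarpStream Tableflow Iceberg data';
```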
4. Create the Foreign Catalog
Finally, create the foreign catalog itself to mount the Glue database.
Instructions: Follow the Databricks Guide: Create a foreign catalog.
Important: When creating the catalog, you must set the storage location option to your S3 bucket path. If this is omitted, Databricks will fail to read the Iceberg data.
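For reference, here is a hedged sketch of the SQL form, assuming the connection created in step 2 is named glue_connection; the catalog name and bucket path are hypothetical, and the exact name of the storage location option (shown here as authorized_paths) should be taken from the linked guide:

```sql
-- Sketch with hypothetical names: mount the Glue database as a foreign catalog.
-- The storage location option (assumed here to be authorized_paths) must point
-- at the S3 bucket containing the Tableflow data, or reads will fail.
CREATE FOREIGN CATALOG IF NOT EXISTS tableflow
USING CONNECTION glue_connection
OPTIONS (authorized_paths 's3://my-tableflow-bucket/');
```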
5. Query the Data
Once the catalog is created, your WarpStream tables will automatically appear in the Databricks UI (Catalog Explorer). You can now query them using standard SQL, Notebooks, or BI tools just like any other native table.
To reference a table in your queries, use the full three-level namespace: catalog.schema.table.
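For example, assuming the foreign catalog is named tableflow and the Glue database and table are tableflow_db and orders (all hypothetical names):

```sql
-- Query a Tableflow table through the foreign catalog; names are hypothetical.
SELECT *
FROM tableflow.tableflow_db.orders
LIMIT 10;
```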