Hive Metastore

Tableflow can automatically register tables in a Hive Metastore and update a table's metadata location to point to the latest snapshot. Tables are registered as external Iceberg tables using the Hive Metastore Thrift protocol.

Prerequisites

The WarpStream Agents must be running version v769 or later, and they must have network access to the Hive Metastore Thrift endpoint.

Authentication: Only NOSASL authentication is currently supported. Kerberos and other authentication mechanisms are not yet available. If you need Kerberos support, please reach out to us.

1. Ensure Network Connectivity

The WarpStream Agents must be able to reach the Hive Metastore Thrift endpoint over the network. The default Hive Metastore Thrift port is 9083.
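One quick way to confirm reachability is to attempt a plain TCP connection from a machine in the same network as the Agents. The sketch below uses only the Python standard library; the function name `can_reach_metastore` is hypothetical and is not part of WarpStream. A successful connection only shows that the port is open, not that the Thrift service behind it is healthy.

```python
import socket


def can_reach_metastore(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    9083 is the default Hive Metastore Thrift port mentioned above.
    This checks network reachability only, not the Thrift protocol itself.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach_metastore("hive-metastore.example.com")` would return False if a firewall or security group blocks port 9083 (the host name here is a placeholder).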

2. Add Table Configuration

Add the following Hive Metastore configuration to your table config:

tables:
    - source_topic: "example_topic"
      ...
      hive_table_config:
        enabled: true
        thrift_uri: "thrift://<hive-metastore-host>:9083"
        namespace: "<hive-database-name>"
        table_name: "<hive-table-name>"
      schema:
      ...

Required Parameters

| Field | Description |
| --- | --- |
| enabled | Set to true to enable Hive Metastore sync for this table. |
| thrift_uri | The Thrift URI of the Hive Metastore. Must use the thrift:// scheme (e.g., thrift://hive-metastore.example.com:9083). If the port is omitted, 9083 is used by default. |
| namespace | The Hive database (namespace) where the table will be created. If the database does not exist, Tableflow will create it automatically. |
| table_name | The name of the table to create or update in the Hive Metastore. This can be different from the name of the table in Tableflow. |
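The rules above can be checked before deploying a config. The following is a minimal sketch, assuming the `hive_table_config` block has already been loaded into a Python dict; the helper names `parse_thrift_uri` and `check_hive_table_config` are hypothetical, and this is not how Tableflow itself validates the config.

```python
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_THRIFT_PORT = 9083
REQUIRED_FIELDS = ("enabled", "thrift_uri", "namespace", "table_name")


def parse_thrift_uri(uri: str) -> Tuple[str, int]:
    """Split a thrift:// URI into (host, port), defaulting the port to 9083."""
    parts = urlsplit(uri)
    if parts.scheme != "thrift":
        raise ValueError(f"thrift_uri must use the thrift:// scheme: {uri!r}")
    if not parts.hostname:
        raise ValueError(f"thrift_uri has no host: {uri!r}")
    # parts.port is None when the URI omits the port.
    return parts.hostname, parts.port or DEFAULT_THRIFT_PORT


def check_hive_table_config(cfg: dict) -> Tuple[str, int]:
    """Raise ValueError if a required field is missing; return (host, port)."""
    missing = [name for name in REQUIRED_FIELDS if name not in cfg]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return parse_thrift_uri(cfg["thrift_uri"])
```

With this sketch, `parse_thrift_uri("thrift://hive-metastore.example.com")` yields the host together with port 9083, while an `http://` URI is rejected.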

3. Query the Data

Once enabled, your tables will appear in the Hive Metastore as external Iceberg tables. You can query them from any query engine that supports the Hive Metastore catalog, such as Trino, Spark SQL, or Presto.
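As a rough illustration of what such queries might look like, the Python helpers below build a Trino statement and a Spark SQL statement from the namespace and table_name set in the table config. The catalog name passed in is an assumption and depends entirely on how your engine is configured. The helper names are hypothetical. The Spark settings shown follow the Apache Iceberg convention for a Hive-backed catalog.

```python
def trino_select(catalog: str, namespace: str, table_name: str, limit: int = 10) -> str:
    """Build a Trino query; Trino quotes identifiers with double quotes."""
    return f'SELECT * FROM "{catalog}"."{namespace}"."{table_name}" LIMIT {limit}'


def spark_select(catalog: str, namespace: str, table_name: str, limit: int = 10) -> str:
    """Build a Spark SQL query; Spark quotes identifiers with backticks."""
    return f"SELECT * FROM `{catalog}`.`{namespace}`.`{table_name}` LIMIT {limit}"


def spark_catalog_conf(catalog: str, thrift_uri: str) -> dict:
    """Spark settings that point an Iceberg catalog at a Hive Metastore.

    The catalog name is chosen by the user; it is an assumption here.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
        f"{prefix}.uri": thrift_uri,
    }
```

These helpers do not escape quote characters inside identifiers, so they are only suitable for simple names.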

How It Works

When the Hive Metastore integration is enabled for a table, Tableflow will:

  1. Create the namespace if it does not exist in the Hive Metastore.

  2. Create the table as an external Iceberg table if it does not exist, setting the metadata_location to the latest Iceberg metadata file in your object storage bucket.

  3. Update the table on subsequent syncs by updating metadata_location to point to the latest snapshot. The previous metadata location is preserved in the previous_metadata_location table property.
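The three steps above can be modeled with a small in-memory stand-in for the metastore. This is only a sketch of the described behavior, not Tableflow's implementation: `FakeMetastore` and `sync_table` are hypothetical names, the real integration talks to the metastore over Thrift, and skipping the update when the location is unchanged is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeMetastore:
    """In-memory stand-in: namespace -> table name -> table properties."""
    namespaces: Dict[str, Dict[str, Dict[str, str]]] = field(default_factory=dict)


def sync_table(hms: FakeMetastore, namespace: str, table_name: str,
               latest_metadata: str) -> None:
    # Step 1: create the namespace if it does not exist.
    tables = hms.namespaces.setdefault(namespace, {})

    props = tables.get(table_name)
    if props is None:
        # Step 2: create the table, pointing at the latest metadata file.
        tables[table_name] = {"metadata_location": latest_metadata}
        return

    # Step 3: on later syncs, move the pointer and keep the old location.
    current = props["metadata_location"]
    if current != latest_metadata:  # assumption: an unchanged location is a no-op
        props["previous_metadata_location"] = current
        props["metadata_location"] = latest_metadata
```

Calling `sync_table` twice with two different metadata file paths leaves the second path in metadata_location and the first in previous_metadata_location.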
