DuckDB
DuckDB is an open-source, column-oriented, relational database management system (RDBMS) designed for analytical processing and interactive querying.
Introduction
There is no direct connection to DuckDB from any Apache Kafka-compliant service. However, a DuckDB-based tool named Kwack provides this ability. This guide explains how to connect the two systems so that you can run analytics on your WarpStream-managed topics.
Prerequisites
Have DuckDB installed.
Have Kwack installed (requires Java 11 or higher).
Have a WarpStream account. You can get access by registering with WarpStream.
Have a Serverless WarpStream cluster up and running with a populated topic.
Step 1: Get your WarpStream credentials
Obtain the Bootstrap Broker from the WarpStream console by navigating to your cluster and clicking the Connect tab. If you don't have SASL credentials, you can create a set from the same console.
Save these values for the next step.
Step 2: Prepare your Kwack parameters
Kwack can accept all the connection information and even SQL queries on the command line with various switches. A more easily reproducible method is to use a "properties" file, such as the one below:
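As a minimal sketch of what such a file might contain, the Python snippet below renders a properties file from a dictionary. The `bootstrap.servers`, `security.protocol`, `sasl.mechanism`, and `sasl.jaas.config` keys are standard Kafka client settings. The `topics` key, the `SASL_SSL`/`PLAIN` combination, the port, and every placeholder value are assumptions made for illustration and should be checked against the Kwack documentation and your WarpStream console.

```python
# Sketch: render a Kwack properties file from a dict of settings.
def render_properties(settings):
    """Return Java-properties text, one key=value pair per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Placeholder values: substitute the Bootstrap Broker and SASL credentials from Step 1.
settings = {
    "topics": "customers",  # assumed Kwack key naming the topic(s) to read
    "bootstrap.servers": "YOUR_BOOTSTRAP_BROKER:9092",
    "security.protocol": "SASL_SSL",  # assumed security settings
    "sasl.mechanism": "PLAIN",
    "sasl.jaas.config": (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="YOUR_USERNAME" password="YOUR_PASSWORD";'
    ),
}

properties_text = render_properties(settings)
print(properties_text)
```

Saving the printed text to a file (for example a hypothetical `kwack.properties`) keeps the connection settings in one reproducible place.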
A schema registry or a local file can describe your data in various formats. For this example, we use a local schema definition in JSON format. Assuming a simple "customers" layout, the JSON schema would look something like the following:
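The sketch below builds one possible JSON Schema for a "customers" record and prints it as JSON. The field names (`customerId`, `name`, `email`) and the choice of required fields are hypothetical and only illustrate the shape of such a definition. Your own topic layout will differ.

```python
import json

# Hypothetical "customers" layout; field names are illustrative only.
customer_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Customer",
    "type": "object",
    "properties": {
        "customerId": {"type": "integer"},
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["customerId", "name"],
}

# Serialize the schema; the printed text can be saved as a local schema file.
schema_text = json.dumps(customer_schema, indent=2)
print(schema_text)
```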
If you connect to more than one topic, separate the per-topic values with commas, as follows:
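A minimal sketch of the comma-separated form, assuming two hypothetical topics named `customers` and `orders`. The `topics` and `value.serdes` keys, and the `json:@<path>` syntax for pointing at a local schema file, are assumptions about Kwack's configuration and should be verified against its documentation.

```python
# Sketch: build comma-separated property lines for several topics.
topic_names = ["customers", "orders"]  # hypothetical topic names

# One entry per topic, joined with commas ("topics" key is assumed).
topics_line = "topics=" + ",".join(topic_names)

# One topic=serde pair per topic (key name and json:@ syntax are assumed).
value_serdes_line = "value.serdes=" + ",".join(
    f"{name}=json:@/path/to/{name}.json" for name in topic_names
)

print(topics_line)
print(value_serdes_line)
```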
Step 3: Consuming with Kwack
Kwack can combine a mixture of run-time switches and a property file. To launch Kwack with a properties file, use the -F switch, such as:
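As a sketch, the snippet below assembles the launch command as an argument list and prints the equivalent shell line. Only the `-F` switch comes from this guide. The executable name `kwack` (assumed to be on your PATH) and the file name `kwack.properties` are hypothetical.

```python
import shlex

# Assumed executable name and hypothetical properties file name.
command = ["kwack", "-F", "kwack.properties"]

# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The same list could be passed to `subprocess.run(command)` to start Kwack from a script.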
At this point, you can run SQL queries against the active Kafka topics in WarpStream, including joins across multiple topics for analytics. The topics can be persisted into a DuckDB database with the -d switch, such as:
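The helper below is a sketch of how the `-d` switch could be added to the launch command when persistence is wanted. The `-F` and `-d` switches come from this guide. The function name `kwack_command`, the executable name `kwack`, and the file names are hypothetical.

```python
import shlex


def kwack_command(properties_file, database=None):
    """Build the Kwack argument list; -d is added only when a database path is given."""
    args = ["kwack", "-F", properties_file]
    if database is not None:
        args += ["-d", database]
    return args


# Persist the topics into a hypothetical DuckDB database file.
print(shlex.join(kwack_command("kwack.properties", "warpstream.duckdb")))
```

Leaving `database` unset produces the plain `-F` form, so the same helper covers both the interactive and the persisted case.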
Next Steps
Congratulations! You can now read your WarpStream topics directly with Kwack and optionally save them as a DuckDB database. Kwack can also export your topics as Parquet files, among many other useful features.