DuckDB
DuckDB is an open-source, column-oriented, relational database management system (RDBMS) designed for analytical processing and interactive querying.
No Apache Kafka-compatible service connects to DuckDB directly. However, a DuckDB plug-in named Kwack provides this ability. This guide explains how to connect the two systems so that you can run analytics on your WarpStream-managed topics.
Have DuckDB installed.
Have Kwack installed (requires Java 11 or higher).
Have a WarpStream account; get access to WarpStream by registering here.
Have a Serverless WarpStream cluster up and running with a populated topic.
Obtain the Bootstrap Broker from the WarpStream console by navigating to your cluster and clicking the Connect tab. If you don't have SASL credentials, you can also create a set of credentials from the console.
Save these values for the next step.
Kwack can accept all the connection information and even SQL queries on the command line with various switches. A more easily reproducible method is to use a "properties" file, such as the one below:
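As a minimal sketch, the Python helper below renders the text of such a properties file from the values saved in the previous step. The `bootstrap.servers`, `security.protocol`, `sasl.mechanism`, and `sasl.jaas.config` keys are standard Kafka client settings; the `topics` and `value.serdes` keys, the `json:@<path>` form, and the use of SASL/PLAIN over TLS are assumptions that should be checked against the Kwack documentation and your cluster's Connect tab. The broker address, credentials, and paths are placeholders.

```python
def render_kwack_properties(bootstrap, username, password, topic, schema_path):
    """Return the text of a properties file for a single topic."""
    # JAAS line for SASL/PLAIN, as used by standard Kafka clients.
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f"username='{username}' password='{password}';"
    )
    lines = [
        # Kwack-specific keys (assumed names; verify against the Kwack docs).
        f"topics={topic}",
        f"value.serdes={topic}=json:@{schema_path}",
        # Standard Kafka client connection settings.
        f"bootstrap.servers={bootstrap}",
        "security.protocol=SASL_SSL",
        "sasl.mechanism=PLAIN",
        f"sasl.jaas.config={jaas}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; substitute the ones from the WarpStream console.
    print(
        render_kwack_properties(
            bootstrap="YOUR_BOOTSTRAP_BROKER:9092",
            username="YOUR_SASL_USERNAME",
            password="YOUR_SASL_PASSWORD",
            topic="customers",
            schema_path="/path/to/customers.json",
        )
    )
```

Saving the printed text as, for example, `kwack.properties` keeps the connection details out of the command line and makes the setup easy to reproduce.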
A schema registry or a local file can describe your data in various formats. For this example, we use a local schema definition in JSON format. Assuming a simple "customers" layout, the JSON schema would look something like the following:
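To illustrate, the sketch below builds a JSON Schema document for a hypothetical "customers" record and prints it as JSON. The field names (`customer_id`, `name`, `email`) are invented for illustration and are not taken from any real topic; adapt them to the actual shape of your messages.

```python
import json

# Hypothetical layout for a "customers" topic; the field names are assumptions.
CUSTOMERS_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "customers",
    "type": "object",
    "properties": {
        "customer_id": {"type": "integer"},
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["customer_id"],
}


def schema_as_json(schema):
    """Serialize a schema dictionary as indented JSON text."""
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    print(schema_as_json(CUSTOMERS_SCHEMA))
```

The printed JSON can be saved to a local file and referenced from the properties file.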
If you have more than one topic to connect to, list the values separated by commas, as follows:
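The comma-separated form can be sketched with two small Python helpers that build the relevant property lines. The `topics` and `value.serdes` key names and the `topic=serde` pairing are assumptions about Kwack's properties format, and the topic names and schema paths are hypothetical.

```python
def topics_property(topics):
    """Build a comma-separated 'topics' line, ignoring blank entries."""
    cleaned = [t.strip() for t in topics if t.strip()]
    return "topics=" + ",".join(cleaned)


def value_serdes_property(serdes_by_topic):
    """Build a comma-separated 'value.serdes' line from a topic-to-serde map."""
    pairs = [f"{topic}={serde}" for topic, serde in serdes_by_topic.items()]
    return "value.serdes=" + ",".join(pairs)


if __name__ == "__main__":
    print(topics_property(["customers", "orders"]))
    print(
        value_serdes_property(
            {
                "customers": "json:@/path/to/customers.json",
                "orders": "json:@/path/to/orders.json",
            }
        )
    )
```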
Kwack can combine a mixture of run-time switches and a property file. To launch Kwack with a properties file, use the -F switch, such as:
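A minimal sketch of that invocation, built as an argument list in Python, is shown here. Only the `-F` switch comes from this guide; the executable name `kwack` and the file name `kwack.properties` are assumptions that depend on how Kwack was installed and where the file was saved.

```python
import shlex


def kwack_command(properties_file, executable="kwack"):
    """Return the argument list for launching Kwack with a properties file."""
    return [executable, "-F", properties_file]


if __name__ == "__main__":
    # Print the equivalent shell command without running it.
    print(shlex.join(kwack_command("kwack.properties")))
```

The returned list can be passed to `subprocess.run` to start Kwack from a script, or the printed command can be typed directly into a terminal.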
At this point, you can run SQL queries against the active Kafka topics in WarpStream, including joins across multiple topics for analytics. The topics can be persisted into a DuckDB database with the -d switch, such as:
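As a sketch under the same caveats, the command with persistence enabled can be assembled like this. The `-F` and `-d` switches come from this guide; that `-d` takes a database file path, along with the executable name and both file names, is an assumption to verify against the Kwack documentation.

```python
import shlex


def kwack_persist_command(properties_file, database_file, executable="kwack"):
    """Return the argument list for running Kwack and persisting to DuckDB."""
    return [executable, "-F", properties_file, "-d", database_file]


if __name__ == "__main__":
    # Print the equivalent shell command without running it.
    print(shlex.join(kwack_persist_command("kwack.properties", "warpstream.duckdb")))
```

Once the database file exists, it can be opened with DuckDB itself for further analysis.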
Congratulations! You can now read your WarpStream topics directly with Kwack and optionally save them as a DuckDB database. Kwack can also export your topics as Parquet files, among many other useful features.