DuckDB

DuckDB is an open-source, column-oriented, relational database management system (RDBMS) designed for analytical processing and interactive querying.


A video walkthrough can be found below:

Introduction

There is no direct connection to DuckDB from any Apache Kafka-compliant service. However, a DuckDB plug-in named Kwack provides this ability. This guide explains how to connect the two systems so that you can perform analytics on your WarpStream-managed topics.

Prerequisites

  1. A Serverless WarpStream cluster is up and running with a populated topic.
  2. Have DuckDB installed.
  3. Have Kwack installed (requires Java 11 or higher).
  4. A WarpStream account - get access to WarpStream by registering here.

Step 1: Get your WarpStream credentials

Obtain the Bootstrap Broker from the WarpStream console by navigating to your cluster and clicking the Connect tab. If you don't have SASL credentials, you can also create a set of credentials from the console.

Save these values for the next step.

Step 2: Prepare your Kwack parameters

Kwack can accept all the connection information and even SQL queries on the command line with various switches. A more easily reproducible method is to use a "properties" file, such as the one below:

# Topics to manage
topics=topic1

# Key serdes (default is binary)
key.serdes=topic1=string

# Value serdes (default is latest)
value.serdes=topic1=json:@/mypath/topic1_schema.json

# The bootstrap servers for your Kafka cluster
bootstrap.servers=<YOUR_BOOTSTRAP_BROKER>:<YOUR_PORT>
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<YOUR_SASL_USERNAME>' password='<YOUR_SASL_PASSWORD>';

A schema registry or a local file can describe your data in various formats. For this example, we use a local schema definition in JSON format. Assuming a simple "customers" layout, the JSON schema would look something like the following:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "customerId": {
      "type": "string"
    },
    "name": {
      "type": "string"
    },
    "zone": {
      "type": "string"
    },
    "address": {
      "type": "string"
    },
    "membership": {
      "type": "string"
    }
  }
}
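
For reference, a record that conforms to this schema might look like the following (the values are purely illustrative):

{
  "customerId": "C-1001",
  "name": "Jane Doe",
  "zone": "us-east-1a",
  "address": "123 Example Street",
  "membership": "gold"
}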

If you have more than one topic to connect to, then those values are separated by commas as follows:

# Topics to manage
topics=topic1,topic2

# Key serdes (default is binary)
key.serdes=topic1=string,topic2=string

# Value serdes (default is latest)
value.serdes=topic1=json:@/mypath/topic1_schema.json,topic2=json:@/mypath/topic2_schema.json

# The bootstrap servers for your Kafka cluster
bootstrap.servers=<YOUR_BOOTSTRAP_BROKER>:<YOUR_PORT>
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<YOUR_SASL_USERNAME>' password='<YOUR_SASL_PASSWORD>';

Step 3: Consuming with Kwack

Kwack accepts a mixture of run-time switches and a properties file. To launch Kwack with a properties file, use the -F switch:

kwack -F myconfig.properties
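
Once launched this way, Kwack provides an interactive SQL prompt backed by DuckDB. As an illustrative sketch only (it assumes each configured topic is exposed as a table named after the topic and reuses the hypothetical "customers" fields from Step 2), a query might look like this:

-- Count records per zone, reading directly from topic1
SELECT zone, COUNT(*) AS customers
FROM topic1
GROUP BY zone
ORDER BY customers DESC;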

You can run SQL like this against any of your active Kafka topics in WarpStream, including joining multiple topics for analytics. The topics can also be persisted into a DuckDB database with the -d switch, such as:

kwack -F myconfig.properties -d mydb.duckdb
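
The resulting file is a standard DuckDB database, so it can later be opened with the DuckDB CLI (or any other DuckDB client) and queried without Kwack or a Kafka connection. For example, assuming the persisted table keeps the topic's name:

duckdb mydb.duckdb
-- then, inside the DuckDB shell:
SELECT membership, COUNT(*) AS total FROM topic1 GROUP BY membership;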

Next Steps

Congratulations! You can now read your WarpStream topics directly with Kwack and optionally save them as a DuckDB database. Kwack can also export your topics as Parquet files, among many other useful features.
