Arroyo

This page describes how to integrate WarpStream with Arroyo, a distributed stream processing engine written in Rust, that is designed to efficiently perform statement computations on streams of data.

A video walkthrough can be found below:

Prerequisites

  1. WarpStream account - get access to WarpStream by registering here.

  2. Arroyo is installed and running - instructions are found here.

  3. WarpStream cluster up and running.

Step 1: Create a topic in your WarpStream cluster

Obtain the Bootstrap Broker from the WarpStream console by navigating to your cluster and clicking the Connect tab. If you don't have SASL credentials yet, you can also create a set of credentials from the console.

Store these values for easy reference; they will be needed in Arroyo. If you are going to produce records to your topic from the command line, then export them in a terminal window:

export BOOTSTRAP_HOST=<YOUR_BOOTSTRAP_BROKER> \
SASL_USERNAME=<YOUR_SASL_USERNAME> \
SASL_PASSWORD=<YOUR_SASL_PASSWORD>;

Then, create a topic in the WarpStream console if you don't already have one.

Step 2: Produce some records

You can use the WarpStream CLI to produce messages to your topic if you don't already have an active topic to work with:

warpstream kcmd -bootstrap-host $BOOTSTRAP_HOST -tls -username $SASL_USERNAME -password $SASL_PASSWORD -type produce -topic arroyo_demo --records '{"action": "click", "user_id": "user_0", "page_id": "home"},,{"action": "hover", "user_id": "user_0", "page_id": "home"},,{"action": "scroll", "user_id": "user_0", "page_id": "home"},,{"action": "click", "user_id": "user_1", "page_id": "home"},,{"action": "click", "user_id": "user_1", "page_id": "home"},,{"action": "click", "user_id": "user_2", "page_id": "home"}'

Note that the WarpStream CLI uses double commas (,,) as a delimiter between JSON records.

Step 3: Launch and connect Arroyo

Launch Arroyo with arroyo cluster and then open the Arroyo Web UI at http://localhost:5115/. which will present the following screen, and you'll want to click on "Connections" and then click on "Create Connection":

Select "Kafka" from the list of available connections:

Now file in the connection information as shown below:

Click on the "Validate" button, and if there are no errors, the button will change to "Create", which you will then click and be presented with this screen:

The available Topics for the cluster will be in the drop-down list. Select the one you want to work with, set the other fields as desired, and click "Next." This will display the following screen to define the schema:

Select the applicable information on this page and click "Next" to get to the final validation and Create step.

Next Steps

From here, you can configure a data pipeline in Arroyo that ingests from a WarpStream producer.

Next, check out the WarpStream docs for configuring the WarpStream Agent, or review the Arroyo Docs to learn more about what is possible with WarpStream and Arroyo!

Last updated