Agent Roles

How to run different services on different sets of Agents

Configuring individual Agent roles is an advanced feature. We recommend familiarizing yourself with WarpStream in its default configuration first before considering splitting roles.

Also keep in mind that at least some of your Agents must run the jobs role. Without them, your cluster will eventually cease to function until jobs Agents are created and have finished processing the accumulated backlog of unprocessed jobs.

What are Agent Roles?

The WarpStream Agent has two primary roles:

  1. Receiving data from Apache Kafka producers and serving data to Apache Kafka consumers

  2. Running background jobs

The first role is what we call the proxy role. It runs three main services.

  • The HTTP and TCP servers that respond to produce requests

  • The HTTP and TCP servers that respond to fetch requests

  • The file cache that reduces the number of object storage GET operations required to respond to fetch requests

The second role is called the jobs role. It runs all of the background tasks that are necessary for a WarpStream cluster to continue functioning. It runs a few different kinds of jobs:

  • Compaction jobs periodically rewrite and merge files in object storage.

  • Retention jobs delete files which contain out-of-retention data.

  • Publish metrics jobs make some WarpStream-internal metrics available over the Prometheus endpoint.

By default, all Agents run both the proxy and jobs roles, but it is possible to configure your Agents to start only a subset of these roles.

Optionally, there is one additional role that can be configured called pipelines. This role is used to run WarpStream Managed Data Pipelines. See our documentation for more details about this role.

All roles (except pipelines) are necessary in a WarpStream deployment as a whole; you cannot, for example, run a cluster with only the proxy role and no jobs Agents.

However, individual Agents can each run a single role.
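
For example, one set of Agents could be dedicated to serving client traffic while a separate set handles background work. Below is a minimal sketch using the -roles flag described in the next section; the bucket URL, API key, and virtual cluster ID are placeholders.

# Agents dedicated to serving Kafka producers and consumers.
docker run public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest \
    agent \
    -bucketURL "s3://my-warpstream-bucket?region=us-east-1" \
    -apiKey $YOUR_API_KEY \
    -defaultVirtualClusterID $YOUR_VIRTUAL_CLUSTER_ID \
    -roles "proxy"

# Agents dedicated to background jobs (compactions, retention, etc.).
docker run public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest \
    agent \
    -bucketURL "s3://my-warpstream-bucket?region=us-east-1" \
    -apiKey $YOUR_API_KEY \
    -defaultVirtualClusterID $YOUR_VIRTUAL_CLUSTER_ID \
    -roles "jobs"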

Configuring Agent Roles

If you're enabling roles on an existing WarpStream cluster that is already serving traffic, first follow the Targeting Agent Groups instructions to configure your Kafka clients to target specific roles, and then redeploy the Agents into dedicated groups.

This is especially important if you're splitting out the proxy-produce and proxy-consume roles.

If you don't change your Kafka client configuration first, then some of your producers will end up connected to the proxy-consume Agents and their Produce requests will be rejected with an INVALID_REQUEST error message. Similarly, some of your consumers will end up connected to the proxy-produce Agents and their Fetch requests will be rejected as well.

Use the -roles command line flag or the WARPSTREAM_AGENT_ROLES environment variable to configure the roles that an Agent should run.

Valid values are:

  • "proxy" to run only the proxy role, and to advertise that this Agent is able to process all Kafka protocol requests. You can connect your clients to this Agent to produce records or to consume records. In the case of Schema Registry Agents, using this role means the Agent can handle all API requests.

  • "proxy-produce" to run only the proxy-produce role which is a subset of the proxy role and indicates that this Agent will process Produce requests, but not Fetch requests. This role is not applicable for Schema Registry clusters.

  • "proxy-consume" to run only the proxy-consume role which is a subset of the proxy role and indicates that this Agent will process Fetch requests, but not Produce requests. This role is not applicable for Schema Registry clusters.

  • "jobs" to run only the jobs role.

  • "pipelines" to run only the pipelines role (only applicable if using the Managed Data Pipelines product).

You can also combine values, for example:

  • "proxy-consume,jobs" runs the jobs role and the proxy-consume role.

  • "proxy,jobs,pipelines" runs all roles.

Make sure at least some of your Agents are running the jobs role. This role performs compaction and cleanup of deleted files in object storage, and without it your cluster will eventually cease to function entirely until jobs Agents are added and have finished processing the accumulated backlog of unprocessed jobs.

A simple example using our Docker container:

docker run public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest \
    agent \
    -bucketURL mem://mem_bucket \
    -apiKey $YOUR_API_KEY \
    -defaultVirtualClusterID $YOUR_VIRTUAL_CLUSTER_ID \
    -roles "jobs"
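
The roles can also be configured with the WARPSTREAM_AGENT_ROLES environment variable instead of the flag. Here is a minimal sketch, assuming the environment variable accepts the same comma-separated list as the -roles flag, running both the proxy-consume and jobs roles:

docker run -e WARPSTREAM_AGENT_ROLES="proxy-consume,jobs" \
    public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest \
    agent \
    -bucketURL mem://mem_bucket \
    -apiKey $YOUR_API_KEY \
    -defaultVirtualClusterID $YOUR_VIRTUAL_CLUSTER_ID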

Targeting Agent Groups

If you don't configure the warpstream_proxy_target feature on your Kafka client IDs, then your clients will connect to any Agent running a proxy role, regardless of whether it's running the proxy-consume or proxy-produce role. This may break your application, as the proxy-consume Agents will reject Produce requests and the proxy-produce Agents will reject Fetch requests.

If some of your Agents are running with either the proxy-consume or the proxy-produce role, then you will need to update your Apache Kafka client configuration to indicate which set of Agents you want to target, based on whether your client is a producer or a consumer.

  • For a Kafka consumer application that needs to target Agents running the proxy-consume role, add ,warpstream_proxy_target=proxy-consume to the end of your client_id.

  • For a Kafka producer application that needs to target Agents running the proxy-produce role, add ,warpstream_proxy_target=proxy-produce to the end of your client_id.
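
For example, with a librdkafka-based command line client such as kcat, the target can be appended to the client.id configuration property. This is only a sketch; the broker address, topic, and base client IDs below are placeholders.

# Producer targeting Agents that run the proxy-produce role.
kcat -P -b $BOOTSTRAP_BROKER:9092 -t example-topic \
    -X client.id=my-producer,warpstream_proxy_target=proxy-produce

# Consumer targeting Agents that run the proxy-consume role.
kcat -C -b $BOOTSTRAP_BROKER:9092 -t example-topic \
    -X client.id=my-consumer,warpstream_proxy_target=proxy-consume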

For more information about how to configure your Apache Kafka client with additional WarpStream features like role target, see our Configuring Kafka Client ID Features documentation.
