Agent Roles

How to run different services on different sets of Agents

Configuring individual Agent roles is an advanced feature. We recommend familiarizing yourself with WarpStream in its default configuration first before considering splitting roles.

Also keep in mind that at least some of your Agents must run the jobs role. Without it, your cluster will eventually cease to function until jobs Agents are created and have finished processing the accumulated backlog of unprocessed jobs.

What are Agent Roles

The WarpStream Agent has two primary roles:

  1. Receiving data from Apache Kafka producers and serving data to Apache Kafka consumers

  2. Running background jobs

The first role is what we call the proxy role. It runs three main services.

  • The HTTP and TCP servers to respond to produce requests

  • The HTTP and TCP servers to respond to fetch requests

  • The file cache, which reduces the number of object storage GET operations needed to respond to fetch requests

The second role is called the jobs role. It runs all of the background tasks that are necessary for a WarpStream cluster to continue functioning. It runs a few different kinds of jobs.

  • It runs compaction jobs that periodically rewrite and merge files in object storage.

  • It runs retention jobs that delete files which contain out-of-retention data.

  • It runs a background job that makes some WarpStream-internal metrics available over the Prometheus endpoint.

By default, all Agents start both roles, but it is possible to configure your Agents to start only a subset of these roles.

Optionally, there is one additional role that can be configured called pipelines. This role is used to run WarpStream Managed Data Pipelines. See our documentation for more details about this role.

Every role except pipelines is required somewhere in a WarpStream deployment. For example, a cluster cannot run with only the proxy role or only the jobs role.

However, you can have some Agents run a single role.

How to configure Agent Roles

You can add the -roles command line flag to the command you use to run Agents. Alternatively, if you prefer to use environment variables, you can set the WARPSTREAM_AGENT_ROLES environment variable instead.

Valid values are:

  • "proxy" runs only the proxy role and advertises that this Agent will accept all requests. Clients can connect to this Agent to produce or to consume records. For Schema Registry Agents, this role means the Agent can handle all API requests.

  • "proxy-produce" runs only the proxy role and advertises that this Agent will accept only requests that write data to WarpStream. Clients can connect to this Agent to produce records, but not to consume them. This value is not applicable to Schema Registry clusters.

  • "proxy-consume" runs only the proxy role and advertises that this Agent will accept only requests that read data from WarpStream. Clients can connect to this Agent to consume records, but not to produce them. This value is not applicable to Schema Registry clusters.

  • "jobs" runs only the jobs role.

  • "pipelines" runs only the pipelines role.

You can also combine values in a comma-separated list, for example:

  • "proxy-consume,jobs" runs the jobs role and the proxy-consume role.

  • "proxy,jobs" runs both the proxy and the jobs roles, which is the default.
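
Because a combined value is just a comma-separated list of the values above, a typo in one entry is easy to miss. The following is a minimal sketch of a pre-flight check that could run before starting an Agent. The validate_roles function is a hypothetical helper written for this page, not part of the WarpStream CLI, and it only checks spelling against the documented values:

```shell
# Hypothetical helper (not part of WarpStream): succeeds only when every
# comma-separated entry is one of the role values documented above.
# The body runs in a subshell, so the IFS and globbing changes do not leak.
validate_roles() (
    [ -n "$1" ] || exit 1
    set -f      # disable filename globbing while splitting
    IFS=','     # split the argument on commas
    for role in $1; do
        case "$role" in
            proxy|proxy-produce|proxy-consume|jobs|pipelines) ;;
            *) echo "unknown role: $role" >&2; exit 1 ;;
        esac
    done
)

# Example: check a combined value before passing it to -roles.
validate_roles "proxy-consume,jobs" && echo "roles look valid"
```

A check like this cannot tell whether some Agent in the cluster runs the jobs role; that still has to be verified across the whole deployment.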

Make sure at least some of your Agents are running the jobs role. This role performs compaction and cleanup of deleted files in object storage. Without it, your cluster will eventually cease to function entirely until jobs Agents are added and have finished processing the accumulated backlog of unprocessed jobs.

For example:

docker run public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest \
    agent \
    -bucketURL mem://mem_bucket \
    -apiKey $YOUR_API_KEY \
    -defaultVirtualClusterID $YOUR_VIRTUAL_CLUSTER_ID \
    -roles "jobs"

Through an environment variable

You can export the WARPSTREAM_AGENT_ROLES variable in the environment before starting your Agent. Valid values are the same as for the -roles command line flag.
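
As a minimal sketch, the jobs-only Agent from the command line example could be configured through the environment like this. The start_agent wrapper is a hypothetical helper introduced only for illustration. The image name and Agent flags are copied from the example above, and the sketch relies on docker run forwarding a variable from the calling shell when -e is given a name without a value:

```shell
# Choose the roles through the environment instead of the -roles flag.
export WARPSTREAM_AGENT_ROLES="jobs"

# Hypothetical wrapper (not part of WarpStream): starts one Agent container.
# "-e WARPSTREAM_AGENT_ROLES" with no value forwards the exported variable
# from this shell into the container.
start_agent() {
    docker run -e WARPSTREAM_AGENT_ROLES \
        public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest \
        agent \
        -bucketURL mem://mem_bucket \
        -apiKey "$YOUR_API_KEY" \
        -defaultVirtualClusterID "$YOUR_VIRTUAL_CLUSTER_ID"
}
```

Calling start_agent afterwards launches the container. Changing the exported value, for example to "proxy-consume,jobs", changes the roles without touching the command itself.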

Addressing proxy-consume or proxy-produce Agents

If you don't configure the warpstream_proxy_target feature on your Kafka client IDs, then your clients will connect to any Agent running a proxy role, regardless of whether it's proxy, proxy-consume, or proxy-produce.

If some of your Agents are running with either the proxy-consume or the proxy-produce role then you will need to update your Apache Kafka client configuration to indicate which set of Agents you want to target.

  • For a Kafka producer which wants to target only the Agents that have set a proxy-produce role, add ,warpstream_proxy_target=proxy-produce at the end of your client_id

  • For a Kafka consumer which wants to target only the Agents that have set a proxy-consume role, add ,warpstream_proxy_target=proxy-consume at the end of your client_id
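
The suffix is plain string concatenation on the client ID, which can be sketched as follows. The base names orders-writer and orders-reader are hypothetical placeholders, and how the resulting value is passed to a client depends on the client library (many Kafka clients expose it as the client.id setting):

```shell
# Hypothetical base client IDs; use whatever names your applications already use.
BASE_PRODUCER_ID="orders-writer"
BASE_CONSUMER_ID="orders-reader"

# Append the target to the end of each client ID, separated by a comma.
PRODUCER_CLIENT_ID="${BASE_PRODUCER_ID},warpstream_proxy_target=proxy-produce"
CONSUMER_CLIENT_ID="${BASE_CONSUMER_ID},warpstream_proxy_target=proxy-consume"

echo "$PRODUCER_CLIENT_ID"
echo "$CONSUMER_CLIENT_ID"
```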

For more information about how to configure your Apache Kafka client with additional WarpStream features, see:
