Agent Roles

How to run different services on different sets of Agents

What are Agent Roles

The WarpStream Agent has two primary roles:

  1. Receiving data from Apache Kafka producers and serving data to Apache Kafka consumers

  2. Running background jobs

The first role is called the proxy role. It runs three main services:

  • The HTTP and TCP servers that respond to produce requests

  • The HTTP and TCP servers that respond to fetch requests

  • The file cache, which reduces the number of object storage GET operations needed to respond to fetch requests

The second role is called the jobs role. It runs all of the background tasks that are necessary for a WarpStream cluster to continue functioning. It runs a few different kinds of jobs:

  • Compaction jobs periodically rewrite and merge files in object storage.

  • Retention jobs delete files which contain out-of-retention data.

  • Publish-metrics jobs make some WarpStream-internal metrics available over the Prometheus endpoint.

By default, all Agents run both the proxy and jobs roles, but you can configure your Agents to run only a subset of these roles.

One additional, optional role called pipelines can also be configured. This role is used to run WarpStream Managed Data Pipelines. See our documentation for more details about this role.

Configuring Agent Roles

Use the -roles command line flag or the WARPSTREAM_AGENT_ROLES environment variable to configure the roles that an Agent should run.

Valid values are:

  • "proxy" to run only the proxy role, and to advertise that this Agent is able to process all Kafka protocol requests. You can connect your clients to this Agent to produce or consume records. In the case of Schema Registry Agents, using this role means the Agent can handle all API requests.

  • "proxy-produce" to run only the proxy-produce role, which is a subset of the proxy role and indicates that this Agent will process Produce requests, but not Fetch requests. This role is not applicable to Schema Registry clusters.

  • "proxy-consume" to run only the proxy-consume role, which is a subset of the proxy role and indicates that this Agent will process Fetch requests, but not Produce requests. This role is not applicable to Schema Registry clusters.

  • "jobs" to run only the jobs role.

  • "pipelines" to run only the pipelines role (only applicable if using the Managed Data Pipelines product).

You can also combine values, for example:

  • "proxy-consume,jobs" runs the jobs role and the proxy-consume role.

  • "proxy,jobs,pipelines" runs all roles.
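Because the roles value is a plain comma-separated string, a typo in it is easy to miss. The following is a minimal sketch of a pre-flight check, assuming you want to validate the string in your own deployment scripts before starting an Agent. The validate_roles helper is hypothetical and not part of WarpStream; it only checks each entry against the values listed above and is deliberately strict about whitespace.

```shell
# Hypothetical helper (not a WarpStream tool): succeeds only if every entry in
# a comma-separated roles string is one of the values documented above.
# The body runs in a subshell, so the IFS and globbing changes do not leak.
validate_roles() (
    roles="$1"
    if [ -z "$roles" ]; then
        echo "roles string is empty" >&2
        exit 1
    fi
    set -f      # disable globbing while splitting
    IFS=','     # split the string on commas
    for role in $roles; do
        case "$role" in
            proxy|proxy-produce|proxy-consume|jobs|pipelines) ;;
            *) echo "unknown role: '$role'" >&2; exit 1 ;;
        esac
    done
    exit 0
)

# Example: only export the environment variable if the value checks out.
if validate_roles "proxy-consume,jobs"; then
    export WARPSTREAM_AGENT_ROLES="proxy-consume,jobs"
fi
```

The same check works for a value you intend to pass through the -roles flag.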

A simple example using our Docker container:

docker run public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest \
    agent \
    -bucketURL mem://mem_bucket \
    -apiKey $YOUR_API_KEY \
    -defaultVirtualClusterID $YOUR_VIRTUAL_CLUSTER_ID \
    -roles "jobs"
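
To run different roles on different sets of Agents, you start each set with its own roles value. The sketch below is a dry run: it only prints one docker command per group so you can review the split before launching anything. It uses the WARPSTREAM_AGENT_ROLES environment variable (passed with docker's -e option) in place of the -roles flag. The three-group split, the agent_cmd helper, and the $YOUR_BUCKET_URL placeholder are illustrative assumptions, not a recommended topology.

```shell
IMAGE="public.ecr.aws/warpstream-labs/warpstream_agent_linux_amd64:latest"

# Hypothetical dry-run helper: prints (does not execute) the docker command
# for one group of Agents. The $YOUR_... names are printed literally as
# placeholders for you to fill in.
agent_cmd() {
    printf 'docker run -e WARPSTREAM_AGENT_ROLES=%s %s agent' "$1" "$IMAGE"
    printf ' -bucketURL $YOUR_BUCKET_URL -apiKey $YOUR_API_KEY'
    printf ' -defaultVirtualClusterID $YOUR_VIRTUAL_CLUSTER_ID\n'
}

# One command per assumed Agent group.
for roles in proxy-produce proxy-consume jobs; do
    agent_cmd "$roles"
done
```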

Targeting Agent Groups

If some of your Agents are running with either the proxy-consume or the proxy-produce role, you will need to update your Apache Kafka client configuration to indicate which set of Agents to target, based on whether your client is a producer or a consumer.

  • For a Kafka consumer application that needs to target Agents running the proxy-consume role, add ,warpstream_proxy_target=proxy-consume to the end of your client_id.

  • For a Kafka producer application that needs to target Agents running the proxy-produce role, add ,warpstream_proxy_target=proxy-produce to the end of your client_id.
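
As a small illustration of the suffix format described above, the following sketch builds a client ID for a consumer application. The client_id_with_target helper and the base ID orders-consumer are hypothetical; the only parts taken from this page are the ,warpstream_proxy_target= suffix and the proxy-consume value.

```shell
# Hypothetical helper: appends the proxy target suffix to a base client ID.
# It does not check that the target is a role your Agents actually run.
client_id_with_target() {
    base_id="$1"
    target="$2"
    if [ -z "$base_id" ] || [ -z "$target" ]; then
        echo "usage: client_id_with_target <base-client-id> <target-role>" >&2
        return 1
    fi
    printf '%s,warpstream_proxy_target=%s\n' "$base_id" "$target"
}

consumer_client_id=$(client_id_with_target "orders-consumer" "proxy-consume")
echo "$consumer_client_id"
# prints: orders-consumer,warpstream_proxy_target=proxy-consume
```

You would then pass the resulting string as the client ID setting of your Kafka client (client.id in the Java client's configuration).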

For more information about how to configure your Apache Kafka client with additional WarpStream features such as role targeting, see our Configuring Kafka Client ID Features documentation.
