Agent Roles
How to run different services on different sets of Agents
Configuring individual Agent roles is an advanced feature. We recommend familiarizing yourself with WarpStream in its default configuration before you consider splitting roles.
Also keep in mind that at least some of your Agents must run the jobs role. Otherwise your cluster will eventually cease to function until jobs Agents are created and can finish processing the accumulated backlog of unprocessed jobs.
What are Agent Roles
The WarpStream Agent has two primary roles:
Receiving data from Apache Kafka producers and serving data to Apache Kafka consumers
Running background jobs
The first role is what we call the proxy role. It runs three main services:
The HTTP and TCP servers to respond to produce requests
The HTTP and TCP servers to respond to fetch requests
The file cache, which reduces the number of object storage GET operations needed to respond to fetch requests
The second role is called the jobs role. It runs all of the background tasks that are necessary for a WarpStream cluster to continue functioning. It runs a few different kinds of jobs:
It runs compaction jobs that periodically rewrite and merge files in object storage.
It runs retention jobs that delete files which contain out-of-retention data.
It runs a background job that makes some WarpStream-internal metrics available over the Prometheus endpoint.
By default, all Agents start both roles, but it is possible to configure your Agents to start only a subset of these roles.
Optionally, there is one additional role that can be configured, called pipelines. This role is used to run WarpStream Managed Data Pipelines. See our documentation for more details about this role.
All roles (except pipelines) are necessary in a deployment of WarpStream. For example, a cluster cannot run with only one of the roles.
However, individual Agents can each run a single role, as long as every required role is covered by some Agent in the cluster.
How to configure Agent Roles
You can add the -roles command line flag to the command you use to run your Agents. Alternatively, if you prefer to use environment variables, you can set the WARPSTREAM_AGENT_ROLES environment variable instead.
Valid values are:
"proxy" to run only the proxy role, and to advertise that this Agent would like to receive all requests. You can connect your clients to this agent to produce records or to consume records. In the case of Schema Registry agents, using this role means the agent can handle all API requests.
"proxy-produce" to run only the proxy role, and to advertise that this Agent would like to receive only requests for pushing data to WarpStream. You can connect your clients to this agent to produce records, but not to consume records. This is not applicable for Schema Registry clusters.
"proxy-consume" to run only the proxy role, and to advertise that this Agent would like to receive only requests for reading data from WarpStream. You can connect your clients to this agent to consume records, but not to produce records. This is not applicable for Schema Registry clusters.
"jobs" to run only the jobs role.
"pipelines" to run only the pipelines role.
You can also combine values by separating them with commas, for example:
"proxy-consume,jobs" runs the jobs role and the proxy-consume role.
"proxy,jobs" runs both the proxy role and the jobs role, which matches the default.
Make sure at least some of your Agents are running the jobs role. This role performs compaction and cleanup of deleted files in object storage. Without it, your cluster will eventually cease to function entirely until jobs Agents are added and can finish processing the accumulated backlog of unprocessed jobs.
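The role values and the warning above lend themselves to a quick check before rolling out a split-role deployment. The following is a minimal sketch and not part of WarpStream: `parse_roles` and `fleet_gaps` are hypothetical helper names, and the strict comma splitting (no whitespace trimming) is an assumption about how the value is interpreted.

```python
# Hypothetical pre-deployment check; not part of WarpStream itself.
VALID_ROLES = {"proxy", "proxy-produce", "proxy-consume", "jobs", "pipelines"}


def parse_roles(value):
    """Split a comma-separated roles value and reject unknown entries."""
    roles = value.split(",")
    unknown = sorted(set(roles) - VALID_ROLES)
    if unknown:
        raise ValueError(f"unknown role(s): {unknown}")
    return set(roles)


def fleet_gaps(role_values):
    """Return the capabilities that no Agent in the planned fleet provides."""
    running = set()
    for value in role_values:
        running |= parse_roles(value)
    gaps = []
    if "jobs" not in running:
        gaps.append("jobs")
    # Plain "proxy" serves both produce and consume requests.
    if not running & {"proxy", "proxy-produce"}:
        gaps.append("produce")
    if not running & {"proxy", "proxy-consume"}:
        gaps.append("consume")
    return gaps


# Two Agent groups with split roles cover everything.
print(fleet_gaps(["proxy-consume,jobs", "proxy-produce"]))  # prints []
# A proxy-only fleet is missing the jobs role.
print(fleet_gaps(["proxy"]))  # prints ['jobs']
```

The pipelines role is accepted but never reported as a gap, since it is optional. The produce and consume checks apply to Kafka clusters; as noted above, the proxy-produce and proxy-consume values are not applicable to Schema Registry clusters.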
For example
Through an environment variable
You can export the WARPSTREAM_AGENT_ROLES variable in the environment before starting your Agent. Valid values are the same as for the command line flag.
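As an illustration of both options, here is a small sketch of a launcher script that prepares the Agent command either with the -roles flag or with the WARPSTREAM_AGENT_ROLES variable. It rests on several assumptions: that the Agent is started as `warpstream agent`, that the flag accepts its value as a separate argument, and that the other options your deployment needs are supplied elsewhere (they are omitted here). The helper names are hypothetical.

```python
import os

ROLES = "proxy-consume,jobs"

# Assumption: the Agent is started as `warpstream agent`. Any other options
# your deployment requires are intentionally left out of this sketch.
BASE_COMMAND = ["warpstream", "agent"]


def with_roles_flag(roles):
    """Return the command line with roles passed through the -roles flag."""
    return BASE_COMMAND + ["-roles", roles]


def with_roles_env(roles):
    """Return (command, env) with roles passed via WARPSTREAM_AGENT_ROLES."""
    env = dict(os.environ)  # copy, so the current process is left untouched
    env["WARPSTREAM_AGENT_ROLES"] = roles
    return list(BASE_COMMAND), env


command, env = with_roles_env(ROLES)
print(command, env["WARPSTREAM_AGENT_ROLES"])
# To actually start the Agent, something like:
#   subprocess.run(command, env=env, check=True)
```

Nothing is launched by the sketch itself; it only builds the command and environment so they can be inspected first.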
Addressing proxy-consume or proxy-produce Agents
If you don't configure the warpstream_proxy_target feature on your Kafka client IDs, then your clients will connect to any Agent running a proxy role, regardless of whether it is proxy-consume or proxy-produce.
If some of your Agents are running with either the proxy-consume or the proxy-produce role, then you will need to update your Apache Kafka client configuration to indicate which set of Agents you want to target.
For a Kafka producer that should target only the Agents that have set a proxy-produce role, add ,warpstream_proxy_target=proxy-produce at the end of your client_id
For a Kafka consumer that should target only the Agents that have set a proxy-consume role, add ,warpstream_proxy_target=proxy-consume at the end of your client_id
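To make the suffix concrete, the sketch below builds a client ID with the proxy target appended. The helper name `client_id_with_proxy_target` and the base client ID `orders-service` are hypothetical; only the `,warpstream_proxy_target=...` suffix comes from the text above.

```python
PROXY_TARGETS = {"proxy-produce", "proxy-consume"}
FEATURE = "warpstream_proxy_target"


def client_id_with_proxy_target(client_id, target):
    """Append ,warpstream_proxy_target=<target> to a Kafka client ID."""
    if target not in PROXY_TARGETS:
        raise ValueError(f"unsupported proxy target: {target!r}")
    if FEATURE in client_id:
        # Avoid appending the feature twice to the same client ID.
        raise ValueError("client ID already sets warpstream_proxy_target")
    return f"{client_id},{FEATURE}={target}"


producer_id = client_id_with_proxy_target("orders-service", "proxy-produce")
consumer_id = client_id_with_proxy_target("orders-service", "proxy-consume")
print(producer_id)  # prints orders-service,warpstream_proxy_target=proxy-produce
print(consumer_id)  # prints orders-service,warpstream_proxy_target=proxy-consume
```

The resulting string is what you would pass as the client ID setting of your Kafka client library (commonly named `client.id`).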
For more information about how to configure your Apache Kafka client with additional WarpStream features, see:
Configuring Kafka Client ID Features