WarpStream BYOC Schema Registry

This page explains how to use WarpStream's BYOC Schema Registry.

Overview

WarpStream’s BYOC Schema Registry serves as a central repository to store and retrieve schemas used to serialize / deserialize and validate your Kafka records. The Schema Registry exposes a REST server that is API-compatible with Confluent's Schema Registry (more details about API compatibility below).

WarpStream’s BYOC Schema Registry uses the same zero-disk, stateless architecture as WarpStream BYOC. It writes data directly to object storage with no intermediary disks, separates storage from compute, and separates the data plane from the control plane. All schemas are stored in your object store, while metadata that requires consensus (e.g., schema IDs and versioning) is offloaded to WarpStream’s control plane.

WarpStream’s Schema Registry is embedded natively into the WarpStream Agents, so deploying a schema registry is as easy as deploying the stateless WarpStream Agents. Unlike a traditional Kafka-backed schema registry, in which only the leader node can write to the underlying Kafka log, any WarpStream Agent can both write to and read from the registry. And because the Agents are stateless, scaling the number of Schema Registry Agents during traffic spikes is trivial.

To learn more about security and privacy concerns for WarpStream BYOC Schema Registry clusters, check out the schema registry section of the security and privacy documentation.

Protocol Support

Currently, WarpStream's Schema Registry supports Avro schemas, with JSON Schema and Protobuf support coming soon.

WarpStream's BYOC Schema Registry supports most APIs specified in Confluent's Schema Registry API documentation. However, it doesn't support advanced features like data contracts and client-side field level encryption. For a full list of features not supported, check out the protocol documentation.

Run the Schema Registry Locally

Once you install the Agent binary, you can have a Schema Registry running locally on your laptop within seconds for you to test against. For instructions on how to run a Schema Registry locally, check out this doc.

Create a Schema Registry

To create a Schema Registry Virtual Cluster, you can use either the WarpStream console or the API.

Creating from the Console

To create a Schema Registry from the console, navigate to the Schema Registries tab and click the Create Schema Registry button.

Creating via API

To create a Schema Registry Virtual Cluster via API, invoke the /create_virtual_cluster endpoint and specify the virtual_cluster_type as byoc_schema_registry as follows:

curl https://api.prod.us-east-1.warpstream.com/api/v1/create_virtual_cluster \
-H 'warpstream-api-key: XXXXXXXXXX' \
-H 'Content-Type: application/json' \
-d '{"virtual_cluster_name": "XXXXXXXXXX", "virtual_cluster_type": "byoc_schema_registry", "virtual_cluster_region": "us-east-1", "virtual_cluster_cloud_provider": "aws"}'

The response object will contain the Schema Registry's Virtual Cluster ID as well as the agent key necessary to deploy the Agent. Note that Virtual Cluster IDs of Schema Registries always begin with vci_sr_.
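Because Schema Registry Virtual Cluster IDs always begin with vci_sr_, a deployment script can cheaply reject an ID from a different kind of virtual cluster that was passed by mistake. The following is a minimal sketch. The helper name and error wording are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// schemaRegistryIDPrefix is the documented prefix of every Schema Registry
// Virtual Cluster ID.
const schemaRegistryIDPrefix = "vci_sr_"

// checkSchemaRegistryID returns an error if id does not look like a Schema
// Registry Virtual Cluster ID (for example, if a Kafka cluster's ID was used).
func checkSchemaRegistryID(id string) error {
	if !strings.HasPrefix(id, schemaRegistryIDPrefix) {
		return fmt.Errorf("virtual cluster ID %q does not start with %q; is it a Schema Registry cluster?", id, schemaRegistryIDPrefix)
	}
	return nil
}
```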

See our Create Cluster API documentation for more details.

Deploy the BYOC Schema Registry Agents

In playground/demo mode, we automatically deploy both a Kafka Agent and a Schema Registry Agent so you can experiment with them easily. For real clusters, you must deploy Schema Registry Agents separately from your Kafka Agents.

After you obtain a Schema Registry Virtual Cluster ID and an agent key, you can deploy the Schema Registry Agent the exact same way you would deploy your Kafka Agents, using the same Agent binary. The only difference is that the Agent will host a Schema Registry HTTP server instead of a Kafka TCP server when deployed.

See our deployment docs on how to deploy the Agent.

Monitor the Schema Registry Agents

You can monitor your Schema Registry Agents using metrics emitted by the Agents. See the schema registry section in our metrics documentation for more details.

Client Configuration

WarpStream’s BYOC Schema Registry is API-compatible with Confluent's Schema Registry. To obtain a Schema Registry URL that points to your Schema Registry Agents, navigate to the WarpStream console and click the Connect tab. Once you have the Schema Registry URL, you can plug it into your Schema Registry client. For example, here is how you can initialize franz-go's Schema Registry client:

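import (
	"fmt"

	"github.com/twmb/franz-go/pkg/sr"
)

// Schema Registry URL from the console's Connect tab. Include the scheme
// (http://, or https:// if TLS is enabled on the Agents).
url := "http://api-80ba097c-d4ef-4e0b-8e86-d05b80fee6ed.discoveryv2.prod-z.us-east-1.warpstream.com:9094"
srClient, err := sr.NewClient(sr.URLs(url))
if err != nil {
	return fmt.Errorf("error initializing schema registry client: %w", err)
}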

Alternatively, if you deploy the Agents on Kubernetes, you can use the service name created by the WarpStream chart as the Schema Registry URL, for example: warpstream-agent:9094

Configuring Clients to Eliminate Inter-Zone Networking Costs

To ensure your Schema Registry clients connect to Agents in their own availability zone, make sure there is at least one Agent in the same availability zone as your clients, and specify the client’s availability zone by embedding it into the Schema Registry URL. For example, to specify that a client is in asia-southeast-1a, embed the AZ into the URL like this: api-80ba097c-d4ef-4e0b-8e86-d05b80fee6ed.asia-southeast-1a.discoveryv2.prod-z.us-east-1.warpstream.com:9094.
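The URL change described above can be automated with a small helper. This minimal sketch assumes the console URL keeps the shape of the example, with the AZ inserted as a new DNS label right after the first api-<id> label. The function name is hypothetical, and the caller must supply the AZ (for example, from your cloud provider's instance metadata).

```go
package main

import (
	"fmt"
	"strings"
)

// withAvailabilityZone inserts az as a new DNS label after the first label of
// the Schema Registry host, e.g.
//   api-<id>.discoveryv2.prod-z... -> api-<id>.<az>.discoveryv2.prod-z...
// An optional scheme prefix (such as "http://") and the port are preserved.
func withAvailabilityZone(srURL, az string) (string, error) {
	scheme, rest := "", srURL
	if i := strings.Index(rest, "://"); i >= 0 {
		scheme, rest = rest[:i+3], rest[i+3:]
	}
	first, remainder, ok := strings.Cut(rest, ".")
	if !ok {
		return "", fmt.Errorf("unexpected Schema Registry URL %q", srURL)
	}
	return scheme + first + "." + az + "." + remainder, nil
}
```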

This is not required for production usage, but it can reduce costs for high-volume Schema Registry workloads.

Using Agent Groups

If you ever need to split your Schema Registry cluster into different "groups" that are isolated at the network / service discovery layer, you can use Agent Groups. All you have to do is pass the -agentGroup flag to the Agent binary. Check out the agent groups documentation for more information.

Note that the Schema Registry URL displayed in the console is currently not "agent group aware" and will randomly return Agent IP addresses from different groups. To use the Agent Group functionality, you should instead use URLs that target the Agents in the specific agent group.

For example, if you're using the official WarpStream Kubernetes chart, you can specify the agent group via the extraArgs property and then use the Kubernetes service name generated by the chart as the Schema Registry URL.

Authentication

WarpStream BYOC Schema Registry currently supports:

  • Basic Authentication

  • TLS Encryption

  • mTLS Authentication

TLS Encryption

To configure your WarpStream agent to enable TLS encryption, set the WARPSTREAM_SCHEMA_REGISTRY_TLS_ENABLED environment variable to true.

After creating your TLS certificates, you also have to set the WARPSTREAM_TLS_SERVER_CERT_FILE environment variable to the file containing the certificate (its public key) and set the WARPSTREAM_TLS_SERVER_PRIVATE_KEY_FILE environment variable to the file containing the certificate's private key.

For more information about how to configure TLS encryption for your WarpStream cluster, check out the TLS Encryption documentation.
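On the client side, connecting to TLS-enabled Agents means using an https:// Schema Registry URL and trusting the CA that signed the Agent's certificate. The following minimal sketch uses only crypto/tls and crypto/x509. The function name is hypothetical. If the Agent's certificate chains to a CA already in the system trust store, you can skip the custom pool. Whether your Schema Registry client library accepts a custom *http.Client or tls.Config depends on the library.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"net/http"
)

// newTLSHTTPClient returns an HTTP client that verifies the Schema Registry
// Agent's server certificate against the PEM-encoded CA certificate(s) in caPEM.
func newTLSHTTPClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:    pool, // trust only these CAs for the Agent's certificate
				MinVersion: tls.VersionTLS12,
			},
		},
	}, nil
}
```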

Mutual TLS (mTLS) Authentication

To enable mTLS authentication, set the WARPSTREAM_REQUIRE_MTLS_AUTHENTICATION environment variable to true. We also recommend setting the WARPSTREAM_TLS_CLIENT_CA_CERT_FILE environment variable to the file containing the public keys of the certificate authorities that sign your client certificates. Note that mTLS authentication requires TLS encryption.

For more information on how to configure mTLS authentication for your WarpStream cluster, check out the mTLS documentation.

Basic Authentication

When basic authentication is enabled, the WarpStream Agent uses the username/password encoded in the HTTP request's Authorization header to authenticate your Schema Registry clients.

To configure your WarpStream agent to enable basic authentication, set the WARPSTREAM_SCHEMA_REGISTRY_BASIC_AUTH_ENABLED environment variable to true.

For more information about how to set up basic authentication for your WarpStream cluster, check out the basic authentication documentation.

Integrating with WarpStream Schema Validation

You can configure your Kafka agents to perform server-side schema validation that checks whether the data actually conforms to the expected schema. To enable your Kafka agents to fetch schemas from your WarpStream schema registry, you need to do two things:

  • Set the -schemaValidationVirtualClusterID flag when deploying the Kafka agent.

  • Make sure the agent has permissions to read existing files from the object storage bucket that holds the schemas for your BYOC Schema Registry. Check out the schema validation docs for more details on how to configure the permissions.

The best part is that when you perform schema validation alongside WarpStream's BYOC Schema Registry, you don't need any Schema Registry Agents running! This is because the Kafka Agent can fetch your schemas directly from object storage.

Check out the schema validation docs for more details.

Limits

Here are some enforced limits for each Schema Registry:

  • The number of schema versions is limited to 20,000. You can track how many schema versions you have with the warpstream_schema_versions_count metric. For more details, check out the metrics documentation.

  • Each schema is limited to 1MB in size.
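To surface these limits before a registration request fails, a client can run a cheap preflight check. This minimal sketch makes two assumptions. First, it conservatively reads "1MB" as 10^6 bytes, although the limit may actually be 1 MiB. Second, it assumes the caller already knows the current version count, for example from the warpstream_schema_versions_count metric. The function name is hypothetical.

```go
package main

import "fmt"

const (
	// maxSchemaVersions is the documented per-registry cap on schema versions.
	maxSchemaVersions = 20_000
	// maxSchemaBytes conservatively reads the documented 1MB limit as 10^6
	// bytes (assumption; the actual limit may be 1 MiB).
	maxSchemaBytes = 1_000_000
)

// preflightCheck rejects a schema that is too large, or a registration that
// would exceed the version cap, before anything is sent to the registry.
func preflightCheck(schema string, currentVersions int) error {
	if n := len(schema); n > maxSchemaBytes {
		return fmt.Errorf("schema is %d bytes; limit is %d", n, maxSchemaBytes)
	}
	if currentVersions >= maxSchemaVersions {
		return fmt.Errorf("registry already holds %d schema versions; limit is %d", currentVersions, maxSchemaVersions)
	}
	return nil
}
```

The version check is deliberately conservative: re-registering a schema that already exists does not create a new version, but the guard still rejects it once the cap is reached.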

If you need an increase for one or more of these limits, contact us at support@warpstreamlabs.com.
