WarpStream BYOC Schema Registry

This page explains how to use WarpStream's BYOC Schema Registry.

Overview

WarpStream’s BYOC Schema Registry serves as a central repository to store and retrieve schemas used to serialize / deserialize and validate your Kafka records. The Schema Registry exposes a REST server that is API-compatible with Confluent's Schema Registry (more details about API compatibility below).

WarpStream’s BYOC Schema Registry employs the same zero-disk, stateless architecture as WarpStream BYOC. It stores data directly to object storage with no intermediary disks, separates storage from compute, and separates the data plane from the control plane. All schemas are stored in your object store, while metadata that requires consensus (e.g. schema IDs, versioning, etc.) is offloaded to WarpStream’s control plane.

WarpStream’s Schema Registry is embedded natively into the WarpStream Agents. This makes deploying schema registries as easy as deploying the stateless WarpStream Agents. Unlike a traditional Kafka schema registry, in which only the leader node can perform writes to the underlying Kafka log, any WarpStream Agent can both write to and read from the registry. Furthermore, scaling the number of Schema Registry Agents during traffic spikes is trivial due to the stateless nature of WarpStream Agents.

Protocol Support

Currently, WarpStream's Schema Registry supports Avro schemas, with JSON Schema and Protobuf support coming soon.

WarpStream's BYOC Schema Registry supports most APIs specified in Confluent's Schema Registry API documentation. However, it doesn't support advanced features like data contracts and client-side field-level encryption. For a full list of unsupported features, check out the protocol documentation.

Run the Schema Registry Locally

Once you install the Agent binary, you can have a Schema Registry running locally on your laptop within seconds for you to test against. For instructions on how to run a Schema Registry locally, check out this doc.

Create a Schema Registry

To create a Schema Registry Virtual Cluster, you can use either the WarpStream console or the API.

Creating from the Console

To create a Schema Registry from the console, navigate to the Schema Registries tab and click the Create Schema Registry button.

Creating via API

To create a Schema Registry Virtual Cluster via API, invoke the /create_virtual_cluster endpoint and specify the virtual_cluster_type as byoc_schema_registry as follows:

curl https://api.prod.us-east-1.warpstream.com/api/v1/create_virtual_cluster \
-H 'warpstream-api-key: XXXXXXXXXX' \
-H 'Content-Type: application/json' \
-d '{"virtual_cluster_name": "XXXXXXXXXX", "virtual_cluster_type": "byoc_schema_registry", "virtual_cluster_region": "us-east-1", "virtual_cluster_cloud_provider": "aws"}'

The response object will contain the Schema Registry's Virtual Cluster ID as well as the agent key necessary to deploy the Agent. Note that Virtual Cluster IDs of Schema Registries always begin with vci_sr_.

See our Create Cluster API documentation for more details.
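The same request can be sketched in Go using only the standard library. This is a minimal illustration that mirrors the curl example above: the helper names (`newCreateSchemaRegistryRequest`, `isSchemaRegistryClusterID`) and the struct are hypothetical, and the response is deliberately not decoded because its exact field names are not shown on this page.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// createClusterRequest mirrors the JSON body used in the curl example.
type createClusterRequest struct {
	Name          string `json:"virtual_cluster_name"`
	Type          string `json:"virtual_cluster_type"`
	Region        string `json:"virtual_cluster_region"`
	CloudProvider string `json:"virtual_cluster_cloud_provider"`
}

// newCreateSchemaRegistryRequest builds (but does not send) the HTTP request.
// The function name is illustrative, not part of any WarpStream SDK.
func newCreateSchemaRegistryRequest(apiKey, name, region, cloud string) (*http.Request, error) {
	body, err := json.Marshal(createClusterRequest{
		Name:          name,
		Type:          "byoc_schema_registry",
		Region:        region,
		CloudProvider: cloud,
	})
	if err != nil {
		return nil, fmt.Errorf("encoding request body: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost,
		"https://api.prod.us-east-1.warpstream.com/api/v1/create_virtual_cluster",
		bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("building request: %w", err)
	}
	req.Header.Set("warpstream-api-key", apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// isSchemaRegistryClusterID reports whether id carries the vci_sr_ prefix
// that Schema Registry Virtual Cluster IDs always begin with.
func isSchemaRegistryClusterID(id string) bool {
	return strings.HasPrefix(id, "vci_sr_")
}

func main() {
	req, err := newCreateSchemaRegistryRequest("XXXXXXXXXX", "XXXXXXXXXX", "us-east-1", "aws")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	// Send with http.DefaultClient.Do(req), then decode the JSON response
	// according to the Create Cluster API documentation.
}
```

Building the request separately from sending it keeps the sketch easy to inspect without an API key. The prefix check is a cheap guard against passing a Kafka Virtual Cluster ID where a Schema Registry ID is expected.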

Deploy the BYOC Schema Registry Agents

In the playground/demo mode, we automatically deploy both a Kafka agent and a Schema Registry agent to make them easier to experiment with. For real clusters, you would have to deploy Schema Registry agents separately from your Kafka agents.

After you obtain a Schema Registry Virtual Cluster ID and an agent key, you can deploy the Schema Registry Agent the exact same way you would deploy your Kafka Agents, using the same Agent binary. The only difference is that the Agent will host a Schema Registry HTTP server instead of a Kafka TCP server when deployed.

See our deployment docs on how to deploy the Agent.

Monitor the Schema Registry Agents

You can monitor your Schema Registry Agents using metrics emitted by the Agents. See the schema registry section in our metrics documentation for more details.

Client Configuration

WarpStream’s BYOC Schema Registry is API-compatible with Confluent's Schema Registry. To obtain a Schema Registry URL that points to your Schema Registry Agents, navigate to the WarpStream console and click the Connect tab. Once you have the Schema Registry URL, you can plug it into your schema registry client. For example, here is how you can initialize franz-go's Schema Registry client:

// Requires the franz-go schema registry package: github.com/twmb/franz-go/pkg/sr
url := "api-80ba097c-d4ef-4e0b-8e86-d05b80fee6ed.discoveryv2.prod-z.us-east-1.warpstream.com:9094"
srClient, err := sr.NewClient(
	sr.URLs(url),
)
if err != nil {
	return fmt.Errorf("error initializing schema registry client: %w", err)
}

Alternatively, if you use Kubernetes deployments, you can use the service name from the deployed WarpStream chart as the Schema Registry URL, for example: warpstream-agent:9094.
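Because the registry speaks the Confluent-compatible REST API, a schema can also be registered without a dedicated client library. The sketch below builds the standard Confluent "register schema under a subject" request (`POST /subjects/{subject}/versions`) using only the Go standard library. It assumes this endpoint is among the supported APIs. The helper name, the `users-value` subject, the example Avro schema, and the plain `http://` scheme are illustrative assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// newRegisterSchemaRequest builds a Confluent-style request that registers
// schema under subject. The helper name is hypothetical.
func newRegisterSchemaRequest(baseURL, subject, schema string) (*http.Request, error) {
	// The REST API expects the schema itself as a JSON-encoded string field.
	body, err := json.Marshal(map[string]string{"schema": schema})
	if err != nil {
		return nil, fmt.Errorf("encoding schema: %w", err)
	}
	endpoint := baseURL + "/subjects/" + url.PathEscape(subject) + "/versions"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("building request: %w", err)
	}
	req.Header.Set("Content-Type", "application/vnd.schemaregistry.v1+json")
	return req, nil
}

func main() {
	const avroSchema = `{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}`
	req, err := newRegisterSchemaRequest("http://warpstream-agent:9094", "users-value", avroSchema)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send: resp, err := http.DefaultClient.Do(req).
	// In the Confluent API, a successful response body carries the schema ID as {"id": <n>}.
}
```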

Configuring Clients to Eliminate Inter-Zone Networking Costs

To ensure your schema registry clients connect to agents within the same availability zone, you need to ensure there is at least one agent in the same availability zone as your clients. You also need to specify the client’s availability zone by embedding the availability zone into the Schema Registry URL. For example, to specify that a client is in asia-southeast-1a, embed the AZ into the URL like this: api-80ba097c-d4ef-4e0b-8e86-d05b80fee6ed.asia-southeast-1a.discoveryv2.prod-z.us-east-1.warpstream.com:9094.

This is not required for production usage, but it can help reduce costs for high volume schema registry workloads.
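As a rough sketch, the zone-aware URL can be derived from the base Schema Registry URL at client startup. The helper below is hypothetical and assumes, based only on the single example above, that the availability zone is inserted as a new DNS label immediately after the first label of the hostname.

```go
package main

import (
	"fmt"
	"strings"
)

// zoneAwareURL inserts az as a new DNS label right after the first label of
// registryURL. The placement is inferred from the example in this section.
func zoneAwareURL(registryURL, az string) (string, error) {
	first, rest, ok := strings.Cut(registryURL, ".")
	if !ok || first == "" || az == "" {
		return "", fmt.Errorf("cannot embed zone %q into %q", az, registryURL)
	}
	return first + "." + az + "." + rest, nil
}

func main() {
	base := "api-80ba097c-d4ef-4e0b-8e86-d05b80fee6ed.discoveryv2.prod-z.us-east-1.warpstream.com:9094"
	u, err := zoneAwareURL(base, "asia-southeast-1a")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

In practice the zone would come from the environment the client runs in (for example, instance metadata or a pod label) rather than a hard-coded string.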

Using Agent Groups

If you ever need to split your Schema Registry cluster into different "groups" that are isolated at the network / service discovery layer, you can use Agent Groups. All you have to do is to pass the -agentGroup flag to the Agent binary. Check out the agent groups documentation for more information.

Note that the Schema Registry URL displayed in the console is currently not "agent group aware" and will randomly return Agent IP addresses from different groups. To use the Agent Group functionality, you should instead use URLs that target the Agents in the specific agent group.

For example, if you're using the official WarpStream Kubernetes chart, you can specify the agentGroup via the extraArgs property. You can then use the Kubernetes service name generated by the chart as the Schema Registry URL.

Authentication

WarpStream’s Schema Registry currently supports TLS/mTLS. Instructions for enabling TLS/mTLS for the Schema Registry are available in the authentication docs.

Integrating with WarpStream Schema Validation

You can configure your Kafka agents to perform server-side schema validation that checks whether the data actually conforms to the expected schema. To enable your Kafka agents to fetch schemas from your WarpStream schema registry, you need to do two things:

  • Set the -schemaValidationVirtualClusterID flag when deploying the Kafka agent.

  • Make sure the agent has permissions to read existing files from the object storage bucket that holds the schemas for your BYOC Schema Registry. Check out the schema validation docs for more details on how to configure the permissions.

The best part is that when you perform schema validation alongside WarpStream's BYOC Schema Registry, you don't need any Schema Registry agents running! This is because the Kafka agent can fetch your schemas directly from object storage.

Check out the schema validation docs for more details.

Limits

Here are some enforced limits for each Schema Registry:

  • The number of schema versions is limited to 20,000. You can track how many schema versions you have with the warpstream_schema_versions_count metric. For more details, check out the metrics documentation.

  • The size of each schema is limited to 1MB.

If you need an increase for one or more of these limits, contact us at support@warpstreamlabs.com.
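If you want to catch oversized schemas before they reach the registry, a client-side pre-check is one option. The sketch below is illustrative only: the helper name is hypothetical, and it assumes "1MB" means 1 MiB (1,048,576 bytes), which this page does not specify.

```go
package main

import "fmt"

// maxSchemaBytes is an assumption: the documented 1MB limit read as 1 MiB.
const maxSchemaBytes = 1 << 20

// checkSchemaSize returns an error if the schema text exceeds the assumed limit.
func checkSchemaSize(schema string) error {
	if n := len(schema); n > maxSchemaBytes {
		return fmt.Errorf("schema is %d bytes, over the %d-byte limit", n, maxSchemaBytes)
	}
	return nil
}

func main() {
	schema := `{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}`
	if err := checkSchemaSize(schema); err != nil {
		fmt.Println("rejecting schema:", err)
		return
	}
	fmt.Println("schema size OK")
}
```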
