WarpStream Schema Linking
Replicate and migrate schemas to WarpStream BYOC Schema Registry.
WarpStream Schema Linking can migrate schemas from Confluent-compatible schema registries to WarpStream’s BYOC Schema Registry. During migration, your schemas will never leave your cloud environment and object storage buckets.
In addition to migrating schemas, WarpStream Schema Linking also preserves schema IDs, subjects, subject versions, and compatibility rules. It even preserves whether subject versions are soft deleted. This means that after the migration, the destination schema registry should behave identically to the source schema registry at the API level.
WarpStream Schema Linking is embedded natively into the WarpStream Agent, so you don’t need to run any additional infrastructure beyond the WarpStream Agents to migrate schemas.
Getting started with WarpStream Schema Linking is easy. First, deploy an agent for the schema registry you want to migrate schemas to. The agent must be able to run jobs, which is enabled by default or can be set explicitly with the -roles flag (check out the Agent Roles documentation for more details).
For example:
warpstream agent -virtualClusterID vci_sr_XXXXXXXX -apiKey XXXXXXXX -bucketURL s3://my-warpstream-bucket -roles jobs,proxy
However, before deploying your Schema Linking configuration to production, let’s try migrating schemas into an ephemeral WarpStream Playground schema registry. You can start one with the command:
warpstream playground
This command will automatically create a schema registry and deploy a schema registry agent locally. The command will print a URL to a temporary WarpStream playground account. Open the URL in your browser and navigate to the schema registry.
Next, click the Schema Linking tab. You’ll see a text editor that allows you to modify the config. From here, you can edit the configuration of your pipeline, pause it, resume it, and roll your configuration forwards and backwards. To learn more about the Schema Linking config, check out the Config section.
Once you are done editing your config, you can click Save.
The configuration is now saved and deployed, but it’s not running yet. Click the toggle button above the Deploy button to change the pipeline’s state from PAUSED to RUNNING.
If you click on Full Details, you can see sync statistics such as how many subject versions were found in the source registry, how many newly migrated subject versions there are, etc.
When a Schema Linking pipeline is set up, the first thing it does is set the destination schema contexts' mode to IMPORT. This prevents anyone except the pipeline from writing to the schema registry.
The destination schema context must be empty before the migration, so that the schema IDs can be preserved. If the pipeline detects that the destination schema context is not empty, it will automatically stop and fail.
To confirm that the destination schema registry is in IMPORT mode, you can click on the Contexts tab.
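The same check can be scripted. The sketch below assumes the destination registry exposes a Confluent-style `GET /mode` endpoint that returns a JSON body such as `{"mode": "IMPORT"}`; the base URL passed to `fetch_mode` is hypothetical and must be replaced with your own registry address.

```python
import json
import urllib.request


def parse_mode(body: bytes) -> str:
    """Extract the mode string from a `GET /mode` response body."""
    return json.loads(body)["mode"]


def is_import_mode(mode: str) -> bool:
    """Return True when the registry reports IMPORT mode."""
    return mode.upper() == "IMPORT"


def fetch_mode(base_url: str) -> str:
    """Ask the registry for its mode (performs a real HTTP request).

    Assumption: the registry serves a Confluent-compatible /mode endpoint.
    """
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/mode") as resp:
        return parse_mode(resp.read())
```

For example, `is_import_mode(fetch_mode("http://localhost:8081"))` would tell you whether the pipeline has locked the destination for writes, assuming that address points at your destination registry.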
Right now, your BYOC Schema Registry is just a read-replica of the source schema registry. To be able to write to the destination schema registry, you need to switch the schema contexts' modes from IMPORT to READ_WRITE.
To do that, edit the config and set the irreversible_switch_to_read_write_mode field to true. Then click the Deploy button so that the pipeline picks up the latest config. This operation is not reversible, and it terminates any in-progress syncs.
After a while, you should see a new entry under Sync Stats stating that the schema contexts have been set to READ_WRITE mode. The pipeline will also automatically stop running.
To confirm that, go to the Contexts tab and check that the default schema context is in READ_WRITE mode.
Now, you can read and write to your newly migrated WarpStream BYOC Schema Registry!
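As a quick smoke test after the switch, you can register a schema through the Confluent-compatible REST API. The sketch below only builds the request for `POST /subjects/{subject}/versions`; the base URL, subject name, and schema are hypothetical placeholders, and any authentication your registry requires is left out.

```python
import json
import urllib.parse
import urllib.request


def build_register_request(base_url: str, subject: str, schema_str: str) -> urllib.request.Request:
    """Build a request that registers `schema_str` under `subject`.

    The schema is sent as a JSON-encoded string in the "schema" field,
    as in Confluent's Schema Registry API.
    """
    quoted_subject = urllib.parse.quote(subject, safe="")
    body = json.dumps({"schema": schema_str}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/subjects/{quoted_subject}/versions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )
```

Passing the result to `urllib.request.urlopen` sends the request. If the registry is still in IMPORT mode, expect the write to be rejected.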
WarpStream Schema Linking continuously migrates the schema registry. This means that you can keep making changes to your source schema registry and the pipeline will eventually detect and apply those changes to the destination schema registry.
Note that there are limits on what you can do to the source schema registry during migration, specifically around hard deleting subjects. Check out the limitations section for more details.
Once Schema Linking is deployed, the pipeline will periodically sync the source registry with the destination registry. You can configure how frequently the syncs occur (e.g. once every 5 minutes, once every hour, etc).
During each sync, the Agents will fetch subjects, subject versions, and compatibility rules from the source schema registry using HTTP requests with Confluent’s Schema Registry API. The pipeline will then perform a diff between the source and destination to figure out what needs to be migrated.
This means that after the initial sync, only newly registered schemas will be fetched and migrated.
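To make the fetch-and-diff idea concrete, here is a minimal sketch of it. It is not the Agent's actual implementation: it assumes both registries serve the Confluent-compatible `GET /subjects` and `GET /subjects/{subject}/versions` endpoints, and it reduces the diff to a set difference over (subject, version) pairs. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def list_subject_versions(base_url: str) -> set[tuple[str, int]]:
    """Return every (subject, version) pair in a registry, soft-deleted ones included.

    Performs real HTTP requests against a Confluent-compatible registry.
    """
    base = base_url.rstrip("/")
    pairs: set[tuple[str, int]] = set()
    with urllib.request.urlopen(f"{base}/subjects?deleted=true") as resp:
        subjects = json.load(resp)
    for subject in subjects:
        quoted = urllib.parse.quote(subject, safe="")
        with urllib.request.urlopen(f"{base}/subjects/{quoted}/versions?deleted=true") as resp:
            for version in json.load(resp):
                pairs.add((subject, version))
    return pairs


def plan_sync(source: set, destination: set) -> tuple[list, list]:
    """Diff two snapshots: what to copy over, and what no longer exists in the source."""
    to_copy = sorted(source - destination)
    to_remove = sorted(destination - source)
    return to_copy, to_remove
```

After the first sync the destination already holds most pairs, so `to_copy` contains only what was registered since the previous run.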
WarpStream Schema Linking is fully controllable from a single YAML config file which can be edited through the WarpStream console or the Pipelines API.
Here’s a quick summary of the YAML file above:
The sync_every_seconds field is set to 300, which means that after each successful sync, the sync engine will wait for 300 seconds (5 minutes) before initiating a new sync.
The context_type field is set to "DEFAULT". This means that schemas will be copied from the default schema context of the source registry to the default schema context of the destination registry. Check out the context types section for the other context types.
The source_schema_registry field specifies the hostname, port, and credentials for the source schema registry HTTP server.
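The sync_every_seconds behaviour is a fixed delay measured from the end of one sync, not a fixed-rate timer. The sketch below illustrates that timing only; `run_sync_loop` and its parameters are hypothetical, and it says nothing about how the Agent handles a failed sync.

```python
import time


def run_sync_loop(sync_once, sync_every_seconds, max_syncs, sleep=time.sleep):
    """Run `sync_once` repeatedly, waiting `sync_every_seconds` after each run finishes.

    `max_syncs` bounds the loop so the sketch terminates; `sleep` is injectable
    so the timing can be observed without actually waiting.
    """
    completed = 0
    while completed < max_syncs:
        sync_once()
        completed += 1
        if completed < max_syncs:
            # The wait starts only after the sync has finished.
            sleep(sync_every_seconds)
    return completed
```

With this shape, a sync that takes two minutes followed by a 300-second wait starts the next sync seven minutes after the previous one began.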
To make the Agents use TLS when connecting to the source schema registry, set the use_tls flag to true.
To provide a username and password for basic auth, do not put the raw values in the config. Instead, provide the environment variables that point to the username and password.
When deploying your agents, set the environment variables BASIC_AUTH_USERNAME_ENV and BASIC_AUTH_PASSWORD_ENV to the basic auth username and password, respectively.
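The indirection through environment variables keeps secrets out of the saved config. As an illustration of the idea (not the Agent's code), a client could resolve the credentials at runtime and build the HTTP Basic `Authorization` header as follows; the helper name is hypothetical.

```python
import base64
import os


def basic_auth_header(
    username_env: str = "BASIC_AUTH_USERNAME_ENV",
    password_env: str = "BASIC_AUTH_PASSWORD_ENV",
) -> dict:
    """Read credentials from the named environment variables and build a Basic auth header.

    Raises KeyError if either variable is unset, so a missing secret fails loudly.
    """
    username = os.environ[username_env]
    password = os.environ[password_env]
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Only the variable names appear in configuration; the values exist only in the process environment of the deployed agents.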
When deploying your agents, set the environment variables MTLS_CERT_PATH_ENV and MTLS_KEY_PATH_ENV to the file paths of the PEM-encoded certificate and private key files.
WarpStream’s BYOC Schema Registry supports schema contexts. At a high level, each schema context can be viewed as a separate “sub-registry”, with an isolated group of schema IDs and subject names. Learn more about schema contexts with Confluent’s schema contexts documentation.
When setting up WarpStream schema linking, you can specify the source and destination schema contexts. Let’s check out the different context types.
Each schema registry has a default context. When you register schemas and subjects without specifying an explicit context, you are writing to the default context.
If you specify the context_type to be DEFAULT, the schema migrator will migrate schemas and subjects from the source registry’s default schema context to the destination registry’s default schema context.
To pick which schema contexts to migrate to/from, you can specify the context_type to be CONTEXT_MAPPINGS. Then use the context_mappings field to provide a list of source/destination schema contexts.
The config above specifies that the default schema context from the source registry is migrated into .dest_foo, .source_bar is migrated into .dest_bar, and .source_baz is migrated into .dest_baz.
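The effect of such a mapping on subject names can be sketched as below. This is an illustration under assumptions, not the pipeline's implementation: it represents each mapping as a Python tuple rather than the YAML structure, names the default context `"."`, and uses Confluent's qualified-subject form `:.context:subject` for non-default contexts.

```python
DEFAULT_CONTEXT = "."  # assumption: Confluent's name for the default context

# Hypothetical in-memory form of the mappings described above.
MAPPINGS = [
    (DEFAULT_CONTEXT, ".dest_foo"),
    (".source_bar", ".dest_bar"),
    (".source_baz", ".dest_baz"),
]


def qualify(context: str, subject: str) -> str:
    """Return the qualified subject name for a subject inside a context."""
    if context == DEFAULT_CONTEXT:
        return subject
    return f":{context}:{subject}"


def destination_subject(mappings, source_context: str, subject: str):
    """Map a subject from its source context to the destination context, if mapped."""
    for src, dst in mappings:
        if src == source_context:
            return qualify(dst, subject)
    return None  # contexts that are not listed are not migrated
```

Under these assumptions, a subject `orders-value` in the source's default context would be addressed as `:.dest_foo:orders-value` in the destination.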
Since WarpStream Schema Linking preserves schema IDs, the destination schema context must be empty before migration. If not, the pipeline will fail and stop.
During migration, hard deleting a subject is allowed only if you don’t register new schemas to that subject before the pipeline also hard deletes the subject from the destination schema registry.
During syncing, the Schema Linking pipeline assumes that a subject version always points to the same schema ID. This is true unless you hard delete a subject and register a schema under the subject. In that case, the subject version assignment resets to 1 and the assumption no longer holds.
However, if you hard delete a subject and wait long enough for the pipeline to detect the hard deletion and delete the subject from the destination schema registry, you can then register new subject versions. The pipeline will replicate them correctly, since it treats the subject as a new one.
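The following toy model shows why the "a subject version always points to the same schema ID" assumption breaks. `ToyRegistry` is a deliberately simplified, hypothetical stand-in: it hands out a fresh ID on every registration and ignores schema deduplication, compatibility checks, and soft deletes.

```python
class ToyRegistry:
    """Minimal model: subjects hold an ordered list of schema IDs; version = position + 1."""

    def __init__(self):
        self._next_id = 1
        self._subjects = {}

    def register(self, subject):
        """Register a new schema under `subject`; return (version, schema_id)."""
        schema_id = self._next_id
        self._next_id += 1
        versions = self._subjects.setdefault(subject, [])
        versions.append(schema_id)
        return len(versions), schema_id

    def hard_delete(self, subject):
        """Remove the subject entirely, so its version numbering restarts at 1."""
        self._subjects.pop(subject, None)

    def schema_id(self, subject, version):
        return self._subjects[subject][version - 1]


source = ToyRegistry()
version, first_id = source.register("orders")
synced = {("orders", version): first_id}  # what a pipeline copied on an earlier sync

source.hard_delete("orders")
new_version, new_id = source.register("orders")  # re-registered before the next sync

# The same (subject, version) key now points at a different schema ID in the source.
stale = synced[("orders", new_version)] != source.schema_id("orders", new_version)
```

Here `stale` ends up `True`: version 1 of `orders` was synced with one ID, but the source now reports a different ID for the same version, which a diff keyed on subject versions cannot detect.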
WarpStream's BYOC Schema Registry is not fully compatible with Confluent's Schema Registry. This means that things like metadata, ruleSet, and subject aliases will not be migrated to the destination schema registry. Check out the list of features that WarpStream's BYOC Schema Registry doesn't support.