Agent Schema Validation

This page describes how to enable schema validation on the WarpStream Agent.

The WarpStream Agent can be configured to perform schema validation on records encoded in Confluent's SerDes wire format. It can then reject records or emit metrics when it receives an invalid record.

Note that enabling schema validation will increase CPU usage of the agent.

How Schema Validation Works

Kafka messages serialized using the SerDes wire format will contain a schema ID. WarpStream's schema validation not only verifies that the schema ID is present and valid (according to the subject name strategy), it also verifies that the record's data actually conforms to the schema.

Currently, WarpStream’s schema validation supports the following formats: Avro and JSON Schema.

With schema validation enabled, the Agent returns an error to the producer and discards the record (or emits a metric, if configured to do so) in the following scenarios:

  • the record does not conform to the SerDes wire format (e.g. no magic byte, no schema ID)

  • an invalid schema ID is provided

  • the schema ID is not tied to a subject which matches the subject name strategy

  • the record data does not conform to the schema
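
The first scenario above can be illustrated with a minimal sketch of a wire-format header check. It assumes the standard Confluent framing (one magic byte with value 0, followed by a 4-byte big-endian schema ID, followed by the payload). The function name `parse_wire_header` is hypothetical and is not part of the Agent; the Agent's actual implementation may differ.

```python
import struct

# Assumption: Confluent wire format = magic byte 0 + 4-byte big-endian schema ID + payload.
MAGIC_BYTE = 0
HEADER_SIZE = 5


def parse_wire_header(record: bytes):
    """Hypothetical helper: return (schema_id, payload) or raise ValueError."""
    if len(record) < HEADER_SIZE:
        raise ValueError("record too short to contain a magic byte and schema ID")
    # ">" = big-endian without padding, "B" = unsigned byte, "I" = unsigned 32-bit int.
    magic, schema_id = struct.unpack(">BI", record[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, record[HEADER_SIZE:]
```

A check like this only covers the framing. Verifying that the schema ID exists, that its subject matches the subject name strategy, and that the payload conforms to the schema are separate steps.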

Topic Level Configurations for Schema Validation

Schema validation is configured at the topic level. The following configurations can be provided when a topic is created or altered.

Configuration | Description

warpstream.key.schema.validation

Boolean config that indicates whether to validate the record key.

warpstream.key.subject.name.strategy

Config that determines which schemas are allowed for the record key.

Allowed values: TopicNameStrategy, RecordNameStrategy, TopicRecordNameStrategy. See more details below.

warpstream.value.schema.validation

Boolean config that indicates whether to validate the record value.

warpstream.value.subject.name.strategy

Config that determines which schemas are allowed for the record value.

Allowed values: TopicNameStrategy, RecordNameStrategy, TopicRecordNameStrategy. See more details below.

warpstream.schema.validation.warning.only

Boolean config. When true and an invalid record is detected, the Agent allows the record to be written and emits a metric indicating that the record is invalid, instead of rejecting the record. The metric (counter) emitted is: schema_registry_validation_invalid_record

Defaults to true.
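
As an illustration of how these configurations fit together, the following sketch builds a topic configuration map from the keys listed above. The helper name `schema_validation_configs` and its parameters are hypothetical, and encoding booleans as the strings "true" and "false" is an assumption based on Kafka topic configs being string-valued. The resulting map would be passed to whichever Kafka admin client or CLI is used to create or alter the topic.

```python
# Allowed subject name strategies, as listed in the table above.
ALLOWED_STRATEGIES = {
    "TopicNameStrategy",
    "RecordNameStrategy",
    "TopicRecordNameStrategy",
}


def schema_validation_configs(*, validate_key, validate_value,
                              key_strategy, value_strategy,
                              warning_only=True):
    """Hypothetical helper: build the topic-level config map as strings."""
    for strategy in (key_strategy, value_strategy):
        if strategy not in ALLOWED_STRATEGIES:
            raise ValueError(f"unknown subject name strategy: {strategy}")

    def as_bool(flag):
        # Assumption: boolean topic configs are passed as "true"/"false".
        return "true" if flag else "false"

    return {
        "warpstream.key.schema.validation": as_bool(validate_key),
        "warpstream.key.subject.name.strategy": key_strategy,
        "warpstream.value.schema.validation": as_bool(validate_value),
        "warpstream.value.subject.name.strategy": value_strategy,
        "warpstream.schema.validation.warning.only": as_bool(warning_only),
    }


# Example: validate only record values and reject invalid records.
configs = schema_validation_configs(
    validate_key=False,
    validate_value=True,
    key_strategy="TopicNameStrategy",
    value_strategy="TopicNameStrategy",
    warning_only=False,
)
```

Because warning-only mode defaults to true, it has to be set to false explicitly for the Agent to reject invalid records.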

Subject Name Strategy

Each schema in the Schema Registry is registered under a subject. During schema validation, the agent looks up the subject for the schema ID and verifies that the subject conforms to the subject name strategy.

There are three subject name strategies:

Strategy | Definition

TopicNameStrategy

The subject is derived from the topic name with the following format:

  • <topic name>-key for the record key

  • <topic name>-value for the record value.

RecordNameStrategy

The subject is the schema’s fully-qualified record name.

TopicRecordNameStrategy

The subject is a combination of the topic name and the record name with the following format: <topic name>-<fully-qualified record name>

The fully-qualified record name for Avro is the record’s namespace + record name. For JSON Schema, the record name is the title.
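
The three strategies can be summarized with a short sketch that derives the expected subject from the topic name and the record name. The function `expected_subject` is hypothetical and only mirrors the formats described above; it is not the Agent's implementation.

```python
def expected_subject(strategy, topic, record_name, is_key):
    """Hypothetical helper: derive the subject a schema must be registered under."""
    if strategy == "TopicNameStrategy":
        # <topic name>-key or <topic name>-value
        return f"{topic}-key" if is_key else f"{topic}-value"
    if strategy == "RecordNameStrategy":
        # The schema's fully-qualified record name
        return record_name
    if strategy == "TopicRecordNameStrategy":
        # <topic name>-<fully-qualified record name>
        return f"{topic}-{record_name}"
    raise ValueError(f"unknown subject name strategy: {strategy}")


# Example with a topic "orders" and an Avro record "com.example.Order".
print(expected_subject("TopicNameStrategy", "orders", "com.example.Order", is_key=False))
print(expected_subject("TopicRecordNameStrategy", "orders", "com.example.Order", is_key=False))
```

Under this sketch, validation would compare the subject registered for the record's schema ID against the derived value and treat a mismatch as an invalid record.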
