
Apache Kafka® Schema Registry is now available on the Instaclustr Managed Platform

May 08, 2019
By Instaclustr

Instaclustr is pleased to announce the availability of Schema Registry support as an add-on for its Kafka offering on the Instaclustr Managed Platform.

When using Apache Kafka, there is an implicit assumption that the developers of Kafka clients (Producers and Consumers) ensure that messages are written to and read from Kafka in the same format, and that any change to that format remains compatible between Producers and Consumers. The Kafka Schema Registry takes over this responsibility: it lets Kafka clients write and read messages with a well-defined, agreed schema by programmatically enforcing a contract between Kafka Producers and Consumers. As schemas evolve, the Schema Registry keeps that Producer-Consumer contract intact by providing centralized schema management and compatibility checks. This allows large development teams to work on different parts of a Kafka implementation concurrently and rapidly without worrying about compatibility issues. As developers add or remove fields for a topic, each new schema version is checked for compatibility and recorded in the central repository that the Schema Registry maintains, and consumers use the registered schemas to decode the messages they read.
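To make the idea of a compatibility check concrete, here is a minimal sketch (not the Schema Registry's own implementation) of the rule behind BACKWARD compatibility, which is the Schema Registry's default level: a reader using the new schema must be able to decode records written with the old one. It assumes flat Avro record schemas, requires a field's type to stay identical, and ignores Avro features such as type promotion, unions, nested records and aliases. The function name `is_backward_compatible` and the example schemas are hypothetical.

```python
import json


def is_backward_compatible(old_schema: str, new_schema: str) -> bool:
    """Return True if a reader using new_schema can decode records written with old_schema.

    Simplified sketch: flat Avro records only, and field types must match exactly.
    """
    writer_fields = {f["name"]: f for f in json.loads(old_schema)["fields"]}
    for field in json.loads(new_schema)["fields"]:
        old = writer_fields.get(field["name"])
        if old is None:
            # Added field: old records do not contain it, so the reader needs a default.
            if "default" not in field:
                return False
        elif old["type"] != field["type"]:
            # Type changed: rejected here (real Avro allows some promotions, e.g. int -> long).
            return False
    # Fields present only in the old schema are simply skipped by the reader.
    return True


# Hypothetical schema versions for an "orders" topic.
v1 = json.dumps({"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
]})
v2 = json.dumps({"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"},
]})
v3 = json.dumps({"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "customer", "type": "string"},
]})

print(is_backward_compatible(v1, v2))  # True: the added field has a default
print(is_backward_compatible(v1, v3))  # False: "customer" has no default
```

Removing a field (as `v3` does with `amount`) is allowed under this rule, because a reader simply skips data it no longer has a field for; adding a field without a default is what breaks older data.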

Instaclustr offers this support using Confluent's implementation of Kafka Schema Registry, which is open source and licensed under Apache 2.0. We used the latest stable version of this repo at the time we started developing this capability, version 5.0.0. Confluent announced last year that future releases of their Kafka Schema Registry implementation will use a more restrictive license, so Instaclustr has forked the repo and will continue to maintain it under the Apache 2.0 license. The repo can be found here: https://github.com/instaclustr/schema-registry. We welcome all interested community members to contribute and enhance its capabilities.

Provisioning of the Kafka Schema Registry follows Instaclustr's standard approach for provisioning and secure HTTPS-based interfaces:

DNS entries in the cnodes.com domain are automatically created for the endpoints.
Fully trusted certificates for the endpoints are automatically generated using the Let's Encrypt public certificate authority. A dedicated certificate is issued for each Kafka cluster.

When the Schema Registry add-on is included in a Kafka cluster, it adds 20% of the cluster cost to the monthly bill.

How do you use the Schema Registry feature on the Instaclustr Managed Platform?

Schema Registry exposes REST APIs that integrate with the rest of the Kafka stack. Kafka clients can use these endpoints to write, manage and read topic schemas, and then use the registered format to write messages to and read messages from the Kafka cluster. To do this, add Confluent's Kafka client serialization library (Serde) to your Kafka clients; the library automatically takes care of communicating with the Schema Registry server for schema management and with the Kafka cluster to write and read messages. The Schema Registry supports Apache Avro schemas, which the serialization library uses internally. The serialization library should be added to both Kafka Producers and Consumers, where it handles the serialization and deserialization of messages, and the associated schema management, under the hood. For more details on how to use the Schema Registry feature for Kafka clusters on the Instaclustr Managed Platform, refer to the support documentation.
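The serialization library normally makes these REST calls for you, but it can help to see what two of them look like. The sketch below builds (without sending) a request to register a schema under a subject and a request to test a candidate schema against the subject's latest version, using the Schema Registry REST API's `/subjects/{subject}/versions` and `/compatibility/subjects/{subject}/versions/latest` endpoints. The registry URL is a hypothetical placeholder, authentication is deliberately omitted, and the helper names are illustrative.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the Schema Registry URL and credentials for your cluster.
REGISTRY_URL = "https://schema-registry.example.com"
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def _post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the Schema Registry REST API."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def register_schema(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    # The Avro schema is sent as a JSON *string* inside the "schema" field.
    return _post(f"{base_url}/subjects/{subject}/versions", {"schema": json.dumps(schema)})


def check_compatibility(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    # Tests a candidate schema against the latest registered version of the subject.
    return _post(
        f"{base_url}/compatibility/subjects/{subject}/versions/latest",
        {"schema": json.dumps(schema)},
    )


order_v1 = {"type": "record", "name": "Order",
            "fields": [{"name": "id", "type": "string"}]}
req = register_schema(REGISTRY_URL, "orders-value", order_v1)
print(req.get_method(), req.full_url)
# To actually send it against a live registry: json.load(urllib.request.urlopen(req))
```

Against a live registry, a successful registration returns the schema's ID (for example `{"id": 1}`) and a compatibility check returns `{"is_compatible": true}` or `false`. The subject name `orders-value` follows the serializer's default convention of `<topic>-value` for message values.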

Kafka Schema Registry deployment strategy

Kafka Schema Registry can be colocated with the Kafka cluster that streams data for your applications, or alternatively, you can create a separate Kafka cluster just to run the Schema Registry. The Schema Registry servers do not need to communicate with the Kafka cluster that streams your application data, so the two can be located in separate clusters without any performance overhead. Kafka clients integrated with the serialization and Avro libraries communicate seamlessly with both the Schema Registry cluster and the Kafka cluster that streams the data. If you need to run the Schema Registry cluster in a VPC with private IPs only, follow the usual procedure to configure VPC peering between the Schema Registry cluster and the Kafka clients running in your application environment.
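Clients can talk to the two clusters independently because each Kafka message carries only a reference to its schema, not the schema itself. A minimal sketch, assuming Confluent's wire format (a zero magic byte, then a 4-byte big-endian schema ID, then the Avro-encoded payload): the producer frames each payload with the ID it obtained from the registry, and the consumer reads that ID back and fetches the schema from the registry once, caching it afterwards. The `fetch` callable stands in for an HTTP call such as `GET /schemas/ids/{id}`; `frame`, `unframe` and `SchemaCache` are hypothetical names, and Avro encoding itself is not shown.

```python
import struct
from typing import Callable, Dict, Tuple

MAGIC_BYTE = 0  # first byte of every message in Confluent's wire format
HEADER = struct.Struct(">bI")  # magic byte + 4-byte big-endian schema ID = 5 bytes


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Producer side: prefix the Avro-encoded payload with the wire-format header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Consumer side: split a message into (schema_id, avro_payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the wire-format header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


class SchemaCache:
    """Fetches each schema from the Schema Registry at most once, keyed by ID."""

    def __init__(self, fetch: Callable[[int], str]):
        self._fetch = fetch  # e.g. GET {registry_url}/schemas/ids/{schema_id}
        self._schemas: Dict[int, str] = {}

    def get(self, schema_id: int) -> str:
        if schema_id not in self._schemas:
            self._schemas[schema_id] = self._fetch(schema_id)
        return self._schemas[schema_id]


# Usage with a stand-in fetch function instead of a real registry call.
calls = []


def fake_fetch(schema_id: int) -> str:
    calls.append(schema_id)
    return '{"type": "string"}'


cache = SchemaCache(fake_fetch)
# Avro encoding of the string "hello": length 5 zig-zag encoded as 0x0a, then UTF-8 bytes.
message = frame(7, b"\x0ahello")
schema_id, payload = unframe(message)
schema = cache.get(schema_id)
cache.get(schema_id)  # served from the cache; no second registry call
```

Because the consumer only needs the schema ID from the data cluster and the schema text from the registry, the registry can live in whichever cluster is convenient.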

If you are wondering why someone would want to run the Schema Registry in a different Kafka cluster, there can be a few benefits depending on your use case and the size of your Kafka clusters. First, if you plan to deploy multiple Kafka clusters on the Instaclustr Managed Platform and wish to share topic schemas across them, a dedicated Schema Registry cluster shared by all of them avoids the overhead of managing schemas in each cluster separately. Second, if you want to decommission a production Kafka cluster but retain and reuse its existing topic schemas, keeping the Schema Registry in a dedicated cluster that holds your topic schemas lets you decommission the Kafka cluster without losing them. Third, depending on the size of your Kafka cluster, there can be cost savings. Because the Schema Registry feature adds 20% to the monthly bill of the cluster it runs in, that amount can be substantial for a large Kafka cluster. Running the Schema Registry in a separate dedicated cluster with only 3 nodes instead costs just that 3-node cluster plus its 20% add-on, which can be cheaper overall as well as more robust. The savings grow further when the Schema Registry cluster is shared across multiple Kafka clusters. If you have relatively small Kafka clusters with only a handful of nodes, colocating the Schema Registry with the Kafka cluster is the better option.
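The cost trade-off can be worked through with simple arithmetic. A minimal sketch, assuming every node costs the same hypothetical monthly price in both clusters and using the 20% add-on rate stated earlier: colocating pays 20% of each Kafka cluster that hosts its own registry, while a dedicated cluster pays for its own nodes plus 20% on top. Under these assumptions, a dedicated 3-node registry cluster breaks even with colocation at 18 data-cluster nodes, since 0.2 × N = 1.2 × 3. The function names and the per-node price are hypothetical.

```python
from fractions import Fraction

SURCHARGE = Fraction(20, 100)  # Schema Registry add-on: 20% of the hosting cluster's cost


def colocated_cost(cluster_sizes: list[int], node_price: int) -> Fraction:
    """Extra monthly cost when each listed Kafka cluster runs its own Schema Registry."""
    return sum((SURCHARGE * n * node_price for n in cluster_sizes), Fraction(0))


def dedicated_cost(registry_nodes: int, node_price: int) -> Fraction:
    """Extra monthly cost of a separate registry cluster: its nodes plus the 20% add-on."""
    return (1 + SURCHARGE) * registry_nodes * node_price


def break_even_nodes(registry_nodes: int = 3) -> Fraction:
    """Single-cluster size at which colocating costs the same as a dedicated cluster."""
    # Solve SURCHARGE * N = (1 + SURCHARGE) * registry_nodes for N.
    return (1 + SURCHARGE) * registry_nodes / SURCHARGE


PRICE = 100  # hypothetical price per node per month
print(break_even_nodes())                                   # 18
print(colocated_cost([30], PRICE), dedicated_cost(3, PRICE))      # 600 360
print(colocated_cost([6], PRICE), dedicated_cost(3, PRICE))       # 120 360
print(colocated_cost([12, 12], PRICE), dedicated_cost(3, PRICE))  # 480 360
```

The last line illustrates the point about sharing: two 12-node clusters are each below the break-even size, yet together their add-ons (480) exceed the cost of one shared 3-node registry cluster (360). `Fraction` is used so the percentages stay exact instead of accumulating floating-point rounding.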

Kafka Schema Registry is one of the key Kafka feature enhancements requested by our customers, and we expect significant use of it in the near future. As we learn more about it from real-world use cases, we will continue to write about its use and best practices and, more importantly, continue to enhance our Kafka offerings.

In the meantime, if you have any questions about Kafka Schema Registry or how to deploy it for your existing Kafka clusters, please reach out to our support team.
