Apache Kafka made a landmark shift with KIP-500, which introduced Kafka Raft (KRaft) mode and eliminated the dependency on Apache ZooKeeper for metadata management. With KRaft, the Kafka nodes themselves can be configured as KRaft controllers, so metadata management and leader elections happen entirely within Kafka, resulting in significant performance improvements. This has cemented KRaft’s status as the metadata management protocol for Kafka going forward.

This blog will guide you through the importance of this transition, what migrating from ZooKeeper to KRaft entails, and how we, at NetApp Instaclustr, make this seamless with our automated, streamlined process that is built into our platform.

Migrating from ZooKeeper to KRaft mode

Kafka clusters that are still running in ZooKeeper mode can be migrated to KRaft mode while remaining available throughout the migration. This was made possible by KIP-866 (ZooKeeper to KRaft Migration). For NetApp Instaclustr customers using our managed Kafka offering, our platform provides an automated process to facilitate this.

Challenges of manually migrating from ZooKeeper to KRaft

Manually migrating a Kafka cluster from a legacy ZooKeeper-based setup to KRaft mode is a complex, time-consuming process, especially for larger deployments. Below are the high-level steps in a manual migration:

  1. Deploying the new KRaft quorum
  2. Setting the brokers to migration mode and loading metadata into the quorum
  3. Migrating brokers to KRaft
  4. Finalizing the migration and ensuring KRaft controllers are no longer in migration mode
  5. Rolling back the cluster to ZooKeeper mode, if needed, which is only possible before the migration is finalized

Manual migration requires meticulous configuration changes, careful coordination, and multiple node restarts, all of which increase the risk of human error and downtime.
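To make the configuration work behind steps 1 to 4 more concrete, here is a minimal Python sketch of how a node's properties change from phase to phase. The property names follow the Apache Kafka migration documentation as we understand it; the host names, ports, and node IDs are placeholder assumptions, and listener settings are omitted for brevity. Treat it as an illustration and check the official documentation for your Kafka version before relying on it.

```python
# Illustrative only: addresses and IDs below are hypothetical placeholders.
ZK_CONNECT = "zk1.example.internal:2181"
QUORUM_VOTERS = "3000@controller1.example.internal:9093"


def controller(node_id, finalized=False):
    """KRaft controller properties (steps 1 and 4)."""
    props = {
        "process.roles": "controller",
        "node.id": str(node_id),
        "controller.quorum.voters": QUORUM_VOTERS,
        "controller.listener.names": "CONTROLLER",
    }
    if not finalized:
        # While migrating, the controller still needs access to ZooKeeper.
        props["zookeeper.metadata.migration.enable"] = "true"
        props["zookeeper.connect"] = ZK_CONNECT
    return props


def broker_zk_mode(broker_id):
    """Broker properties before the migration starts."""
    return {
        "broker.id": str(broker_id),
        "zookeeper.connect": ZK_CONNECT,
        "inter.broker.protocol.version": "3.9",
    }


def broker_migration_mode(broker_id):
    """Step 2: still a ZooKeeper broker, but aware of the KRaft quorum."""
    props = broker_zk_mode(broker_id)
    props.update({
        "zookeeper.metadata.migration.enable": "true",
        "controller.quorum.voters": QUORUM_VOTERS,
        "controller.listener.names": "CONTROLLER",
    })
    return props


def broker_kraft_mode(broker_id):
    """Step 3: restarted as a KRaft broker, ZooKeeper properties removed."""
    props = broker_migration_mode(broker_id)
    for key in (
        "broker.id",
        "zookeeper.connect",
        "zookeeper.metadata.migration.enable",
        "inter.broker.protocol.version",
    ):
        props.pop(key)
    props.update({"process.roles": "broker", "node.id": str(broker_id)})
    return props


def render(props):
    """Format a property dict as the lines of a server.properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))
```

Calling render(broker_migration_mode(0)) and render(broker_kraft_mode(0)) side by side shows exactly which lines have to be added or removed on every broker, which is where a single missed property can stall a manual migration.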

For more information about how to perform each step in detail, visit Apache’s official documentation for the manual migration process.

Key considerations for your migration

While migrating from ZooKeeper to KRaft is a significant upgrade, it’s important to keep a few considerations in mind before initiating the process:

  1. Prerequisites
    Instaclustr for Apache Kafka clusters must be running Kafka 3.9.x. Older versions require an upgrade before migration (if this is the case for your cluster, our support team can help you through the process).
  2. Limitations
    Once the migration is complete, reverting to ZooKeeper isn’t possible. In the rare event that a rollback is needed during the migration, it will require manual intervention from our support team. We recommend that customers first run a migration in a non-production environment, so that any issues in the migration process can be identified and resolved before performing it in production.
  3. Maintenance window
    A maintenance window needs to be scheduled with our team. You should choose a window with a period of low traffic to reduce performance impacts on your clients and applications. Larger clusters may require additional time or multiple windows. Clients may experience brief resource usage spikes during the migration, but the cluster remains operational throughout with no downtime.
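As a small illustration of the version prerequisite, the following Python sketch checks whether a Kafka version string falls in the 3.9.x line. The function name is hypothetical and is not part of any Instaclustr or Kafka API; it only shows the kind of pre-flight check you might script when surveying several clusters.

```python
def meets_migration_prerequisite(kafka_version):
    """Return True if a version string such as '3.9.1' is in the 3.9.x line."""
    parts = kafka_version.strip().split(".")
    return len(parts) >= 2 and parts[0] == "3" and parts[1] == "9"
```

A cluster reporting an older version would need an upgrade before the migration can be scheduled.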

How our automated ZooKeeper to KRaft migration works

Here’s an overview of how Instaclustr automates the migration process, so you don’t have to lift a finger. Our experienced support team will update customers on the status of the migration as it is being performed.

  1. Initial backup of ZooKeeper nodes
    • Before the migration begins, a backup of the ZooKeeper nodes occurs to preserve existing metadata. It is retained for 7 days.
  2. Deployment of the KRaft controller quorum
    • If the original cluster used co-located ZooKeeper nodes, then the new KRaft controllers will be deployed on existing broker nodes.
    • In the case of dedicated ZooKeeper nodes, the new KRaft controllers will be deployed on newly created nodes.
  3. Phased broker migration to KRaft mode
    • Health checks are performed, and nodes are processed rack-by-rack to ensure availability.
    • After each broker is restarted, broker metadata transfer to the KRaft quorum commences.
  4. Additional safety measures
    • Once all the brokers are operating in migration mode, a second cluster-wide restart is performed to transition the brokers into KRaft mode. Before the restarts, the node configurations are updated, including the removal of ZooKeeper-specific properties.
    • Similarly to the previous restart, health checks are performed, and the nodes are processed rack-by-rack.
  5. Transition to full KRaft mode
    • The KRaft controllers’ configurations are changed to have migration-related properties removed, and the controllers are restarted. Afterwards, the cluster operates fully in KRaft mode and can no longer be reverted to ZooKeeper.
  6. ZooKeeper components are decommissioned
    • An additional ZooKeeper backup is performed just prior to decommission and retained for 7 days.
    • For deployments with dedicated ZooKeeper nodes, the nodes are deleted.
    • For deployments with co-located ZooKeeper, the ZooKeeper processes and related directories are removed.
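The rack-by-rack restarts with health checks described in steps 3 and 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Instaclustr's actual implementation: the restart and is_healthy callables are hypothetical stand-ins for whatever restarts a node and verifies cluster health, and each node is assumed to be a dict with "id" and "rack" keys.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_rack(nodes):
    """Map each rack name to the list of node IDs it contains."""
    racks = defaultdict(list)
    for node in nodes:
        racks[node["rack"]].append(node["id"])
    return dict(racks)


def rolling_restart(nodes, restart, is_healthy):
    """Restart nodes one rack at a time, halting if the cluster is unhealthy.

    Nodes inside the current rack are restarted concurrently; the next rack
    is only touched after a health check passes. Returns the restarted IDs.
    """
    restarted = []
    for rack, node_ids in sorted(group_by_rack(nodes).items()):
        if not is_healthy():
            raise RuntimeError(f"cluster unhealthy before rack {rack}; halting")
        with ThreadPoolExecutor(max_workers=len(node_ids)) as pool:
            # list() forces completion and re-raises any restart failure.
            list(pool.map(restart, node_ids))
        restarted.extend(node_ids)
    if not is_healthy():
        raise RuntimeError("cluster unhealthy after the final rack")
    return restarted
```

Restarting only one rack at a time means replicas placed on other racks stay online, which is why the health check sits between racks rather than between individual nodes.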

Why choose Instaclustr for automated ZooKeeper to KRaft migration?

Choosing our automated process to migrate from ZooKeeper to KRaft isn’t just about convenience. It’s a smarter choice with tangible advantages:

  1. Safety is built into the process
    ZooKeeper backups are performed twice for additional data preservation: at the start, before the migration begins, and at the end, before the ZooKeeper-related components are decommissioned. Regular safety checks also ensure that the process continues only while the cluster is in a valid state, so that minimal human oversight is required.
  2. Minimized risk of human error – especially for larger clusters
    Each step demands precise configuration. If a single property is missing, the migration could fail. This becomes much more likely with large clusters, where every node must be configured by hand and the time spent could be better used elsewhere.
  3. Improved efficiency of the migration
    By performing the migration rack by rack, the nodes within the rack currently being migrated can be restarted concurrently, increasing efficiency while maximizing availability. Because configuration changes and restarts are performed automatically, with no manual administrative effort, the migration completes faster and with less operational impact, especially on larger clusters.
  4. Our customers pay no additional cost for this automated migration
    For Instaclustr customers using our managed Kafka offering, this automated migration process is performed on the cluster by our experienced operations team to provide a smooth and reliable transition at no additional financial cost for our customers.

Begin your journey to KRaft today

With the Apache Kafka project direction clearly headed towards KRaft, having a stable, reliable and efficient migration process to transition existing ZooKeeper clusters to the new KRaft mode is imperative. At NetApp, customers using Instaclustr for managed Apache Kafka can have ZooKeeper clusters migrated through our automated process designed for a safe and reliable transition.

If you are a customer interested in migrating your Kafka cluster from ZooKeeper to KRaft, please contact customer support, and view our documentation for more information.

If you are interested in talking with a Kafka migration expert, contact us for a free consultation.