Embracing the Future of Kafka: Why It’s Time to Migrate from ZooKeeper to KRaft
Next Release removes ZooKeeper support
You have probably heard about ZooKeeper-less Kafka, or KRaft, before. It marks a key moment in the evolution of Kafka, and its impact will be felt in many organizations. Kafka has long been the backbone of modern data streaming, enabling organizations to process, store, and analyze massive streams of data in real time. Behind the scenes, ZooKeeper has acted as the storage system that Kafka brokers use for coordination. It is the source of truth for the cluster metadata. While effective, this setup adds complexity because a second, separate system is required to manage the Kafka cluster. Kafka is now evolving, and with it comes a new era in its development: the transition to KRaft.
Starting from version 4.0, Kafka will drop ZooKeeper support, making KRaft the only way of running Kafka clusters from that point on. The 3.9 version, released in November 2024, is the last version to support ZooKeeper mode. According to the Kafka documentation, it will receive critical bug fixes and security fixes for 12 months after its release.
This means that all ZooKeeper-based clusters will lose official support starting November 2025. The importance of this migration cannot be overstated. Users of Kafka, no matter their cluster size, vendor or deployment method, will eventually have to migrate to KRaft to keep receiving Kafka updates. This is a one-way step: after the migration has been finalized, it is not possible to revert to ZooKeeper mode. It may sound like a trivial change, but it is not. Its implications are far-reaching, and migrating the existing clusters might not be the answer in all cases.
In this blog post, we will explore the migration from a bird’s eye view to understand the risks and opportunities. But first, let’s quickly revisit the timeline of how KRaft appeared on stage.
How we got here
Why migrate
As stated before, moving to KRaft is the only way forward if you want to keep receiving Kafka updates. This applies regardless of how you deploy Kafka: bare metal, VMs, custom Kubernetes manifests, Helm chart, Strimzi, Ansible, Confluent Platform, Confluent for Kubernetes, or others. Even when using a managed cloud platform like Red Hat Streams for Apache Kafka, you have to migrate! Some cloud services, such as Amazon Managed Streaming for Apache Kafka, do not support migrating at all, while others, like Google Cloud Managed Service for Apache Kafka, never supported ZooKeeper in the first place.
The migration requires a detailed analysis of the existing setup and careful preparation. The systems that interact with the Kafka cluster and the specific tech stack vary from organization to organization. Therefore, it is difficult to estimate how long the analysis and preparation for the migration will take.
In contrast, the migration steps are relatively simple and straightforward once the KRaft architecture is well understood. The time to execute these steps is on the order of hours, a couple of days at most. During this time, effective monitoring and observability tools are crucial to verify that the cluster is behaving correctly. Fortunately, the migration process offers a number of desirable properties if done correctly:
- Existing cluster data is preserved
- Zero cluster downtime is guaranteed
- Users of the Kafka cluster are not impacted
In addition, as stated in the release announcement, Kafka 3.9 offers the final and best iteration of the migration feature. Feature parity with ZooKeeper has been reached: everything you can do with ZooKeeper, you can also do with KRaft. Thus, it is highly recommended to first update to Kafka 3.9 and use this version as a “bridge” to the KRaft world. From this perspective, now is the best time to migrate: the process has been stabilized and tested thoroughly in production systems over the past year.
Moreover, KRaft introduces a new architecture which offers several advantages:
- Simplified Operations: KRaft eliminates the need to manage ZooKeeper alongside Kafka, making deployment and maintenance easier
- Faster Recovery: KRaft achieves recovery times which are 10x faster than with ZooKeeper, improving availability of the Kafka cluster
- Efficient Metadata Propagation: The KRaft model is tailor-made for Kafka and thus allows metadata to be propagated from the controller to the brokers more efficiently
- Simplified Security: With KRaft, there is no need to maintain a separate security configuration for ZooKeeper
- Support for Larger Partition Counts: Kafka in KRaft mode supports partition counts above 200,000
What happens during the migration
To explain the migration in simple terms, imagine you have a warehouse and you keep track of the inventory in a small book. This book records when the goods arrived and in which part of the warehouse they are located. It must be kept consistent as goods come in and out, because it is what lets you find things. To avoid losing this valuable information, you hire three people and give each one a personal book. At any given time, one person is in charge and is authorized to write new lines to the book, while the other two simply copy the book of the person in charge. If the person in charge cannot show up, the other two colleagues have an up-to-date copy and decide together who takes over. Now you want to switch from an old way of writing things down to a new, tidier bookkeeping system.
This is a good analogy for the ZooKeeper to KRaft migration: You are not moving the contents of the warehouse (your actual data in Kafka topics), instead you are moving the contents of the book (the cluster metadata stored in ZooKeeper). The challenge is that you want to switch from one book to a new clean one without pausing your warehouse operations at any time. On top of that, if you lose part of the contents of the book, you may irreversibly lose the contents of the warehouse because you will no longer be able to find them.
To achieve this, the idea is to write new lines to the book in both the old and the new format until we are sure we can rely on the new bookkeeping system. This is called the dual-write mode and it is part of the KRaft migration. The steps are the following, assuming you have 3 bookkeepers/ZooKeeper instances:
Up to and including step 8, you can undo the changes and roll back to the old setup. This allows you to verify that your Kafka cluster behaves correctly in dual-write mode before you finalize the migration. Note that there is zero downtime from the perspective of the Kafka brokers. Up to and including step 7, they simply think they are talking to a different ZooKeeper. During step 8, some brokers are already operating in KRaft mode while others are still in ZooKeeper mode. Fortunately, the controller can handle both types of requests. Once all brokers are migrated, the controllers can switch to KRaft-only mode. Performing controlled rolling restarts of the brokers ensures that there is also zero downtime for the Kafka users (producers and consumers).
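To make the dual-write idea more tangible, here is a minimal Python sketch of the bookkeeping analogy. It is a toy model and not Kafka code: the class `MetadataMigration`, its phases and its method names are all hypothetical, and the two lists merely stand in for the metadata kept in ZooKeeper and in KRaft. It only illustrates why a rollback is possible while both books are being written, and why it no longer is after finalization.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataMigration:
    """Toy model of the dual-write phase (hypothetical, not Kafka's implementation)."""

    old_book: list = field(default_factory=list)  # stands in for ZooKeeper metadata
    new_book: list = field(default_factory=list)  # stands in for KRaft metadata
    phase: str = "old_only"  # old_only -> dual_write -> new_only

    def start_dual_write(self) -> None:
        if self.phase != "old_only":
            raise RuntimeError("dual-write can only start from the old setup")
        # Copy the existing contents once, then keep both books in sync.
        self.new_book = list(self.old_book)
        self.phase = "dual_write"

    def write(self, entry: str) -> None:
        # Every new line goes to whichever books are still maintained.
        if self.phase in ("old_only", "dual_write"):
            self.old_book.append(entry)
        if self.phase in ("dual_write", "new_only"):
            self.new_book.append(entry)

    def rollback(self) -> None:
        # Possible only while the old book is still complete and up to date.
        if self.phase != "dual_write":
            raise RuntimeError("rollback is only possible during dual-write")
        self.new_book.clear()
        self.phase = "old_only"

    def finalize(self) -> None:
        if self.phase != "dual_write":
            raise RuntimeError("finalize is only possible during dual-write")
        # From here on the old book is no longer updated: no way back.
        self.phase = "new_only"


migration = MetadataMigration(old_book=["topic-a created"])
migration.start_dual_write()
migration.write("topic-b created")
assert migration.old_book == migration.new_book  # both books agree

migration.finalize()
migration.write("topic-c created")  # only the new book sees this entry
```

After `finalize()`, the old book silently falls behind, which is why the sketch refuses a rollback at that point: going back would lose every entry written since.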
Why not migrate
In some rare cases, migration may not be possible. But since setting up a cluster with KRaft is unavoidable, the only alternative is to create a brand-new Kafka cluster from scratch. Then you need to transfer the existing data and users. It sounds easy on paper but in practice it is costly, complex and time-consuming. Consider for instance the following difficulties:
- The hardware resources (thus the operating costs) will double until the old Kafka cluster can be decommissioned.
- Users are directly impacted. They must update their application code to use the new Kafka cluster instead of the old one.
- The migration may stretch over months or even years until every Kafka user is on board.
- You need to set up a mechanism such as MirrorMaker2 to keep the two clusters in sync during the entire migration phase.
In short, this alternative will probably require at least as much planning and analysis as the migration of the existing clusters. We recommend having a serious look at the migration before going for the alternative.
A last important caveat is that Confluent is planning to offer additional paid support for Confluent Platform 7.9, the last version with ZooKeeper included. This extension will last for two to three years (until 2027/2028) depending on your subscription tier.
A summary of the tradeoffs
Conclusion
The transition from ZooKeeper to KRaft is not just a technical milestone; it is a necessary step for ensuring the future of your Kafka ecosystem. With support for ZooKeeper ending in November 2025, delaying this migration could lead to operational risks, including outdated software and compatibility issues. Migrating your existing Kafka clusters to KRaft offers significant advantages, from simplified operations to enhanced scalability and reliability. While the process requires careful planning and execution, the benefits far outweigh the challenges. By starting your migration planning today, you can secure a seamless transition to KRaft, avoid last-minute surprises, and unlock the potential of Kafka’s next-generation architecture. Let us help you make this journey smooth and efficient — reach out to our team for expert guidance tailored to your unique needs.