What is new in the latest Kafka release? In my previous blog, I introduced the new Streams Rebalance Protocol. Now, I’m diving into more exciting Apache Kafka 4.1.X features. We welcome back rack-aware partition rebalancing (KIP-1101/17747) and explore how “Queues for Kafka” (KIP-932) advances from early access to preview.
If you want to optimize your broker load and ensure high availability, understanding these Apache Kafka 4.1.X features is essential for your streaming architecture.
Rack-aware partition rebalance returns!

Balancing racks can be tricky! (AI generated—Cursor)
Welcome back to rack-aware partition rebalancing in the next-generation consumer rebalance protocol! This was removed in 4.0 (due to performance concerns), but it’s back in 4.1.0 in “KIP-1101: Trigger rebalance on rack topology changes”, with significant performance improvements. See more from the KIP! It is also referred to as KAFKA-17747 in the release notes (i.e. you won’t see KIP-1101 mentioned).
You can read more about the next-generation consumer rebalance protocol in these blogs:
- Rebalance your Apache Kafka® partitions with the next generation Consumer Rebalance Protocol—up to 20x faster!
- Apache Kafka “Turbo” Rebalancing with the Next Generation Consumer Rebalance Protocol
But what is rack-aware partition assignment and rebalancing anyway?
Rack‑aware partition assignment is Kafka’s way of making sure that the replicas of each partition are spread across different physical locations (called racks, or availability zones) so that the system can survive the loss of an entire rack.
Managed production Kafka clusters often span multiple racks or availability zones. If all replicas of a partition were placed in the same rack and that rack failed (power outage, network failure, or hardware failure), the partition would become unavailable. Rack‑aware assignment prevents this by ensuring replicas are placed in different racks.
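Rack awareness starts with broker metadata: each broker declares its rack (or availability zone) in `server.properties`. A minimal sketch, with hypothetical broker IDs and rack names:

```properties
# server.properties on each broker: declare the rack/AZ it runs in.
# The rack name is free-form; AZ names are a common convention.
broker.id=1
broker.rack=us-east-1a
```

With `broker.rack` set, Kafka's replica placement spreads each partition's replicas across the declared racks when topics are created.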
But what is a “rack topology” in Kafka?
The rack topology in Kafka is the full layout of which brokers are assigned to which racks, and how replicas of each partition are distributed across those racks. As noted above, this topology determines how Kafka spreads replica copies so that a single rack failure doesn’t cause data loss or partition unavailability. Because Kafka replicates each partition across multiple brokers, the rack topology ensures:
- Replicas are placed in different racks so they survive rack‑level failures
- Broker load and availability are balanced across the cluster
- Consumers or Streams apps can read from the closest replica (when using rack‑aware fetch)
This last point is important—rack topology is essential if you want “follower fetching” (KIP-392: Allow consumers to fetch from closest replica) to pick the closest replica in the same AZ as the consumer, which is critical for avoiding expensive cross-AZ traffic.
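For reference, follower fetching (KIP-392) is wired up with one broker setting and one client setting; the rack name here is illustrative:

```properties
# Broker side: let the broker choose a replica based on the consumer's rack
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# Consumer side: identify the rack/AZ this client runs in
client.rack=us-east-1a
```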
So, what can cause a rack topology change, and therefore trigger a rebalance?
- Adding partitions to a topic
- Consumers joining or leaving the group, and
- Changes to the rack assignment of any partition, including replicas moving to different racks.
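Conceptually, the coordinator can detect any of these changes by comparing a fingerprint of the current rack topology against the previous one. Here is a minimal, hypothetical sketch of that idea (not the actual broker code; KIP-1101 hashes topic metadata internally):

```java
import java.util.*;

// Hypothetical sketch of rack-topology change detection (NOT the actual
// broker implementation). A fingerprint of "which racks host which
// partition" changes whenever partitions are added or replicas move racks.
public class RackTopologyFingerprint {

    // topology: partition id -> racks hosting its replicas.
    // Canonicalize with sorted maps/sets so the fingerprint is stable
    // regardless of the iteration order of the input collections.
    public static String fingerprint(Map<Integer, ? extends Collection<String>> topology) {
        SortedMap<Integer, SortedSet<String>> canonical = new TreeMap<>();
        topology.forEach((partition, racks) -> canonical.put(partition, new TreeSet<>(racks)));
        return canonical.toString();
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> before = Map.of(
                0, List.of("rack-a", "rack-b"),
                1, List.of("rack-b", "rack-c"));

        // A replica of partition 0 moves from rack-b to rack-c
        Map<Integer, List<String>> after = Map.of(
                0, List.of("rack-a", "rack-c"),
                1, List.of("rack-b", "rack-c"));

        boolean rebalanceNeeded = !fingerprint(before).equals(fingerprint(after));
        System.out.println("rebalance needed: " + rebalanceNeeded); // true
    }
}
```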
Finally, what do you have to do to enable KIP-1101 in Kafka 4.1.0? Nothing!
KIP-1101 is enabled by default in Kafka 4.1.0 once the cluster is running the new Next-Generation Group Coordinator (KIP-848).
If you want KIP‑1101 to take effect, ensure that:
- All brokers are upgraded to Kafka 4.1.0 or later (KIP‑1101 is only implemented in 4.1.0).
- Your consumer groups use the new protocol (group.protocol=consumer). This is the next‑generation consumer rebalance protocol introduced by KIP‑848.
- Brokers and partitions have rack metadata set (broker.rack). KIP‑1101 triggers only when rack topology actually changes, which requires racks to be defined.
For NetApp Instaclustr managed Apache Kafka clusters, please reach out to our support team to enable this for your clusters.
To actually use KIP-1101 for rack-aware use cases, you need to check the following configurations.
On the server side:
```properties
# Enable the KIP-848 consumer protocol on the brokers (the default in 4.1)
# and register the custom assignor alongside the built-in uniform assignor
group.coordinator.rebalance.protocols=classic,consumer
group.consumer.assignors=com.example.kafka.RackAwareAssignor,\
  org.apache.kafka.coordinator.group.assignor.UniformAssignor
```
And on the consumer side:
```properties
# Use the KIP-848 protocol and request the server-side assignor by the
# name it reports via name() ("rack-aware" in the example below)
group.protocol=consumer
group.remote.assignor=rack-aware
group.id=my-app
```
But watch out! There’s no default RackAwareAssignor built-in; you have to provide one. Here’s an example:
```java
package com.example.kafka;

import org.apache.kafka.coordinator.group.assignor.ConsumerGroupPartitionAssignor;
import org.apache.kafka.coordinator.group.assignor.Assignment;
import org.apache.kafka.coordinator.group.assignor.AssignmentContext;

import java.util.*;

public class RackAwareAssignor implements ConsumerGroupPartitionAssignor {

    @Override
    public String name() {
        return "rack-aware";
    }

    @Override
    public Assignment assign(AssignmentContext context) {
        // Map: consumerID -> assigned partitions
        Map<String, List<Assignment.TopicPartition>> result = new HashMap<>();

        // Initialize empty lists
        for (String member : context.members()) {
            result.put(member, new ArrayList<>());
        }

        // Build reverse lookup: consumer -> rack
        Map<String, String> consumerRacks = new HashMap<>();
        for (String member : context.members()) {
            String rack = context.memberMetadata(member).rackId();
            consumerRacks.put(member, rack);
        }

        // For each topic-partition, choose a consumer
        for (Assignment.TopicPartition tp : context.partitions()) {
            // Get partition rack
            String partitionRack = context.partitionMetadata(tp).rack();

            // Try to find a consumer whose rack matches
            Optional<String> localConsumer = consumerRacks.entrySet().stream()
                    .filter(e -> Objects.equals(e.getValue(), partitionRack))
                    .map(Map.Entry::getKey)
                    .findAny();

            String chosenConsumer;
            if (localConsumer.isPresent()) {
                chosenConsumer = localConsumer.get(); // rack-local
            } else {
                // Fallback: choose least-loaded consumer
                chosenConsumer = result.entrySet().stream()
                        .min(Comparator.comparingInt(e -> e.getValue().size()))
                        .get()
                        .getKey();
            }

            result.get(chosenConsumer).add(tp);
        }

        // Convert to final Assignment object
        return Assignment.of(result);
    }
}
```
This example uses the server-side assignor API (ConsumerGroupPartitionAssignor). Note that it's an AI-generated sketch, so check the interface and method signatures against the Kafka version you build against.
It demonstrates:
- Reading rack information for each topic partition
- Grouping consumer members by the rack they run on
- Assigning rack-local partitions to rack-local consumers
- Falling back when no local consumer exists
- Producing the proper assignment map
Finally, who can resist a good simulation (or 3)? I asked “Cursor” to write a Rack Aware Assignor example, and it suggested running some simulations to test it out, which sounded cool. It simulated baseline results (no rack awareness), imperfect rack metadata (with rack awareness), and ideal (with rack awareness). The results with 3 racks, and an even distribution of consumers and partition leaders across the 3 racks (100% hit rate is ideal) are:
- Locality hit rate (baseline): 33.33% (1 in 3 chance of randomly landing on the same rack as the leader).
- Imperfect metadata hit rate (rack-aware): 80%
- Perfect metadata hit rate (rack-aware): 100%
The 80% result is probably more representative of real clusters, where you usually see a lower than 100% hit rate because of:
- missing rack metadata
- uneven consumer placement across racks
- limited consumers for some topics
- rebalancing constraints/stickiness
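To see where numbers like these come from, here's a small self-contained sketch (illustrative only, with a made-up rack layout) that computes the expected locality hit rate for random assignment versus an ideal rack-aware one:

```java
import java.util.*;

public class LocalitySim {

    // Expected locality hit rate if partitions are assigned to consumers
    // uniformly at random (no rack awareness): for each partition, the
    // chance the chosen consumer shares the leader's rack.
    static double baselineHitRate(int[] partitionLeaderRacks, int[] consumerRacks) {
        Map<Integer, Long> consumersPerRack = new HashMap<>();
        for (int r : consumerRacks) consumersPerRack.merge(r, 1L, Long::sum);
        double total = 0;
        for (int leaderRack : partitionLeaderRacks) {
            total += consumersPerRack.getOrDefault(leaderRack, 0L) / (double) consumerRacks.length;
        }
        return total / partitionLeaderRacks.length;
    }

    // Hit rate of an ideal rack-aware assignor: a partition is rack-local
    // whenever at least one consumer lives in the leader's rack.
    static double rackAwareHitRate(int[] partitionLeaderRacks, int[] consumerRacks) {
        Set<Integer> racksWithConsumers = new HashSet<>();
        for (int r : consumerRacks) racksWithConsumers.add(r);
        double hits = 0;
        for (int leaderRack : partitionLeaderRacks) {
            if (racksWithConsumers.contains(leaderRack)) hits++;
        }
        return hits / partitionLeaderRacks.length;
    }

    public static void main(String[] args) {
        // 3 racks, 9 partitions and 3 consumers spread evenly (leader rack = i % 3)
        int[] leaders = {0, 1, 2, 0, 1, 2, 0, 1, 2};
        int[] consumers = {0, 1, 2};
        System.out.printf("baseline:   %.2f%n", baselineHitRate(leaders, consumers)); // 0.33
        System.out.printf("rack-aware: %.2f%n", rackAwareHitRate(leaders, consumers)); // 1.00
    }
}
```

With consumers spread evenly over 3 racks, the baseline comes out at exactly 1/3, matching the simulated 33.33%; imperfect metadata lands somewhere between the two extremes.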
Here’s the code for the assignor and the simulation.
Queues for Kafka (KIP-932) is in preview
Queues for Kafka (KIP-932) has advanced (in the queue?!) from early access to preview. It's ready for evaluation and testing, but not for production in this version. But watch out! The early access support for KIP-932 in Kafka 4.0 is NOT COMPATIBLE with the 4.1.0 preview, so if you enabled it and started evaluating it in 4.0, you cannot simply upgrade those share groups to 4.1.0. In particular, only Kafka 4.1 clients can be used with Kafka 4.1 share groups. Find out more here and here.
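If you want to try the preview on a self-managed 4.1 cluster, share groups are gated behind a feature flag. Per my reading of the 4.1 release notes (verify the feature name against your version's documentation), it's switched on with the share.version feature:

```shell
# Enable the share groups preview on a running Kafka 4.1 cluster
# (feature name per the 4.1 release notes; check your version's docs)
bin/kafka-features.sh --bootstrap-server localhost:9092 \
  upgrade --feature share.version=1
```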
Find out more about Queues for Kafka in these previous blogs:
- Apache Kafka 4.0 share groups: What you need to know about queues for Kafka
- Top use cases for queues for Kafka: Unlocking the potential of Apache Kafka® 4.0 share groups
- Join the Queue for Apache Kafka® Share Groups
Conclusion
In this and the previous blog, we've seen that Apache Kafka 4.1.0 introduces the new "Streams Rebalance Protocol" (KIP-1071, with a new streams group, in early access), welcomes back rack-aware partition rebalancing (KIP-1101/KAFKA-17747: Trigger rebalance on rack topology changes), and advances "Queues for Kafka" (KIP-932) in the queue from early access to preview.
Try any of these features out by starting a free 30-day trial of our Managed Apache Kafka platform today. We currently offer Kafka 4.1.2 (a bug fix version) as the Generally Available version of 4.1.X on our platform. Optimize your data streaming architecture and see the performance benefits firsthand.