Kafka “Diskless” – Proposals, Status & Insights
December 16, 2025 | By Varun Ghai
Introduction
Apache Kafka has long been the backbone of real-time data streaming, powering mission-critical pipelines across industries. Traditionally, Kafka has relied on local disk storage for durability and replication, which ensures high throughput and fault tolerance. However, this design comes with trade-offs—particularly in cloud environments where disk management, replication across availability zones, and scaling costs can become significant pain points.
The concept of diskless Kafka represents a paradigm shift. Instead of persisting data on broker disks, diskless implementations use object storage (such as Amazon S3) as the primary persistence layer. This approach aligns with cloud-native principles: separating compute from storage enables elastic scaling and improves cost efficiency. By removing the dependency on local disks, organizations can dramatically reduce infrastructure costs and simplify operations. A major benefit is the reduction of inter-AZ data transfer costs, which are typically incurred during replication in traditional setups. Additionally, while less critical, eliminating the need for brokers with large local storage provides further savings by reducing hardware or instance costs and simplifying capacity planning.
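To make the cross-AZ cost point concrete, here is a rough back-of-the-envelope sketch. The ingest volume, replication factor, and per-GB rate below are illustrative assumptions, not figures taken from any of the KIPs; substitute your own numbers and your provider's actual pricing.

```java
// Rough illustration of cross-AZ replication traffic in a classic 3-AZ Kafka setup.
// All numbers are hypothetical placeholders; plug in your own volumes and pricing.
public class CrossAzCostSketch {
    public static void main(String[] args) {
        double producedGbPerDay = 1000.0;   // assumed ingest volume
        int replicationFactor = 3;          // one replica per AZ
        // Each produced byte is copied to (RF - 1) followers in other AZs.
        double crossAzGbPerDay = producedGbPerDay * (replicationFactor - 1);
        // Illustrative inter-AZ transfer rate; check your cloud provider's pricing.
        double dollarsPerGb = 0.02;
        double dailyCost = crossAzGbPerDay * dollarsPerGb;
        System.out.printf("Cross-AZ replication: %.0f GB/day, roughly $%.2f/day%n",
                crossAzGbPerDay, dailyCost);
    }
}
```

Under these assumptions, a classic three-replica deployment moves roughly twice the produced volume across AZ boundaries every day; diskless proposals such as KIP-1150 aim to push that replication into the object store instead.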
However, this innovation introduces new challenges. Diskless architectures for Kafka must address latency trade-offs, ordering guarantees, and transactional semantics. The Kafka community is actively debating these issues through several Kafka Improvement Proposals (KIPs), most notably KIP-1150, KIP-1176, and KIP-1183. These proposals explore different paths toward diskless, each with unique benefits and limitations.
This page is intended to serve as your go-to resource for tracking developments in this space and to provide a brief, unbiased analysis of them. We will update it weekly.
KIPs in focus:
KIP-1150: Diskless Topics
Main Idea: Replace broker local disks with object storage (like S3) as the primary durable storage for Kafka topics.
Architecture Changes:
- Leaderless design – all brokers can interact with all partitions (revolutionary change from traditional leader-follower model)
- Data stored solely in object storage, not on broker disks
- Batch-based write model: producers send data to any broker → broker accumulates requests in buffer → uploads complete batches to object storage → Batch Coordinator assigns offsets (sketched at the end of this section)
- Pluggable Batch Coordinator (KIP-1164) for metadata management
- Delegated replication to object storage (e.g., leveraging S3’s built-in cross-AZ replication)
- Full client API support (e.g., transaction, queue, tiered storage)
Implementation: Open source prototype at https://github.com/aiven/inkless (Aiven)
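To make the batch-based write path above more concrete, here is a minimal, hypothetical sketch of how a broker could buffer produce requests, upload the combined batch to object storage, and let a coordinator assign offsets. All class and method names (DisklessWriteSketch, ObjectStore, BatchCoordinator.commitBatch) are invented for illustration and are not taken from KIP-1150 or the Inkless prototype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the KIP-1150 write path: any broker buffers produce
// requests, uploads the combined batch to object storage, and then asks a
// Batch Coordinator for offsets. All names here are invented for illustration.
public class DisklessWriteSketch {

    // Stand-in for an S3-like client; returns the key of the stored object.
    interface ObjectStore {
        String put(List<String> records);
    }

    // Stand-in for the pluggable Batch Coordinator (KIP-1164): assigns
    // monotonically increasing offsets and records which object holds them.
    static class BatchCoordinator {
        private final AtomicLong nextOffset = new AtomicLong();
        long commitBatch(String objectKey, int recordCount) {
            long base = nextOffset.getAndAdd(recordCount);
            System.out.printf("offsets [%d..%d] -> %s%n", base, base + recordCount - 1, objectKey);
            return base; // a real coordinator would persist this mapping durably
        }
    }

    private final List<String> buffer = new ArrayList<>();
    private final ObjectStore store;
    private final BatchCoordinator coordinator;

    DisklessWriteSketch(ObjectStore store, BatchCoordinator coordinator) {
        this.store = store;
        this.coordinator = coordinator;
    }

    // Any broker can accept the write; no partition leader is involved.
    void append(String record) { buffer.add(record); }

    // When the buffer is full (or a timer fires), upload the batch and commit it.
    long flush() {
        String key = store.put(List.copyOf(buffer));   // durable once this returns
        long base = coordinator.commitBatch(key, buffer.size());
        buffer.clear();
        return base;
    }

    public static void main(String[] args) {
        ObjectStore fakeStore = records -> "s3://bucket/batch-" + records.hashCode();
        DisklessWriteSketch broker = new DisklessWriteSketch(fakeStore, new BatchCoordinator());
        broker.append("event-1");
        broker.append("event-2");
        broker.flush(); // prints the offset range assigned to the uploaded object
    }
}
```

The essential point is that durability comes from the object-store upload, while ordering comes from the coordinator's offset assignment, which is what allows any broker to accept writes without a partition leader.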
KIP-1176: Tiered Storage for Active Log Segment
Main Idea: Extend existing KIP-405 (tiered storage for closed segments) to also upload active log segments to fast object storage.
Architecture Changes:
- Incremental evolution of existing tiered storage – preserves leader-follower model
- Creates three-tier storage: Local disk → Fast object store (e.g., S3 Express One Zone) → Traditional object storage (e.g., S3)
- Leader writes to both local disk AND fast object storage simultaneously (sketched at the end of this section)
- Followers fetch active segments from fast object storage (same AZ) instead of cross-AZ replication from leader
- Background tasks handle uploads via RLMWalCombinerTask
- Reuses existing page cache for performance
Implementation: Closed source (Slack/Salesforce)
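To picture the leader's dual-write path referenced above, here is a simplified, hypothetical sketch. The class and interface names (ActiveSegmentDualWriteSketch, FastObjectStore) are invented for illustration, and in the actual proposal uploads are batched by background tasks such as RLMWalCombinerTask rather than performed inline on the produce path.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the KIP-1176 leader write path: the active segment is
// appended to the local log as usual, and the same bytes are mirrored to a fast
// object store so same-AZ followers can fetch from there instead of from the
// leader across AZs. Names are invented for illustration only.
public class ActiveSegmentDualWriteSketch {

    // Stand-in for a client of a low-latency object store (e.g. S3 Express One Zone).
    interface FastObjectStore {
        void append(String segmentKey, byte[] data);
    }

    private final Path localSegment;
    private final FastObjectStore fastStore;
    private final String segmentKey;

    ActiveSegmentDualWriteSketch(Path localSegment, FastObjectStore fastStore, String segmentKey) {
        this.localSegment = localSegment;
        this.fastStore = fastStore;
        this.segmentKey = segmentKey;
    }

    // Leader append: the local disk remains the source of truth (and feeds the page
    // cache); the object-store copy replaces cross-AZ follower fetches.
    void appendAsLeader(byte[] recordBatch) throws IOException {
        try (OutputStream out = Files.newOutputStream(
                localSegment, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(recordBatch);                 // 1) local append, as in vanilla Kafka
        }
        fastStore.append(segmentKey, recordBatch);  // 2) mirror to fast object storage
        // In the actual proposal this upload is handled by background tasks
        // (RLMWalCombinerTask); doing it inline here keeps the sketch short.
    }

    public static void main(String[] args) throws IOException {
        FastObjectStore fake = (key, data) ->
                System.out.printf("mirrored %d bytes to %s%n", data.length, key);
        Path segment = Files.createTempFile("active-segment", ".log");
        new ActiveSegmentDualWriteSketch(segment, fake, "fast-store/topic-0/active.log")
                .appendAsLeader("record-batch-bytes".getBytes());
    }
}
```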
KIP-1183: Unified Shared Storage
Main Idea: Abstract the storage layer to support both traditional local disk and shared storage simultaneously.
Architecture Changes:
- Two-step approach: 1) Abstract log layer with AbstractLog and AbstractLogSegment classes, 2) Define pluggable Stream API (a speculative sketch follows at the end of this section)
- Enables Kafka to transform from shared-nothing to shared storage architecture
- Supports both architectures simultaneously for gradual migration
- Maintains leader-based architecture (unlike KIP-1150)
- Flexible deployment on various storage backends (e.g., S3, HDFS, NFS, Ceph, MinIO, CubeFS)
Implementation: Primary implementation at https://github.com/AutoMQ/automq (AutoMQ)
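Because the Stream interface is still under discussion, the following is a purely speculative illustration of what a pluggable stream abstraction of this kind could look like; it is not the interface proposed in KIP-1183, and none of the method names come from the proposal.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Speculative sketch of a pluggable stream abstraction in the spirit of KIP-1183.
// The actual Stream interface design is still open; these methods are invented.
public interface StreamSketch {

    // Append a record batch; the implementation decides whether the bytes land on
    // local disk, S3, HDFS, NFS, Ceph, or another shared store.
    CompletableFuture<Long> append(ByteBuffer recordBatch);

    // Read up to maxBytes starting at the given logical offset.
    CompletableFuture<ByteBuffer> read(long startOffset, int maxBytes);

    // Drop everything below the given offset (retention hook).
    CompletableFuture<Void> trim(long newStartOffset);

    long startOffset();

    long nextOffset();
}
```

Under this reading of the KIP, an AbstractLog implementation built against such an interface, rather than directly against local file channels, is what would allow local-disk and shared-storage logs to coexist during a gradual migration.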
The table below summarizes the KIPs along several factors important to our customers, such as cost savings and design pros and cons, based on the community summary page, the KIPs, and their discussion threads. Where possible, each factor is rated High/Medium/Low based on existing data.
| KIPs | Cross-AZ Traffic Reduction | Storage Requirement | Cost Savings | Performance Impact | Scalability | Availability | Implementation Effort | Status (15 Dec 2025) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KIP-1150 | High – Eliminates all cross-AZ replication (leaderless design) | Low – Object storage only; minimal, optional local disk for metadata and caching | High – No cross-AZ costs for replication | High – P50 ~500 ms, P99 ~1-2 s (vs. single-digit ms traditional) | High – Stateless brokers, true cloud-native elasticity, data/metadata separation | High – Leverages cloud storage’s durability and built-in cross-AZ replication | High – Revolutionary change, multiple sub-KIPs, but clean design | Voting |
| KIP-1176 | Medium – Only the follower replication path (producer→leader and consumer traffic unchanged) | Medium – Three-tier: local disk + fast object storage + traditional object storage | Medium – 43% overall cost reduction documented | Low – In some durability/storage settings; maintains single-digit ms for acks=1 and near-traditional latency for acks=-1 with fast storage | Medium – Still broker-centric, no cloud-native elasticity benefits | Low – AZ failure creates recovery challenges, no hot standby for active segments | Medium – Incremental changes to existing code, but complexity may grow | Under discussion, availability concerns raised |
| KIP-1183 | Medium – Eliminates cross-AZ replication with shared storage (RF=1) but still incurs producer- and consumer-side traffic | Medium – Flexible, pluggable support for multiple standard or custom shared storage backends | Medium – Similar to KIP-1150 in zero broker-to-broker replication but less optimized for client-side traffic | Unknown – Depends on Stream implementation quality | High – Shared-storage-based brokers enable scaling, but can be limited by slow leader failover | Low – Failover latency with RF=1 (1-2 s documented), no hot standby | High – Large plugin development burden, unclear Stream interface design | Under discussion, Stream interface design unclear |
KIP Change tracker – last reviewed 15 December 2025:
Note – Kafka dev mail discussions on the KIPs are excluded from the tracking below.
- KIP-1150: Diskless Topics
- 16 Apr 2025: Initial page published.
- 23 Apr 2025: Added a new subheading and content for “Disk usage and lack thereof”, explaining that diskless topics will still require some disk.
- 9 May 2025: Replaced sub-KIP “E” with KIP-1181, and added content for it in https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=357173709, introducing diskless topics and how clients and brokers can use rack-awareness to minimize cross-AZ costs.
- 28 May 2025: Added cost information on cross-AZ data transfer charges for AWS, GCP, and Azure.
- 3 Sep 2025: Clarified that “diskless” does not mean brokers need no disks at all, but that broker disks are no longer used “as the primary durable storage of user data”, and that “…the attached disks become a less important abstraction for operators but are still functionally present.”
- 12 Dec 2025: Voting resumed on the KIP, which had been on pause since May 2025.
- KIP-1176: Tiered Storage for Active Log Segment
- 1 May 2025: Initial page published.
- 7 May 2025: Added future enhancements for “Cloud native elasticity”, suggesting that consumer reads could pull data directly from cloud storage instead of from brokers, meaning brokers would need less memory and network throughput, leading to lighter clusters. Also suggested, as another future enhancement, removing the hot standby follower to further reduce resource requirements and latency.
- 10 May 2025: Added explanations for AZ availability and data durability (stating these would be the same as in the existing Kafka design). Cleaned up the public interfaces section and added explanations and changes to it. Finally, added content to the rejected alternatives section comparing the proposal with KIP-1150, as well as KIP-405 (tiered storage).
- 31 May 2025: Carried out some minor cleanup and introduced the metadata topic __remote_wal_log_metadata as an internal topic to store metadata about WAL (Write-Ahead Log) segments that are offloaded to remote storage.
- 4 Jun 2025: Clarified that the proposal would work just as well with traditional cloud storage such as S3 and not just S3 Express One Zone. Added two new public interfaces.
- 29 Aug 2025: Added explanations of the costs incurred and saved with the proposal. Added performance test setup and results in the appendix, along with an analysis comparing them against KIP-1150’s performance results.
- 10 Oct 2025: Laid out the leader failover strategy, explaining how recovery would work and why the result would be equivalent to that of “standard” Kafka.
- KIP-1183: Unified Shared Storage
- 13 May 2025: Initial page published.
- 16 May 2025: Minor cleanup.