Kafka “Diskless” – Proposals, Status & Insights

December 16, 2025 | By Varun Ghai

Introduction 

Apache Kafka has long been the backbone of real-time data streaming, powering mission-critical pipelines across industries. Traditionally, Kafka has relied on local disk storage for durability and replication, which ensures high throughput and fault tolerance. However, this design comes with trade-offs—particularly in cloud environments where disk management, replication across availability zones, and scaling costs can become significant pain points. 

The concept of diskless Kafka represents a paradigm shift. Instead of persisting data on broker disks, diskless implementations leverage object storage (such as Amazon S3) as the primary persistence layer. This approach aligns with cloud-native principles: separating compute from storage enables elastic scaling and improves cost efficiency. By removing the dependency on local disks, organizations can dramatically reduce infrastructure costs and simplify operations. A major benefit is the reduction of inter-AZ data transfer costs, which are typically incurred during replication in traditional setups. Additionally, while less critical, eliminating the need for brokers with large local storage provides further savings by reducing hardware or instance costs and simplifying capacity planning.

However, this innovation introduces new challenges. Diskless architectures for Kafka must address latency trade-offs, ordering guarantees, and transactional semantics. The Kafka community is actively debating these issues through several Kafka Improvement Proposals (KIPs), most notably KIP-1150, KIP-1176, and KIP-1183. These proposals explore different paths toward diskless, each with unique benefits and limitations. 

This page is intended to serve as your go-to resource for tracking developments in this space and to provide a brief, unbiased analysis of them. We will update it weekly.

KIPs in focus: 

KIP-1150: Diskless Topics 

Main Idea: Replace broker local disks with object storage (like S3) as the primary durable storage for Kafka topics. 

Architecture Changes: 

  • Leaderless design – all brokers can interact with all partitions (revolutionary change from traditional leader-follower model) 
  • Data stored solely in object storage, not on broker disks 
  • Batch-based write model: producers send data to any broker → broker accumulates requests in a buffer → uploads complete batches to object storage → Batch Coordinator assigns offsets (see the sketch after this list)
  • Pluggable Batch Coordinator (KIP-1164) for metadata management 
  • Delegated replication to object storage (e.g., leveraging S3’s built-in cross-AZ replication) 
  • Full client API support (e.g., transaction, queue, tiered storage) 
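
To make the batch-based write path concrete, here is a minimal sketch of how a broker-side buffer could behave under KIP-1150's model. The names used here (DisklessWriteBuffer, ObjectStore, BatchCoordinator, PendingBatch) are illustrative assumptions, not the interfaces proposed in the KIP or implemented in Inkless.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the components described in the KIP.
interface ObjectStore {
    // Uploads one combined object containing many record batches; returns its key.
    String put(ByteBuffer combinedBatches);
}

interface BatchCoordinator {
    // Registers the uploaded object and assigns per-partition offsets to its batches.
    void commit(String objectKey, List<PendingBatch> batches);
}

record PendingBatch(String topic, int partition, ByteBuffer records) { }

class DisklessWriteBuffer {
    private final List<PendingBatch> pending = new ArrayList<>();
    private final ObjectStore objectStore;
    private final BatchCoordinator coordinator;

    DisklessWriteBuffer(ObjectStore objectStore, BatchCoordinator coordinator) {
        this.objectStore = objectStore;
        this.coordinator = coordinator;
    }

    // Leaderless: any broker can accept a produce request for any partition.
    synchronized void append(String topic, int partition, ByteBuffer records) {
        pending.add(new PendingBatch(topic, partition, records));
    }

    // Triggered when the buffer fills up or a flush interval elapses:
    // upload one combined object, then let the coordinator assign offsets.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        String objectKey = objectStore.put(combine(pending)); // durable once the upload returns
        coordinator.commit(objectKey, List.copyOf(pending));  // offsets are assigned here, not on append
        pending.clear();
    }

    private ByteBuffer combine(List<PendingBatch> batches) {
        int totalBytes = batches.stream().mapToInt(b -> b.records().remaining()).sum();
        ByteBuffer combined = ByteBuffer.allocate(totalBytes);
        batches.forEach(b -> combined.put(b.records().duplicate()));
        combined.flip();
        return combined;
    }
}
```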

Implementation: Open source prototype at https://github.com/aiven/inkless (Aiven) 

  

KIP-1176: Tiered Storage for Active Log Segment 

Main Idea: Extend existing KIP-405 (tiered storage for closed segments) to also upload active log segments to fast object storage. 

Architecture Changes: 

  • Incremental evolution of existing tiered storage – preserves leader-follower model 
  • Creates three-tier storage: Local disk → Fast object store (e.g., S3 Express One Zone) → Traditional object storage (e.g., S3) 
  • Leader writes to both local disk AND fast object storage simultaneously (see the sketch after this list) 
  • Followers fetch active segments from fast object storage (same AZ) instead of cross-AZ replication from leader 
  • Background tasks handle uploads via RLMWalCombinerTask 
  • Reuses existing page cache for performance 
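
A minimal sketch of the dual-write and follower-fetch paths described above. FastObjectStore, ActiveSegmentLeader, and ActiveSegmentFollower are hypothetical names used for illustration; the KIP's actual interfaces, and its background upload flow via RLMWalCombinerTask, differ in detail.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical interface for a fast object store (e.g., S3 Express One Zone).
interface FastObjectStore {
    void upload(String key, ByteBuffer data) throws IOException;
    ByteBuffer fetch(String key) throws IOException;
}

class ActiveSegmentLeader {
    private final FileChannel localSegment;   // the existing local-disk active segment
    private final FastObjectStore fastStore;  // additional copy of the active segment
    private final String keyPrefix;
    private long nextChunk = 0;

    ActiveSegmentLeader(Path segmentFile, FastObjectStore fastStore, String keyPrefix)
            throws IOException {
        this.localSegment = FileChannel.open(segmentFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        this.fastStore = fastStore;
        this.keyPrefix = keyPrefix;
    }

    // The leader appends to its local active segment AND uploads the same bytes to
    // fast object storage. (In the KIP the uploads are handled by a background task;
    // they are inlined here for brevity.)
    void append(ByteBuffer records) throws IOException {
        localSegment.write(records.duplicate());
        fastStore.upload(keyPrefix + "/chunk-" + nextChunk++, records.duplicate());
    }
}

class ActiveSegmentFollower {
    private final FastObjectStore fastStore;

    ActiveSegmentFollower(FastObjectStore fastStore) {
        this.fastStore = fastStore;
    }

    // Instead of a cross-AZ fetch from the leader, the follower reads the
    // active-segment data from fast object storage reachable within its own AZ.
    ByteBuffer replicateChunk(String chunkKey) throws IOException {
        return fastStore.fetch(chunkKey);
    }
}
```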

Implementation: Closed source (Slack/Salesforce) 

  

KIP-1183: Unified Shared Storage 

Main Idea: Abstract the storage layer to support both traditional local disk and shared storage simultaneously. 

Architecture Changes: 

  • Two-step approach: 1) abstract the log layer with AbstractLog and AbstractLogSegment classes, 2) define a pluggable Stream API (see the sketch after this list) 
  • Enables Kafka to transform from shared-nothing to shared storage architecture 
  • Supports both architectures simultaneously for gradual migration 
  • Maintains leader-based architecture (unlike KIP-1150) 
  • Flexible deployment on various storage backends (e.g., S3, HDFS, NFS, Ceph, MinIO, CubeFS) 
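
To illustrate the shape of such an abstraction, here is a hypothetical sketch. AbstractLog is a name taken from the KIP, but the method signatures and the Stream interface below are invented for illustration; the KIP's actual Stream design is still under discussion.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Hypothetical pluggable Stream API: an append-only log abstraction that a shared-storage
// backend (S3, HDFS, NFS, Ceph, ...) or a local-disk backend could both implement.
interface Stream {
    CompletableFuture<Long> append(ByteBuffer records);          // returns the batch's base offset
    CompletableFuture<ByteBuffer> read(long offset, int maxBytes);
    long nextOffset();                                           // log end offset
    CompletableFuture<Void> trim(long beforeOffset);             // retention / deletion
}

// The broker-facing log keeps Kafka's existing semantics (offsets, fetch, retention)
// while delegating physical I/O to whichever Stream implementation is plugged in.
abstract class AbstractLog {
    protected final Stream stream;

    protected AbstractLog(Stream stream) {
        this.stream = stream;
    }

    public CompletableFuture<Long> append(ByteBuffer records) {
        return stream.append(records);
    }

    public CompletableFuture<ByteBuffer> read(long offset, int maxBytes) {
        return stream.read(offset, maxBytes);
    }

    public long logEndOffset() {
        return stream.nextOffset();
    }
}

// With a shared-storage Stream, durability is delegated to the backend, so the topic can
// run with RF=1; a local-disk Stream would keep today's replicated, shared-nothing path.
class SharedStorageLog extends AbstractLog {
    SharedStorageLog(Stream sharedStorageStream) {
        super(sharedStorageStream);
    }
}
```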

Implementation: Primary implementation at https://github.com/AutoMQ/automq (AutoMQ)

 

The table below summarizes the KIPs along several factors important to our customers, such as cost savings and design pros and cons, based on the community summary page, the KIPs themselves, and their discussion threads. Where possible, each factor is rated High/Medium/Low based on the available data.

| KIPs | Cross-AZ Traffic Reduction | Storage Requirement | Cost Savings | Performance Impact | Scalability | Availability | Implementation Effort | Status (15 Dec 2025) |
|---|---|---|---|---|---|---|---|---|
| KIP-1150 | High – eliminates all cross-AZ replication (leaderless design) | Low – object storage only; minimal, optional local disk for metadata and caching | High – no cross-AZ costs for replication | High – P50 ~500 ms, P99 ~1–2 s (vs. single-digit ms traditionally) | High – stateless brokers, true cloud-native elasticity, data/metadata separation | High – leverages cloud storage's durability and built-in cross-AZ replication | High – revolutionary change, multiple sub-KIPs, but clean design | Voting |
| KIP-1176 | Medium – only the follower replication path (producer→leader and consumer traffic unchanged) | Medium – three-tier: local disk + fast object storage + traditional object storage | Medium – 43% overall cost reduction documented | Low – in some durability/storage settings; maintains single-digit ms for acks=1, near-traditional for acks=-1 with fast storage | Medium – still broker-centric, no cloud-native elasticity benefits | Low – AZ failure creates recovery challenges, no hot standby for active segments | Medium – incremental changes to existing code, but complexity may grow | Under discussion, availability concerns raised |
| KIP-1183 | Medium – eliminates cross-AZ replication with shared storage (RF=1) but still incurs producer- and consumer-side traffic | Medium – flexible, pluggable support for multiple standard or custom shared storage backends | Medium – similar to KIP-1150 in zero broker-to-broker replication but less optimized on client-side traffic | Unknown – depends on Stream implementation quality | High – shared-storage-based brokers enable scaling, but can be limited by slow leader failover | Low – failover latency with RF=1 (1–2 s documented), no hot standby | High – large plugin development burden, unclear Stream interface design | Under discussion, Stream interface design unclear |

KIP Change tracker – last reviewed 15 December 2025: 

Note – Kafka dev mail discussions on the KIPs are excluded from the tracking below. 

  • KIP-1150: Diskless Topics 
    • 16 Apr 2025: Initial page published. 
    • 23 Apr 2025: Added a new subheading and content for “Disk usage and lack thereof”, explaining that diskless topics will still require some disk. 
    • 9 May 2025: Replaced sub-KIP “E” with KIP-1181, and added content for it in https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=357173709, introducing diskless topics and how clients and brokers can use rack-awareness to minimize cross-AZ costs. 
    • 28 May 2025: Added cost information on cross-AZ data transfer charges for AWS, GCP, and Azure. 
    • 3 Sep 2025: Clarified that “diskless” doesn’t mean brokers do not need disks at all, but that broker disks are not used “as the primary durable storage of user data” and that “…the attached disks become a less important abstraction for operators but are still functionally present.” 
    • 12 Dec 2025: After being on pause since May 2025, voting on the KIP resumed. 
  • KIP-1176: Tiered Storage for Active Log Segment 
    • 1 May 2025: Initial page published. 
    • 7 May 2025: Added future enhancements for “Cloud native elasticity”, suggesting that consumer reads could pull data directly from cloud storage instead of from brokers, meaning brokers would need less memory and network throughput, leading to lighter clusters. Also suggested another future enhancement: removing the hot standby follower to further reduce resource requirements and latency. 
    • 10 May 2025: Added explanations for AZ availability and data durability (stating this would be the same as existing Kafka design). Cleaned up and added explanations and changes to the public interfaces section. Finally, added content in the rejected alternatives comparing to KIP-1150, as well as KIP-405 (on tiered storage). 
    • 31 May 2025: Carried out some minor cleanup and introduced the metadata topic __remote_wal_log_metadata as an internal topic to store metadata about WAL (Write-Ahead Log) segments that are offloaded to remote storage. 
    • 4 Jun 2025: Clarified that the proposal would work just as well with traditional cloud storage such as S3 and not just S3 Express One Zone. Added two new public interfaces. 
    • 29 Aug 2025: Added explanations of the costs incurred and saved with the proposal. Added performance test setup and results in the appendix, along with an analysis comparing them with KIP-1150’s performance results. 
    • 10 Oct 2025: Laid out the leader failover strategy, explaining how recovery would be possible and that the result would be equivalent to “standard” Kafka.