What is Apache Kafka®?
Apache Kafka® is the leading distributed streaming and queuing technology for large-scale, always-on applications. Kafka has built-in features of horizontal scalability, high-throughput and low-latency. It is highly reliable, has high-availability, and allows geographically distributed data streams and stream processing applications.
Donated to the Apache Foundation by LinkedIn in 2011, Kafka has garnered lot of interest and is now being broadly used by many organisations across the globe including Netflix, Twitter, Spotify and Uber. Kafka has grown to a strong and vibrant open community and it is compatible with a wide range of complementary technology.
Kafka has a similar shared-nothing, replicated architecture to Cassandra that allows it to operate with similar extreme levels of scalability, reliability and availability. In any big- data application, Kafka really has three core functions:
Message Transport | Message Aggregation | Message Store |
---|---|---|
Enabling transportation of data between various publisher and subscriber endpoints. | Aggregating a number of various data streams for use by distributed processing applications. | Storing data streams as a cache in a replicated, fault-tolerant storage environment. |