Real-Time Anomaly Detection on 19 Billion Events a Day


This presentation was delivered at various meetup groups by Paul Brebner, Technology Evangelist at Instaclustr.

Apache Kafka, Apache Cassandra, and Kubernetes are open source big data technologies that enable applications and business operations to scale massively and rapidly. Kafka and Cassandra underpin the data layer of the stack, providing the capability to stream, disseminate, store, and retrieve data at very low latency. Kubernetes is a container orchestration technology that automates application deployment and the scaling of application clusters.

In this presentation, Paul reveals how he architected a massive-scale deployment of a streaming data pipeline with Kafka and Cassandra to serve an example anomaly detection application that runs on a Kubernetes cluster and generates and processes a massive number of events.

Anomaly detection is a method used to detect unusual events in an event stream. It is widely used in applications such as financial fraud detection, security, threat detection, website user analytics, sensors, IoT, and system health monitoring. When such applications operate at a massive scale, generating millions or billions of events, they pose significant computational, performance, and scalability challenges for anomaly detection algorithms and data layer technologies.
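To make the idea of "unusual events in an event stream" concrete, here is a minimal sketch in Python. It is not the algorithm used in the presentation, which this summary does not specify. It assumes a simple sliding-window z-score rule, and the function name, window size, threshold, and sample stream are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(events, window=5, threshold=3.0):
    """Yield (index, value) for values far from the mean of the preceding window.

    Assumption for illustration only: an event is "anomalous" when it lies more
    than `threshold` standard deviations from the mean of the previous `window`
    values. Real detectors may use very different rules.
    """
    recent = deque(maxlen=window)  # holds only the most recent `window` values
    for i, value in enumerate(events):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip the check when the window is constant (sigma == 0).
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


# Hypothetical event stream with one obvious spike at index 6.
stream = [10, 11, 9, 10, 11, 10, 50, 10, 9, 11]
anomalies = list(detect_anomalies(stream))
print(anomalies)
```

Each check needs only the recent history for one key. At massive scale, that lookup is the kind of work that puts pressure on the data layer.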

Paul demonstrates the scalability, performance, and cost-effectiveness of Apache Kafka, Cassandra, and Kubernetes, with results from experiments in which the anomaly detection application scaled to 19 billion anomaly checks per day.
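For a sense of scale, the daily figure can be converted to a sustained per-second rate. The short calculation below assumes the load is spread evenly over 24 hours, which the summary does not state, so treat the result as an average rather than a measured peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

checks_per_day = 19_000_000_000  # figure quoted in the summary
# Assumption: checks are evenly distributed across the day.
checks_per_second = checks_per_day / SECONDS_PER_DAY

print(f"{checks_per_second:,.0f} checks per second")
```

Under that assumption, 19 billion checks per day works out to roughly 220,000 checks per second.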