As part of Melbourne Distributed Meetup Group, we are hosting this presentation will explore how we added location data to a scalable real-time anomaly detection application, built around Apache Kafka, and Cassandra.
Kafka and Cassandra are designed for time-series data, however, it’s not so obvious how they can process geospatial data. In order to find location-specific anomalies, we need a way to represent locations, index locations, and query locations.
We explore alternative geospatial representations including: Latitude/Longitude points, Bounding Boxes, Geohashes, and go vertical with 3D representations, including 3D Geohashes.
For each representation we also explore possible Cassandra implementations including: Clustering columns, Secondary indexes, Denormalized tables, and the Cassandra Lucene Index Plugin.
To conclude we measure and compare the query throughput of some of the solutions, and summarise the results in terms of accuracy vs. performance to answer the question “Which geospatial data representation and Cassandra implementation is best?
Presenter – Paul Brebner, Technology Evangelist at Instaclustr. Paul has been learning new scalable technologies, solving realistic problems and building applications, and blogging about Apache Cassandra, Spark, and Kafka. Paul has extensive R&D and consulting experience in distributed systems, technology innovation, software architecture and engineering, software performance and scalability, grid and cloud computing, and data analytics and machine learning. Paul has an MSc in Machine Learning and a BSc (Computer Science and Philosophy).