Canberra Innovation Network 1 Moore Street #5 Canberra, ACT 2601 Wednesday 15th May 2019

Canberra Big Data Meetup – High throughput ingestion of time series data with OpenTSDB and HBase


During this Meetup, we will have two talks.

First Talk: Ingesting 12 Million Data Points per Hour with OpenTSDB and HBase

Abstract: At Reposit Power we ingest large amounts of time-series energy monitoring data every hour. In this talk we will learn what time-series data is and which properties distinguish it from the relational data people are often more familiar with. We will look at how this data can be stored using HBase's wide-column model, and why that model is well suited to time-series workloads. Finally, we will discuss techniques for querying such a large volume of data.
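To make the wide-column idea concrete, here is a minimal Python sketch of an OpenTSDB-style row layout: the row key combines a metric name, an hour-aligned base timestamp, and the series tags, while each data point within that hour is stored under a small seconds-offset qualifier. The string encoding and helper names here are illustrative assumptions; OpenTSDB itself encodes these components as fixed-width binary UIDs.

```python
def hourly_base(ts):
    """Round an epoch timestamp (seconds) down to the start of its hour."""
    return ts - (ts % 3600)

def row_key(metric, ts, tags):
    """Build an OpenTSDB-style row key: metric, hourly base time, then
    sorted tag key/value pairs. Hypothetical string encoding -- the real
    system uses compact binary UIDs for each component."""
    parts = [metric, str(hourly_base(ts))]
    for k in sorted(tags):
        parts.append(f"{k}={tags[k]}")
    return "|".join(parts)

def column_qualifier(ts):
    """Seconds offset within the hour -- so one wide row holds up to an
    hour of points for a series, keeping rows short and scans sequential."""
    return ts - hourly_base(ts)

key = row_key("site.power.kw", 1557900123, {"host": "inverter-7"})
offset = column_qualifier(1557900123)
```

Because every point from the same series and hour shares one row key, a time-range query becomes a contiguous HBase range scan over consecutive hourly rows, which is what makes this layout attractive for time series.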

Bio: Mike is a full stack developer with over 10 years of experience. He has spent most of his career working in the public sector in the UK and Australia and is now the Head of Software Engineering at Reposit Power, using Python for just about everything. He is passionate about experimenting with and learning new technologies as they evolve, and about dealing with Big Data.

Second Talk: Storing and Using Metrics for 3000 nodes – How Instaclustr use a Time Series Cassandra Data Model to store 1 million metrics a minute

Abstract: At Instaclustr, we use one of our own Cassandra clusters and a custom monitoring application to store, process, and retrieve metrics for every single node in our fleet. The talk will introduce how we collect, process, store, and roll up all the metrics that pass through our monitoring system every second. We will discuss why the combination of Cassandra, time buckets, and Spark is well suited to storing and querying all that monitoring data quickly and efficiently.
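The core of the time-bucket approach is to fold the timestamp into the partition key so no partition grows without bound. A minimal Python sketch of the idea follows; the bucket width, key layout, and names are assumptions for illustration, not Instaclustr's actual schema.

```python
BUCKET_SECONDS = 5 * 60  # hypothetical 5-minute bucket width

def bucket_key(node_id, metric, ts):
    """Partition key: node, metric, and the start of the time bucket the
    timestamp falls into. Bounding each bucket keeps every Cassandra
    partition small enough to read with a single query, while samples
    inside a bucket stay clustered in timestamp order."""
    bucket_start = (int(ts) // BUCKET_SECONDS) * BUCKET_SECONDS
    return (node_id, metric, bucket_start)

# Two samples 30 seconds apart land in the same partition...
k1 = bucket_key("node-17", "cpu.load", 1557900000)
k2 = bucket_key("node-17", "cpu.load", 1557900030)
# ...while a sample in the next 5-minute window starts a new one.
k3 = bucket_key("node-17", "cpu.load", 1557900400)
```

A range query for a metric then expands into one read per bucket in the requested window, and coarser rollups (e.g. via Spark jobs) can be written to their own, wider-bucketed tables.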

Bio: Jordan is a Senior Software Engineer who has been with Instaclustr for over 2 years. He is experienced with our internal metrics and monitoring system, and has benchmarked multiple open source technologies internally. Focused on outcomes, he is excited to find solutions to real-world big data problems.