Paul Brebner

Technology Evangelist

Since learning to program on a VAX 11/780, Paul has gained extensive R&D and consulting experience in distributed systems, technology innovation, software architecture and engineering, software performance and scalability, grid and cloud computing, and data analytics and machine learning.

Paul is the Technology Evangelist at Instaclustr. He has been learning new scalable technologies, solving realistic problems, building applications, and blogging about Apache Cassandra, Spark, Zeppelin, and Kafka.

Paul has worked at UNSW, several tech start-ups, CSIRO, UCL (UK), and NICTA. Paul has helped pre-empt and solve significant software architecture and performance problems for clients including Defence and NBN Co. Paul has an MSc in Machine Learning and a BSc (Computer Science and Philosophy).

Paul's Articles

It’s an In-Memory Key-Value Store! It’s a Database! It’s Redis!

Wednesday 9th September 2020

Look! Up in the sky! It’s an in-memory key-value store! It’s a database! It’s Redis! Faster than a speeding database! More powerful than an in-memory key-value store!  Able to leap tall performance barriers at a single bound! “Yes, it’s Redis—strange visitor from another planet who came to Earth with powers and abilities far beyond those […]

Taking Elasticsearch for a Spin around the Race Track (Q&A): Part 3

Tuesday 14th July 2020

Then may I set the world on wheels, when she can spin for her living. (Two Gentlemen of Verona, III, 1) The weary sun hath made a golden set, And by the bright track of his fiery car, Gives signal, of a goodly day to-morrow.  (Richard III, V, 3) Thy burning car never had scorch’d […]

Taking Elasticsearch to the Mechanics: Under the Hood Q&A (Part 2)

Wednesday 8th July 2020

Marry, that’s a bountiful answer that fits all questions (All’s Well That Ends Well, II, 2) In Part 1 of this multi-part Elasticsearch Blog I revealed the most interesting things I learnt after taking Elasticsearch for my first “Test Drive”, including that Elasticsearch comes well equipped with some clever-sounding computational linguistics analysis tricks including Stemming, […]

Building a Low-Latency Distributed Stock Broker Application: Part 3

Friday 24th April 2020

In the third blog of the “Around the World” series focusing on globally distributed storage, streaming, and search, we build a Stock Broker Application. 1. Place Your Bets! The London Stock Exchange. How did Phileas Fogg make his fortune? Around the World in Eighty Days describes Phileas Fogg in this way: “Was Phileas Fogg […]

An Introduction to Cassandra Multi-Data Centers: Part 2

Friday 3rd April 2020

In this second blog of the “Around the World in (Approximately) 8 Data Centers” series we catch our first mode of transportation (Cassandra) and explore how it works to get us started on our journey to multiple destinations (Data Centers). 1. What Is a (Cassandra) Data Center? What does a Data Center (DC) look like? Here […]

An Introduction to Cassandra Multi-Data Centers: Part 1

Monday 9th March 2020

Quick! Grab your top hat, passport, carpetbag stuffed with (mainly) cash, and your valet (if you have one), and join with me on a wild journey around the world in approximately 8 data centers—a new blog series to explore the world of globally distributed storage, streaming, and search with Instaclustr Managed Open Source Technologies such […]

The Power of Kafka Partitions: How to Get the Most out of Your Kafka Cluster

Monday 6th January 2020

This blog provides an overview of two fundamental concepts in Apache Kafka: Topics and Partitions. While developing and scaling our Anomalia Machina application we have discovered that distributed applications using Kafka and Cassandra clusters require careful tuning to achieve close to linear scalability, and critical variables included the number of Kafka topics and […]

Cassandra Elastic Auto-Scaling Using Instaclustr’s Dynamic Cluster Resizing

Tuesday 3rd December 2019

This is the third and final part of a mini-series looking at the Instaclustr Provisioning API, including the new Open Service Broker.  In the last blog we demonstrated a complete end to end example using the Instaclustr Provisioning API, which included dynamic Cassandra cluster resizing.  This blog picks up where we left off and explores […]

ApacheCon Berlin, 22-24 October 2019

Monday 2nd December 2019

ApacheCon Europe, October 22-24, 2019, Kulturbrauerei Berlin #ACEU19 https://aceu19.apachecon.com/ What’s better than one ApacheCon? Another ApacheCon! This year there were two Apache Conferences, one in Las Vegas and then again in Berlin. They were similar but different. What were some differences between ApacheCon Berlin and Las Vegas? The location. In contrast to the hyper-real gambling […]

Instaclustr Provisioning API Demonstration: A Complete End-to-End Example

Monday 23rd September 2019

Overview An end-to-end demonstration of Instaclustr’s Provisioning API for any use case involving automated programmatic cluster provisioning, configuration, discovery, and de-provisioning (or a subset of these operations). 1. Provisioning Provisioning: Supply with food, drink, or equipment, especially for a journey. Provisioning is all about ensuring you have sufficient quantity of provisions (food, drink, etc.) sufficiently […]

Instaclustr Open Service Broker – A Complete End-to-End Example

Wednesday 11th September 2019

Introduction Instaclustr has recently launched the Instaclustr Service Broker, an implementation of the Open Service Broker (OSB) API for Instaclustr managed services (Apache Cassandra, Spark, Zeppelin, and Kafka).   Over a series of blogs I plan to try it out using the following “bottom-up” approach:  get a complete end-to-end Kubernetes workflow working to test and demonstrate […]

Geospatial Anomaly Detection (Terra-Locus Anomalia Machina) Part 2: Geohashes (2D)

Tuesday 18th June 2019

Massively Scalable Geospatial Anomaly Detection with Apache Kafka and Cassandra In this blog, we continue exploring how to build a scalable Geospatial Anomaly Detector. In the previous blog, we introduced the problem and tried an initial Cassandra data model with locations based on latitude and longitude. We now try another approach, Geohashes, to start with, […]

Anomalia Machina 8 – Production Application Deployment with Kubernetes

Tuesday 5th March 2019

In the previous blog we explored deploying the Anomalia Machina application on Kubernetes with the help of AWS EKS. In the recent blogs (Anomalia Machina 5 and Anomalia Machina 6), we enhanced the observability of the Anomalia Machina Application using two Open Source technologies: Prometheus for distributed monitoring of metrics such as throughput and latency; […]

Anomalia Machina 7 – Kubernetes Cluster Creation and Application Deployment

Monday 11th February 2019

Kubernetes – Greek: κυβερνήτης = Helmsman If you are a Greek hero about to embark on an epic aquatic quest (encountering one-eyed rock-throwing monsters, unpleasant weather, a detour to the underworld, tempting sirens, angry gods, etc.) then having a trusty helmsman is mandatory (Even though the helmsman survived the Cyclops, like all of Odysseus’s […]

Anomalia Machina 6 – Application Tracing with OpenTracing: Massively Scalable Anomaly Detection with Apache Kafka and Cassandra

Tuesday 15th January 2019

In the previous blog (Anomalia Machina 5 – Application Monitoring with Prometheus) we explored how to better understand an Open Source system using Prometheus for distributed metrics monitoring. In this blog we have a look at another way of increasing visibility into a system using OpenTracing for distributed tracing. 1 A history of Tracing Over […]

Anomalia Machina 5 – Application Monitoring with Prometheus: Massively Scalable Anomaly Detection with Apache Kafka and Cassandra

Wednesday 19th December 2018

1 Introduction In order to scale Anomalia Machina we plan to run the application (load generator and detector pipeline) on multiple EC2 instances. We are working on using Kubernetes (AWS EKS) to automate this, and progress so far is described in this webinar. However, before we can easily run a Kubernetes deployed application at scale […]

Anomalia Machina 1 – Massively Scalable Anomaly Detection with Apache Kafka and Cassandra

Friday 28th September 2018

anomalia – Latin (1) irregularity, anomaly machina – Latin (1) machine, tool, (2) scheme, plan, machination What do you get if you combine Anomalia and Machina? Machine Anomaly – A broken machine (Machina Anomalia) Irregular Machinations – Too political (Anomalia Machina, 2nd definition) Anomaly Machine! (Anomalia Machina, 1st definition) Let’s build the Anomalia Machina! A […]

Apache Kafka “Kongo” 6.2 – Production Kongo on Instaclustr

Friday 29th June 2018

In this blog (parts 6.1 and 6.2) we deploy the Kongo IoT application to a production Kafka cluster, using Instaclustr’s Managed Apache Kafka service on AWS. In part 6.1 we explored Kafka cluster creation and how to deploy the Kongo code. Then we revisited the design choices made previously regarding how to best handle the […]

Apache Kafka “Kongo” 6.1 – Production Kongo on Instaclustr

Friday 29th June 2018

In this blog we deploy the Kongo IoT application to a production Kafka cluster, using Instaclustr’s Managed Apache Kafka service on AWS. We explore Kafka cluster creation and how to deploy the Kongo code. Then we revisit the design choices made previously regarding how to best handle the high consumer fan out of the Kongo […]

Apache Kafka “Kongo” 5.3: Kongo Streams Example

Wednesday 20th June 2018

Introduction In the previous blog we tried a simple Kafka Streams application for Cluedo. It relied on a KTable to count the number of people in each room. In this blog, we’ll extend this idea and develop a more complex streams application to keep track of the weight of goods in trucks for our Kongo […]

Kongo 5.2: Apache Kafka Streams Examples

Tuesday 29th May 2018

Dr Black has been murdered in the Billiard Room with a Candlestick! Whodunnit?! In this blog, we’ll have a look at some simple Kafka Streams examples using the murder mystery game Cluedo (Clue in the US) as a simple problem domain.  There are six suspects and a mansion with multiple rooms. The suspects are: Miss […]

Kongo 5.1: Apache Kafka Streams Introduction

Tuesday 29th May 2018

Abstract Apache Kafka Streams is a framework for stream data processing. In this blog, we’ll introduce Kafka Streams concepts and take a look at one of the DSL operations, Joins, in more detail. In the next blog, we’ll have a look at some more complete Kafka Streams examples based on the murder mystery game Cluedo. […]

Apache Kafka Connect Architecture Overview

Wednesday 9th May 2018

Kafka Connect is an API and ecosystem of 3rd party connectors that enables Apache Kafka to be scalable, reliable, and easily integrated with other heterogeneous systems (such as Cassandra, Spark, and Elassandra) without having to write any extra code. This blog is an overview of Kafka Connect Architecture with a focus on the main Kafka […]

“Kongo” Part 3 – Apache Kafka: Kafkafying Kongo – Serialization, One or Many topics, Event Order Matters

Thursday 26th April 2018

Kafkafying – the transformation of a primitive monolithic program into a sophisticated scalable low-latency distributed streaming application (c.f. “An epidemic of a zombifying virus ravaged the country”) Steps for Kafkafying Kongo In the previous blog (“Kongo” Part 2: Exploring Apache Kafka application architecture: Event Types and Loose Coupling)  we made a few changes to the […]

“Kongo” Part 2: Exploring Apache Kafka application architecture: Event Types and Loose Coupling

Thursday 5th April 2018

This is the second post in our series exploring designing and developing an example IoT application with Apache Kafka to illustrate typical design and implementation considerations and patterns. In the previous blog, we introduced our Instaclustr “Kongo” IoT Logistics Streaming Demo Application. The code for Version 1 of the Kongo application was designed as an initial […]

“Kongo” Part 1 – Apache Kafka: IoT Logistics Streaming Demo Application

Thursday 15th March 2018

What’s a good name to give a demo IoT streaming application dealing with large scale logistics? How about a river… Maybe The “Amazon” application? That’s sort of taken. The Amazon is the longest river and has the most water flow, but what’s the 2nd ranking river? The Congo! The Congo is the 2nd biggest river […]

Exploring the Apache Kafka “Castle” Part B: Event Reprocessing

Thursday 18th January 2018

In this second part of the Apache Kafka Castle blog we contemplate the being or not being of Kafka Event Reprocessing, and speeding up time! Reprocessing Use Cases Reprocess: /riːˈprəʊsɛs/ verb Process (something, especially spent nuclear fuel) again or differently. Repeat event processing is called reprocessing (or sometimes replaying or rewinding), and some reprocessing use […]

Exploring the Apache Kafka “Castle” Part A: Architecture and Semantics

Friday 12th January 2018

NEWS FLASH Apache Kafka coming soon to Instaclustr’s service offering! If you haven’t read Kafka’s “The Castle” (I haven’t) a few online observations are sufficient for a concise summary (and will save you the trouble of reading it): Time seems to have stopped in the village The story has no ending (Kafka died before completing […]

Pick‘n’Mix: Cassandra, Spark, Zeppelin, Elassandra, Kibana, & Kafka

Tuesday 5th December 2017

Kafkaesque: \ käf-kə-ˈesk \ Marked by a senseless, disorienting, menacing, nightmarish complexity. One morning when I woke from troubled dreams, I decided to blog about something potentially Kafkaesque: Which Instaclustr managed open-source-as-a-service(s) can be used together (current and future)? Which combinations are actually possible? Which ones are realistically sensible? And which are nightmarishly Kafkaesque!? In previous blogs, […]

Spark Structured Streaming with DataFrames

Tuesday 28th November 2017

This blog provides an exploration of Spark Structured Streaming with DataFrames. The blog extends the previous Spark MLLib Instametrics data prediction blog example to make predictions from streaming data. We demonstrate a two-phase approach to debugging, starting with static DataFrames first, and then turning on streaming. Finally we explain Spark structured streaming in more detail […]

A Luxury Voyage of (Data) Exploration by Apache Zeppelin

Thursday 9th November 2017

Data Exploration into the cutting-edge technology of Apache Zeppelin The catastrophic crash of the Hindenburg in 1937 ended the era of luxury travel in the colossal fast ships of the air that were pushing the boundaries of air travel technology.  Zeppelins had many experimental innovations like an auto-pilot, were made from Duralumin girders (a new […]

Behind the Scenes

Wednesday 25th October 2017

Spoiler alert! Kubrick’s scientific consultant Frederick Ordway once revealed that Kubrick had the props for the film destroyed because he didn’t want to ruin the illusion of 2001 for people.  If you prefer to believe that 2001 was real, stop reading now, as behind-the-scenes photos did survive. 2001 pioneered lots of special effects! It was […]

Fourth Contact with a Monolith

Friday 20th October 2017

“The thing’s hollow — it goes on forever — and — oh my God! — it’s full of stars!” It’s full of Spreadsheets! (DataFrames) Given that a dog, Laika, was the 1st astronaut to orbit the earth, it’s appropriate for a dog to travel through the wormhole. After travelling through the wormhole, the 2001 story […]

Third Contact with a Monolith: Part C – In the Pod

Friday 29th September 2017

A simple classification problem: Will the Monolith react? Is it safe?! Maybe a cautious approach to a bigger version of the Monolith (2km long) in a POD that is only 2m in diameter is advisable.   What do we know about how Monoliths react to stimuli? A simple classification problem consists of the category (label) “no […]

Third Contact With a Monolith – Beam Me Down Scotty

Wednesday 20th September 2017

Regression Analysis is (relatively) easy Hypothesis: We can compute linear regression functions from heap space used alone to predict when the next GC occurs. To do this we don’t need access to all the metrics per host, just a subset of the GC metrics. And we can extend it in the future to […]

Third contact with a Monolith – Long Range Sensor Scan

Thursday 14th September 2017

The Odyssey Continues – A Long Trip to Jupiter Earth to Mars distance = 0.52 AU (1.52-1AU, 78M km) Earth to Jupiter distance = 4.2 AU (5.2-1AU, 628M km) It’s a long way to Jupiter, would you like to: (a) sleep the whole way in suspended animation?  (bad choice, you don’t wake up) (b) be embodied […]

Hello Cassandra! A Java Client Example

Thursday 7th September 2017

This is the third (and final) part of my blog series on creating a demonstration Cassandra cluster, connecting, and communicating. We landed on the moon and made Second Contact with the Monolith (CQL shell) in our last blog, but what can we do to understand the Monolith better? Let’s explore a Cassandra Java client program. Java Client […]

Consulting Cassandra: Second Contact with the Monolith

Wednesday 6th September 2017

In the first part of this blog (Cluster Creation in Under Ten Minutes), I created a Cassandra cluster. In this part, we blast off to the Moon for 2nd contact. Consulting the Oracles Croesus: Hi Oracle.  How will my war with Cyrus the Persian go? Oracle: If you proceed, a great empire will be destroyed. Croesus: […]

Cassandra Cluster Creation in Under 10 Minutes

Tuesday 29th August 2017

Enough Information I watched the classic movie “2001: a Space Odyssey” for the nth time on the weekend.  My previous favourite quote from HAL (the eventually paranoid and murderous ship AI) was:         Dave: Open the pod bay doors, HAL.         HAL:  I’m sorry, Dave. I’m afraid I can’t […]

Paul Brebner (the Petabyte Person) joins Instaclustr (the Petabyte Company)

Thursday 17th August 2017

1. Hello, World! Hi, I’m Paul Brebner and this is a “Hello, World!” blog to introduce myself and test everything out. I’m very excited to have started at Instaclustr last week as a Technology Evangelist.    One of the cool things to happen in my first week was that Instaclustr celebrated a significant milestone when […]
