By Paul Brebner Tuesday 19th December 2017

Apache Kafka Christmas Tree Light Simulation – Seasons Greetings From Instaclustr

Technical

Apache Kafka Christmas tree light simulation

What would you do if you received a request like this?

Apache Kafka Christmas Tree

Looks like poor old Santa confused us with a company that produces Instant Forests.  But maybe Santa knew I’d been trying out Decision Tree Machine Learning (including Random Forests) on our performance data (see previous blogs on Machine Learning with Spark)?

But then I started thinking. Obviously, Santa is “magic” (what online shop can deliver presents to millions of people on Christmas Eve!), so maybe all I had to do was come up with was a simulated virtual tree and Santa could handle the rest?

Where to start? What (in computer science or mathematics) is tree-shaped?

On the first day of Christmas
my true love sent to me:
A Quincunx machine in a Pear Tree

On the second day of Christmas
my true love sent to me:
2 Bean Machines
And a Quincunx machine in a Pear Tree

On the third day of Christmas
my true love sent to me:
3 Galton Boards
2 Bean Machines
and a Quincunx machine in a Pear Tree

On the fourth day of Christmas
my true love sent to me:
4 Pascal’s Triangles
3 Galton Boards
2 Bean Machines
and a Quincunx machine in a Pear Tree

Ok, that’s enough! Ten identical presents are rather boring.  A Quincunx machine (also called Bean machine or Galton Board) is a vertical board with nails banged into it in the shape of a Christmas tree.

Apache Kafka Galton Board

Here is a clip of one in action.

Balls are dropped from the top (where the “Star” would be on a tree), and bounce either left or right as they hit the nails and drop down to the next level, and so on, until they drop out the bottom. They are collected into bins at the bottom and the height of the bins will eventually approximate a bell curve. Overlaying Pascal’s triangle onto the nails shows the number of different paths that can be taken to get to each bin (as each value is the sum of the two values above it to the left and right, starting with a 1 at the top Star position). The bins towards the middle are reached by more paths and will collect more balls:

 Apache Kafka Pascal triangle

Obviously, I’m not the first person to connect these ideas.  Here’s a Pascal’s Christmas Tree simulator, a Quincunx simulation, and this blog does a full The 12 days of Pascal’s triangular Christmas!

Pascal Christmas Tree simulator

The Pascal’s Christmas tree simulator (image above) inspired me to use a similar approach for a Christmas tree lights simulator. The tree will be a Galton board, with lights at each nail position (baubles in the above picture). The positions are traditionally rows and columns starting from row 0 at the top, and columns starting from 0 on the left. The star is therefore (0,0).  A light will be ON if a ball hits the nail, and will stay on for a second, and then turn OFF.  Lights will turn on/off (twinkle!) as balls enter from the top, and drop down to the bottom. If balls drop in fast enough then multiple lights will be on at once.

Time to open up an early Christmas present (Kafka, which is on the Instaclustr roadmap for 2018) and use it to write a scalable Christmas tree lights simulation. Some imagination may be required (perhaps enhanced with a few glasses of Christmas port).

The design choices I had to make were around the number of producers, consumers and topics, what roles they would have, what the message structure would be, and how to handle time.

We’ll start with a single Kafka Producer to drop balls onto the top of the tree (at the Star), with a new ball arriving every second. The “star” is just a simple java producer running in a thread which sends a (0,0) message to the “tree” topic, and sleeps for a second before repeating until the desired number of balls have been released.

Kafka Christmas Tree

Initially, I had thought of using multiple topics to capture the topology of the relationship between the lights, but his would have been too complicated for larger trees. A simpler solution using a single topic (called “tree”) is more obvious.  I also decided on a very simple message structure using the Kafka message key as the row value, and the message value as the column. Note that for multiple partition topics this may not be ideal as the key is used for distributing messages across partitions and the small number of row values may not result in a good hash function.

Currently the tree only has the top starlight illuminated all the time.  What’s missing? Gravity!  To simulate gravity we need a consumer/producer pair, I called this “twinkle”. To twinkle, we receive a message (ball location) from the tree topic, randomly decide if the ball will go left or right, and then publish a message consisting of the new location back to the “tree” topic.   Will this work? Not really. This is just an infinite loop, what we really need is a delay queue or explicit timestamp handling, so we can delay the processing of the new ball location until the 1 second ON time has elapsed. A hack for this is to sleep the Twinkle application consumer thread so it only checks for new messages periodically. This is what we have so far.

Apache Kafka Christmas Tree

What’s missing? What should we do with balls that reach the bottom of the tree (the last row)? We could just make them vanish (by not sending a message back to the tree topic). Or better (in the spirit of the Galton Board) let’s add a “count” topic and a Count Consumer to add up the number of balls that fall into each bin under the tree:

Apache Kafka Galton Board Christmas Tree

The final problem is how to display the tree and lights state? The simplest solution is to have another consumer which subscribes to the “tree” topic and changes the state of the correct lights for each new message in the topic (change of ball location). This is possible and was the approach I tried first. However, the Twinkle and Display consumers have to be in different consumer groups (because of the way the Kafka protocol works, to ensure that they both get every message published to the tree topic), and computing the state and handling timing was tricky:

Apache Kafka Christmas Board Diagram

An improved version computes the state change of the lights in the Tinkle application (step C), and sends a state change message (light location OFF or ON) to the corresponding dedicated topic (Light OFF, Light ON). Every second, the State Display consumer applies all the OFF messages first, and then the ON messages, and then prints the tree lights state (in ASCII). Each “*” is an ON light ON, each “.” is an OFF light.

Apache Kafka Christmas Tree Diagram

Here’s a sequence that could be printed for a simple tree with 3 rows, and a single ball dropped in at the top:

Time 1: star lights up (0,0)

*
..

Time 2:  light at (1,1) ON

.
.*

Time 3: light at (2,1) ON

.
..
.*.

Time 4: All lights OFF

.
..

Here’s an (edited) run with 10 rows and 100 balls:

Welcome to the Instaclustr XMAS Tree Simulator!!!!!

*
..

….
…..
……
…….
……..
………
……….

Etc (lots of balls)

*
**
**.
.**.
..**.
..***.
…**..
*..**…
*..*…..
..*.***…

Etc (no more balls arriving, eventually end up with all lights OFF)

.
..

….
..*..
..**..
..**…
..**….
…**….
…..**…

A Happy Normally Distributed Xmas! Counts:

col 0 = 1
col 1 = 2
col 2 = 7
col 3 = 19
col 4 = 28
col 5 = 25
col 6 = 14
col 7 = 4
col 8 = 0
col 9 = 0

Total events counted = 100

More interesting christmas lights with colour could have been simulated by using a richer message structure, e.g. a message number as the key, and a compound value type consisting of (row, col, colour).

Here’s a simplified topology diagram showing just the relationship between producers, topics and consumers.

Apache Kafka Simplified topology diagram

Is this the best/correct approach for:

  • Computing and keeping track of the state of the lights?
    • Probably not. Kafka streams and a KTable (which maintains state) may be better. Here’s a blog.
  • Handling time?
    • Probably not. As I mentioned, using a delay queue or explicit timestamp handling would be better. Essentially I’m using Kafka as a discrete event simulation (DES) which it isn’t really designed for, but in theory, it should work as all you need is events and timestamps. I (briefly) tried using the different timestamp extractors but had a few issues, I suspect that they are designed to work (best) with Kafka streams. So maybe Santa could get a few elves to add some code for this, perhaps using windows.

Will Santa be happy with this? He should be! Given that it’s written in Kafka it will scale massively. It should be easy to increase the speed of the simulation (i.e. run it flat out), increase the size of the tree, and even simulate a forest of trees, each with a slightly different algorithm and/or starting condition, and multiple different ways of displaying the tree/lights. There’s also reprocessing (that I mentioned in this blog), where Kafka persists all the messages, so consumers can choose which messages to process. Consumers could display any historic state of the tree lights.

Canberra was in the news a few years ago with a charity fundraising world record for the number of lights on an artificial Christmas tree.  What does ½ a million lights look like?

Christmas Tree Canberra

I just had to see if the Kafka Christmas lights simulator was up to the challenge. It was.

A simulated tree with 500,000 lights and 100,000 balls dropped in ran in 555s, processed over 400 million producer + consumer events, and ran 180 times faster than real-time (running flat out), achieving an average throughput of 43 million events per minute, not bad with everything (Kafka broker and java code) running on my laptop. Why stop there? 1 million and 2 million worked fine, can I claim the world record for the most lights on a simulated Christmas tree?

Here’s the code (Java). Note that I made a minor change to the twinkle application to correctly process the Star light.  The Star producer now sends a special message (-1, -1) to Twinkle which interprets this as a special case, i.e. an arriving ball with no location yet, and sends a message back to the tree topic with the star location (0,0) and a (0,0) message to the Light ON topic.

Please check out our Spark Streaming, Kafka and Cassandra Tutorial for details on installing and running Kafka and Zookeeper and visit our GitHub to access all Java Code.

Java Kafka Code

Site by Swell Design Group