“Kongo” Part 1: Apache Kafka®—IoT Logistics Streaming Demo Application

What’s a good name to give a demo IoT streaming application dealing with large-scale logistics? How about a river… Maybe The “Amazon” application? That’s sort of taken. The Amazon has the most water flow of any river (and is the longest or second longest, depending on how you measure it), but what’s the 2nd ranking river? The Congo! The Congo is the 2nd biggest river (in terms of water flow), and it is the deepest river. The Congo has 15,000km of navigable waterways and was very important for trade.

(Source: Shutterstock)

“Going up that river was like travelling back to the earliest beginnings of the world, when vegetation rioted on the earth and the big trees were kings. An empty stream, a great silence, an impenetrable forest. The air was warm, thick, heavy, sluggish. There was no joy in the brilliance of sunshine. The long stretches of the waterway ran on, deserted, into the gloom of overshadowed distances.”

Joseph Conrad, Heart of Darkness

If you travel up the Congo these days you won’t see any of the original river steamers, but boats of rather different sorts transporting people and goods:

(Source: Wikimedia)

The Congo river was also called the Kongo, after the Kingdom of Kongo which was at the mouth of the river and also the origin of King Kong, so that’s what we’ll call our IoT application.

Kongo Application: Real World Trade


Logistics = The acquisition, storage, transportation, and delivery of goods.

Things in the Real World

“Welcome to the real world.”    Morpheus

We picked logistics as the problem domain for the Instaclustr Kongo IoT application. Logistics is focused on storing physical Goods in Warehouses and moving Goods between Warehouses using Trucks. So the “Things” of interest are just Goods, Warehouses, and Trucks. Goods are of different types, have a location, can be stored in warehouses, and are moved from one warehouse to another by trucks that are loaded and unloaded at warehouses. Sounds simple enough in theory. To make it a bit more concrete, here are some example Goods: live chickens, radioactive waste, fresh fruit and vegetables, and priceless artworks.

Warehouses are buildings at specific locations that can store goods. Trucks arrive at warehouses, any goods in the trucks are unloaded into the warehouse, goods are loaded onto the empty trucks, and the trucks depart for other warehouses.

Trucks (and in theory trains, planes, ships, bicycles, drones, etc.) can transport goods. Some examples of Trucks are a steam delivery truck and an Australian Outback road train.

Interface Between the Physical and the Virtual: Monitoring

“You take the blue pill – the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill – you stay in Wonderland and I show you how deep the rabbit-hole goes.”   Morpheus


Given the real-world complexity of logistics, it’s a good idea to keep track of the location of goods (which can be stored in warehouses or in transit on trucks) and to monitor whether goods are stored and transported correctly. This is where monitoring data, and the software to collect and process it, is vital.

How can we track the location of goods and trucks? There are two things we need. The first is the location of trucks. This is easy enough with GPS fitted to trucks.

Tracking the location of goods is a bit trickier as there are more of them, they don’t have power, and cost is an important factor. Tracking goods is increasingly solved with devices called RFID tags. These are small and cheap and can be directly attached to the goods. RFID readers can detect the tags when close enough, and produce a message containing the reader ID and/or location of the reader, and the unique ID of the detected RFID tag. All we then need is to be able to map the IDs to actual locations and Goods and we can keep track of the movement of goods. The simplest approach is to scan RFID tags when trucks are loaded and unloaded. After loading, the location of the goods is assumed to be in the truck until the truck is unloaded. Once unloaded, the location of the goods will be in the warehouse associated with the RFID reader which detected the goods being unloaded.

RFID tags are attached to Goods, and the tag ID also needs to be associated with the goods ID in order to keep track of the actual goods when RFID reader events are produced (i.e. readers only detect tags, not the actual goods).
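The mapping from reader events to goods locations can be sketched in a few lines of Java. This is only an illustration under assumed names (`GoodsLocationTracker`, `registerTag`, `onLoad`, `onUnload` are all hypothetical and not taken from the Kongo code): tags are first mapped to goods, readers to warehouses, and each load or unload event then updates where the goods are assumed to be.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: all class and method names here are illustrative assumptions.
final class GoodsLocationTracker {
    private final Map<String, String> tagToGoods = new HashMap<>();        // tagId -> goodsId
    private final Map<String, String> readerToWarehouse = new HashMap<>(); // readerId -> warehouseId
    private final Map<String, String> goodsLocation = new HashMap<>();     // goodsId -> warehouseId or truckId

    void registerTag(String tagId, String goodsId) {
        tagToGoods.put(tagId, goodsId);
    }

    void registerReader(String readerId, String warehouseId) {
        readerToWarehouse.put(readerId, warehouseId);
    }

    // A tag was scanned during loading: the goods are now assumed to be in the truck.
    void onLoad(String tagId, String truckId) {
        goodsLocation.put(goodsFor(tagId), truckId);
    }

    // A tag was scanned during unloading: the goods are now in the reader's warehouse.
    void onUnload(String tagId, String readerId) {
        String warehouseId = readerToWarehouse.get(readerId);
        if (warehouseId == null) {
            throw new IllegalArgumentException("Unknown reader: " + readerId);
        }
        goodsLocation.put(goodsFor(tagId), warehouseId);
    }

    // Returns the last known location, or null if the goods have never been scanned.
    String locationOf(String goodsId) {
        return goodsLocation.get(goodsId);
    }

    // Readers only see tags, so every event has to be translated to a goods ID first.
    private String goodsFor(String tagId) {
        String goodsId = tagToGoods.get(tagId);
        if (goodsId == null) {
            throw new IllegalArgumentException("Unknown tag: " + tagId);
        }
        return goodsId;
    }
}
```

In-memory maps are enough for a single-process simulation; a distributed version would need this state to live somewhere shared and fault tolerant.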

RFID tag attached to banana

RFID Readers

RFID readers are used to detect RFID tags within range of the reader. Early readers had very limited ranges, requiring goods to be scanned by a hand-held reader, or moved past or through a fixed reader, e.g. on a conveyor, or warehouse gateway readers. More recent reader technologies have larger ranges and can detect multiple RFID tags in a warehouse or a truck, allowing for continuous monitoring of all goods at a location. We initially assume a gateway RFID reader at each warehouse truck dock:

RFID reader examples


“What is real? How do you define ‘real’? If you’re talking about what you can feel, what you can smell, what you can taste and see, then ‘real’ is simply electrical signals interpreted by your brain.” – Morpheus

Apart from RFID reader events which help keep track of the location of Goods, we assume that there is an arbitrary number of different types of sensors which are monitoring conditions in warehouses and trucks. For example, acceleration and vibration (in trucks), and light level and environmental monitoring (in warehouses). Sensor values will be used to check if goods have been safely stored and transported. We assume that a continuous stream of values will be produced from warehouse sensors (e.g. every second), but for truck sensors, the data may be recorded onboard and only made available in a batch when the truck docks with a warehouse. Examples include a shock and vibration data logger (for use in a truck) and an environmental sensor (for monitoring warehouse gases):

Sensor examples

The Virtual World: Software

“You know, I know this steak doesn’t exist. I know that when I put it in my mouth, the Matrix is telling my brain that it is juicy and delicious. After nine years, you know what I realize? Ignorance is bliss.” – Cypher

In a real production RFID logistics system, the RFID reader data and sensor data are captured, transferred, persisted, and processed by a combination of networks, middleware, databases, and application software. The software can be used to perform a variety of tasks including tracking the location of goods and trucks in real time; checking and enforcing transportation and goods rules in real time, and for auditing after delivery; checking and enforcing business rules; financial control, and billing; planning, approving and optimizing the storage and movement of goods, etc.

The IoT Application: Kongo

“There is no spoon” – Spoon boy, Neo

The Kongo Application simulates the entire physical, interface, and virtual parts of the system in software. Instead of using software to just track and check goods we also run it “in reverse” to create things (Goods, Warehouses, and Trucks), simulate the movement of goods and trucks, and generate RFID and sensor data.

Kongo demo application simulation

At a high level, Kongo models Goods, Warehouses, and Trucks. Goods have attributes including quantity, size, weight, location, and categories. Goods have 0 or more categories including Hazardous, Perishable, Fragile, etc. Each Good has a unique RFID tag assigned to it, which is detected by RFID readers. Goods are either located in a warehouse or are being transported in a truck.

Warehouses have a fixed location and can store 0 or more goods. Warehouses have sensors that produce continuous values for metrics including temperature, humidity, light level, gas levels, etc. Warehouses have RFID readers at each truck dock entrance, which produce truck load and unload events (e.g. “Loaded RFID Tag tagId onto Truck truckId at Reader readerId”). Some warehouses have climate control for different temperature ranges. Goods are moved between warehouses on trucks. There may be 0 or more trucks at each warehouse being loaded and unloaded.

Trucks are used to transport 0 or more goods between warehouses.  Some trucks are climate controlled for different temperature ranges. Trucks have sensors for metrics such as temperature, humidity, acceleration, and vibration. Trucks are loaded at the start of each “turn” with random goods from the warehouse they are docked at and instantly move to another warehouse for the start of the next turn, where they are unloaded. Trucks are moved even if empty (as a simulation with only 1 truck and goods initially at one warehouse has to move the truck around to find goods to start things moving).

For simplicity, only warehouses and trucks have actual locations (which could be lat/long but are in fact just grid coordinates), but currently only the unique warehouse and truck UUIDs are used to determine location. Goods are therefore assumed to be at the same location as the warehouse or truck that they are “in”.
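To make the load and unload events described above a little more concrete, here is one possible shape for them in Java. It is a sketch only: the `RfidEvent` class, its fields, and the wording of the unload message are assumptions for illustration (only the “Loaded … onto Truck … at Reader …” wording comes from the description above).

```java
// Hypothetical event type: the class, field names, and unload wording are assumptions.
final class RfidEvent {
    enum Type { LOAD, UNLOAD }

    final long time;        // logical simulation turn in which the tag was read
    final Type type;
    final String tagId;     // unique ID of the detected RFID tag
    final String truckId;   // truck being loaded or unloaded
    final String readerId;  // dock reader that detected the tag

    RfidEvent(long time, Type type, String tagId, String truckId, String readerId) {
        this.time = time;
        this.type = type;
        this.tagId = tagId;
        this.truckId = truckId;
        this.readerId = readerId;
    }

    // Human-readable form, e.g. "Loaded RFID Tag t onto Truck k at Reader r".
    String describe() {
        String action = (type == Type.LOAD)
                ? "Loaded RFID Tag " + tagId + " onto Truck "
                : "Unloaded RFID Tag " + tagId + " from Truck ";
        return action + truckId + " at Reader " + readerId;
    }
}
```

Note that the event carries only IDs, not locations; the reader ID is what ties the event back to a warehouse.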

Simulation Steps

The simulation works like this.

1. Creation

In the creation phase, the simulation creates the desired number of goods, warehouses, and trucks (e.g. 1,000,000 different goods, 100 warehouses, 200 trucks) and sets time to zero. The rest of the simulation is turn-based and does a number of different things in a loop each logical turn.

2. Loop for target number of turns

3. Unload trucks

At the start of each turn, trucks arrive at warehouses and are unloaded. This produces RFID “unload” reader events for the goods that are unloaded and the goods are now located in the warehouse.

4. Truck sensors

Truck sensor values are produced at this point, as we assume that these may not be available until trucks dock at a warehouse, even though the readings are taken during the trip.

5. Warehouse sensors

Warehouse sensor values are produced for warehouse environmental monitoring.

6. Check rules

If optional rule checking is turned on, then for each truck and warehouse sensor event the environmental rules are checked for violation for each good that is in the same location (warehouse or truck). Goods co-location rules are also checked for each good that was on each truck.

7. Load trucks

A certain percentage of Goods in the warehouse are then selected to be loaded onto the available trucks at the warehouse and are loaded onto trucks randomly. If loading rules are being enforced (see below) then Goods are only allowed onto trucks based on valid combinations of categories of goods on each truck, and climate control rules. This can result in only a single good being on a truck each turn. If no suitable truck is available then goods remain in the warehouse for the rest of the turn. Trucks are then assigned a random destination warehouse location to drive to for the start of the next turn.

8. Increment time
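The steps above can be sketched as a small turn-based loop. This is a heavily simplified illustration rather than the actual Kongo code: the `TurnLoopSketch` and `Truck` names are hypothetical, goods are plain strings, each truck loads at most one good per turn, and the sensor and rule-checking steps are reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified turn loop: names and the one-good-per-truck policy are assumptions.
final class TurnLoopSketch {
    static final class Truck {
        final String id;
        int warehouse;                                // index of the warehouse the truck is docked at
        final List<String> cargo = new ArrayList<>(); // goods currently on the truck
        Truck(String id, int warehouse) {
            this.id = id;
            this.warehouse = warehouse;
        }
    }

    // Runs the loop for the given number of turns and returns the RFID-style events produced.
    // Each inner list of 'warehouses' holds the goods stored there and must be mutable.
    static List<String> run(List<List<String>> warehouses, List<Truck> trucks, int turns, long seed) {
        Random random = new Random(seed);
        List<String> events = new ArrayList<>();
        for (int time = 0; time < turns; time++) {    // time is incremented once per turn
            // Unload: everything on a truck moves into the warehouse it is docked at.
            for (Truck truck : trucks) {
                for (String goods : truck.cargo) {
                    events.add(time + " UNLOAD " + goods + " " + truck.id + " W" + truck.warehouse);
                }
                warehouses.get(truck.warehouse).addAll(truck.cargo);
                truck.cargo.clear();
            }
            // Truck sensor events, warehouse sensor events, and rule checks would go here.
            // Load: each truck takes one random good (if any), then gets a random destination.
            for (Truck truck : trucks) {
                List<String> stock = warehouses.get(truck.warehouse);
                if (!stock.isEmpty()) {
                    String goods = stock.remove(random.nextInt(stock.size()));
                    truck.cargo.add(goods);
                    events.add(time + " LOAD " + goods + " " + truck.id + " W" + truck.warehouse);
                }
                truck.warehouse = random.nextInt(warehouses.size()); // moved even if empty
            }
        }
        return events;
    }
}
```

Seeding the random number generator keeps a run repeatable, which helps when comparing the event stream before and after a change.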


The application can be configured to optionally enforce and/or check two types of rules as follows.


Each Good has 0 or more special categories. Goods with no categories are considered harmless and indestructible. Some goods categories are not allowed to be transported together in the same truck. We assume that warehouses are big enough to cope with all goods categories in one location.   Hazardous goods are not allowed to be transported with non-hazardous goods or other hazardous goods (the reality is more complex as there is a matrix to determine compatible goods types including maximum weights). For example, it’s not a good idea to transport explosives with spontaneously combustible solids (i.e. keep matches away from fireworks).

Goods may have the following categories (0 or more, for example, it’s possible to have hazardous and fragile and perishable goods):

  • Perishable
  • Hazardous
  • Fragile
  • Edible
  • Medicinal
  • Bulky
  • Dry (must be kept away from moisture)

And 0 or 1 temperature category from:

  • Frozen Temp
  • Heat Sensitive Temp
  • Cool Temp
  • Room Temp
  • Ambient Temp

These categories are obviously not exhaustive, and there are lots of weird goods categories I hadn’t thought of. The Australian government lists 97 categories of goods including live animals, chemical waste, articles of animal gut, human hair, metals clad with precious metals, aircraft, ammunition etc.!

Each category has rules, some of which check co-location and others check environmental conditions. For example, Fragile goods cannot be transported with Bulky goods (unless they are also Fragile and the same type of goods). Perishable goods must be kept dry, in the dark, and be delivered before their expiry date is reached. The co-location rules are used to optionally enforce that only permitted goods combinations are loaded onto the same truck.
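A co-location check along these lines could look like the following Java sketch. It covers just two of the rules mentioned above (Hazardous goods travel alone, and Fragile goods do not travel with Bulky goods) and deliberately leaves out the “same type of goods” exception; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of two co-location rules; names and the simplifications are assumptions.
final class CoLocationRules {
    enum Category { PERISHABLE, HAZARDOUS, FRAGILE, EDIBLE, MEDICINAL, BULKY, DRY }

    // True if goods with categories a and goods with categories b may share a truck.
    static boolean compatible(Set<Category> a, Set<Category> b) {
        // Hazardous goods may not travel with any other goods, hazardous or not.
        if (a.contains(Category.HAZARDOUS) || b.contains(Category.HAZARDOUS)) {
            return false;
        }
        // Fragile goods may not travel with Bulky goods (the same-type exception is omitted).
        if (a.contains(Category.FRAGILE) && b.contains(Category.BULKY)) {
            return false;
        }
        if (a.contains(Category.BULKY) && b.contains(Category.FRAGILE)) {
            return false;
        }
        return true;
    }

    // True if the candidate is compatible with everything already on the truck.
    static boolean canLoad(Set<Category> candidate, List<Set<Category>> alreadyOnTruck) {
        for (Set<Category> loaded : alreadyOnTruck) {
            if (!compatible(candidate, loaded)) {
                return false;
            }
        }
        return true;
    }
}
```

Because an empty truck accepts anything, a Hazardous good can always be loaded first, after which the truck is effectively full. That matches the observation that enforcing the rules can leave a single good on a truck.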


Warehouses have sensors producing continuous metrics for environmental conditions such as temperature, humidity, and a range of gas and smoke sensors. Trucks have sensors for temperature, humidity, vibration, and acceleration. Each Goods category has sensor rules which determine whether the sensor readings violate the conditions that category requires. The rules may need to apply over multiple metrics, different time windows, and statistics (e.g. average, max, sum).
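As a rough illustration of a rule that applies a statistic over a window, the Java sketch below keeps the last N readings of a single metric and flags a violation when the chosen statistic exceeds a limit. The `WindowedSensorRule` name, the count-based window, and the “greater than limit” test are all assumptions made for the example; real rules would probably use time-based windows and several metrics.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a count-based sliding window over one metric.
final class WindowedSensorRule {
    enum Statistic { AVERAGE, MAX, SUM }

    private final Statistic statistic;
    private final int windowSize;
    private final double limit;
    private final Deque<Double> window = new ArrayDeque<>();

    WindowedSensorRule(Statistic statistic, int windowSize, double limit) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be at least 1");
        }
        this.statistic = statistic;
        this.windowSize = windowSize;
        this.limit = limit;
    }

    // Adds a reading, evicts the oldest one if the window is full,
    // and returns true if the statistic over the window now exceeds the limit.
    boolean addAndCheck(double value) {
        window.addLast(value);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        return current() > limit;
    }

    // Current value of the statistic; only meaningful once at least one reading was added.
    double current() {
        double sum = 0.0;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : window) {
            sum += v;
            max = Math.max(max, v);
        }
        switch (statistic) {
            case MAX:
                return max;
            case SUM:
                return sum;
            default:
                return sum / window.size();
        }
    }
}
```

For instance, a rule built with `Statistic.AVERAGE`, a window of 3, and a limit of 8.0 tolerates a single warm reading but fires once the average of the last three readings goes above 8.0.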

Warehouses and Trucks are either climate-controlled or not. If climate-controlled, they are suitable for Goods in 1 or more temperature categories, or for Goods with no temperature rules (which is probably odd, as you wouldn’t really want to store rocks in a freezer!).

If rules are being enforced by the simulation then Goods are only allowed to load onto Trucks with compatible climate control for their Categories, and Trucks are directed to warehouses with compatible climate control to ensure that the goods are unloaded into a compatible climate-controlled warehouse.

Sensor values for climate-controlled warehouses and trucks can be forced to be in the range all the time or in the range most of the time with a probability of being out of range (e.g. for over or under temperature events). Other sensor types have a probability and distribution function for producing lower or higher values that can be configured (e.g. vibration and acceleration).
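A generator for climate-controlled sensor values might look something like this Java sketch. The class name, the constructor parameters, and the way an out-of-range reading is produced (a uniform excursion above or below the range) are assumptions for illustration only. A probability of 0.0 forces readings to stay in range all the time.

```java
import java.util.Random;

// Hypothetical sketch of a climate-controlled sensor; the excursion model is an assumption.
final class ClimateSensorSimulator {
    private final double min;                   // lower bound of the controlled range
    private final double max;                   // upper bound of the controlled range
    private final double outOfRangeProbability; // chance that a reading falls outside the range
    private final double maxExcursion;          // largest distance outside the range
    private final Random random;

    ClimateSensorSimulator(double min, double max, double outOfRangeProbability,
                           double maxExcursion, long seed) {
        this.min = min;
        this.max = max;
        this.outOfRangeProbability = outOfRangeProbability;
        this.maxExcursion = maxExcursion;
        this.random = new Random(seed);
    }

    // Returns the next simulated reading.
    double nextReading() {
        if (random.nextDouble() < outOfRangeProbability) {
            // Over or under temperature event: between 10% and 100% of maxExcursion outside the range.
            double excursion = maxExcursion * (0.1 + 0.9 * random.nextDouble());
            return random.nextBoolean() ? max + excursion : min - excursion;
        }
        // Normal case: uniformly distributed within the controlled range.
        return min + random.nextDouble() * (max - min);
    }
}
```

Other sensor types, such as vibration or acceleration, would swap the uniform distribution for whatever configurable distribution suits the metric.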

Steaming Upstream With Kongo

The Kongo application started out as a monolithic Java application, but the intention is to use it as a starting point to explore distribution, scalability, reliability, functionality, etc. as we re-engineer it on the Instaclustr Open Source technologies. Parts of Kongo may be used as is initially (e.g. the simulation producing an event stream), while other parts are made more scalable (e.g. the rules checking code could be implemented in Kafka Streams).

Some of the challenges around the future functionality, design, and distribution may be:

  • State management in the rules checking, as event processing is location dependent. We need to efficiently and reliably track the location of goods and trucks.
  • Are there one or many applications? In theory, there could be one application per warehouse and truck if they are owned by different companies, or have to be distributed due to technical or geopolitical constraints.
  • To explore Kafka’s ability to handle event time semantics, windows, etc., the simulation could be configured to generate a proportion of duplicate, missing, delayed, fake, stuck, and out-of-order events.
  • The location could be more complicated and require geohashing or similar to determine which things are co-located based on location and distance.
  • The rules could be used for planning and optimizing goods movements (before movement), checking each movement immediately, checking at the end of a delivery, and compliance/auditing sometime after deliveries. Historical goods, location, RFID, and sensor data need to be available long enough to do this.
  • Functional enhancements could include creating more goods during the simulation (e.g. some warehouses can be treated as factories that produce new goods), delivering goods (some goods may be delivered each turn, and the rules will need to be checked back to the start of the supply chain).
  • More rules could be added to check Business Processes and detect anomalous events. For example: disallow direct truck-to-truck loading, detect goods that go missing from a truck, enforce truck load limits, add warehouse workers, and check health and safety rules over different time periods (hour, shift, month, year).
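As one example of what the event-disorder idea in the list above might involve, here is a small Java sketch that takes a clean event list and randomly drops, duplicates, or delays events. Everything about it is hypothetical (the `EventDisorder` name, the three probabilities, and the “delay by one position” behavior); it covers only the missing, duplicate, and delayed/out-of-order cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: perturbs an ordered event list to mimic an unreliable source.
final class EventDisorder {
    // Returns a new list in which each event may be dropped, duplicated,
    // or held back and emitted after the next surviving event.
    // Assumes the input contains no null elements.
    static <T> List<T> perturb(List<T> events, double dropProbability,
                               double duplicateProbability, double delayProbability, long seed) {
        Random random = new Random(seed);
        List<T> out = new ArrayList<>();
        T held = null; // at most one delayed event is held at a time
        for (T event : events) {
            if (random.nextDouble() < dropProbability) {
                continue;                      // missing event
            }
            if (held == null && random.nextDouble() < delayProbability) {
                held = event;                  // delayed event, emitted out of order later
                continue;
            }
            out.add(event);
            if (random.nextDouble() < duplicateProbability) {
                out.add(event);                // duplicate event
            }
            if (held != null) {
                out.add(held);                 // release the delayed event after a later one
                held = null;
            }
        }
        if (held != null) {
            out.add(held);                     // nothing later arrived; emit it at the end
        }
        return out;
    }
}
```

With all three probabilities set to 0.0 the output is identical to the input, so the same simulation can feed both a clean and a disordered stream for comparison.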

In the next blog we’ll explore design choices for the Kongo application (e.g. using an internal event bus), and an initial Apache Kafka implementation (what are the trade-offs and benefits of using 1 or many topics?).

Here’s the first version of the code: https://github.com/instaclustr/kongo