Redis™ Streams vs Apache Kafka®

1. What are Redis Streams?

Redis Streams preserves messages (but not fish specimens) (Source: Shutterstock)

The Redis Streams data type is newer than Redis Pub/Sub, and is designed to support “disconnected” distributed streaming applications. The data type itself is essentially an append-only data structure stored in memory: basically preserved messages!

This differs from Redis Pub/Sub channels, which focus only on delivering messages to currently connected subscribers. Pub/Sub uses a push-based delivery mechanism, and if there are no currently connected subscribers, messages are simply discarded. Channels also don’t remember messages, so disconnected subscribers can’t catch up on missed messages, replay messages, or read different ranges of messages. But channels are fast!

Redis Streams store or “preserve” messages, even if there are no currently connected consumers, and even after successful delivery. They support blocking and non-blocking consumers as well as consumer groups, and come with a new operation to write data to Streams, XADD.

The Streams data type is also more complex than Pub/Sub channel values (which are just Strings), as stream entries consist of a set of 1 or more field-value pairs. Stream entries are also strictly ordered by an “ID”, which must be unique, and which is made up of 2 numbers separated by a ‘-’ character: a timestamp and a sequence number, e.g. 784738748-1, 784738748-2, 784738754-1, etc.
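Because both parts of an ID are numbers, IDs have to be compared numerically rather than as strings. Here is a minimal Python sketch of that ordering, assuming complete IDs of the form timestamp-sequence. The helper names parse_id and sort_ids are made up for illustration and are not part of any Redis client.

```python
from typing import List, Tuple


def parse_id(entry_id: str) -> Tuple[int, int]:
    """Split a complete stream ID 'ms-seq' into an (ms, seq) integer tuple."""
    ms, seq = entry_id.split("-")
    return int(ms), int(seq)


def sort_ids(ids: List[str]) -> List[str]:
    """Order IDs by timestamp first, then by sequence number."""
    return sorted(ids, key=parse_id)


# The example IDs from the text, deliberately shuffled.
ordered = sort_ids(["784738754-1", "784738748-2", "784738748-1"])

# Plain string sorting would put "10-0" before "9-0"; numeric comparison does not.
numeric = sort_ids(["10-0", "9-0"])
```

Comparing the two numbers as a tuple is what makes "strictly ordered" well defined even when several entries share the same timestamp.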

2. Reading Redis Streams with XREAD Single Consumers

It’s hard work if you are the only one pushing!
(Source: Shutterstock)

Redis Streams support two styles of consumers: individual consumers and consumer groups, and both styles can be used simultaneously on the same streams. Let’s look at individual consumers first, using the commands XREAD and XADD.

An XREAD client reads data from 1 (or more) streams, only returning entries with IDs greater than the specified ID. It can also optionally block for a specified maximum time in milliseconds (BLOCK timeout), and return a maximum number of entries (COUNT count), for example:

BLOCK 0 tells XREAD to BLOCK forever (never timeout, timeouts are in ms), STREAMS is a required keyword followed by 1 or more “keys” (or stream names), and matching IDs to start reading after.

In this example, we are reading from a stream called “mystream” starting after the special ID character “$”. What does that mean? This tells Redis Streams that we are only interested in “new things”, and it should only be used as the first call to XREAD by a consumer on a given stream. Otherwise, if you use it again you could miss entries that were added between polls (which is comparable to how Redis Pub/Sub treats disconnected subscribers).

Note that if BLOCK isn’t specified, then XREAD will return with any entries currently available (up to the COUNT), or potentially none at all.

Now let’s add an entry to mystream with XADD:

This operation adds an entry consisting of two field-value pairs to “mystream”:

But what does the “*” character do? This is another special ID character that will auto-generate a unique ID, using the timestamp of the Redis node it is run on as the 1st number, and an increasing sequence number as the 2nd. Note that you can use an explicit ID if you don’t want an auto-generated one, but this can result in an exception if it’s not strictly greater than existing IDs, and is only really useful if you can rely on another definitive source of strictly increasing IDs.
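To make the ID rules concrete, here is a small Python model of how an auto-generated ID can stay strictly increasing, and why an explicit ID can be rejected. It is a sketch of the behaviour described above, not Redis code. The function names are invented, and the clock is passed in as a plain number so the example is deterministic.

```python
from typing import Tuple

StreamId = Tuple[int, int]  # (milliseconds, sequence number)


def next_auto_id(last_id: StreamId, now_ms: int) -> StreamId:
    """Model of '*': use the clock if it moved forward, else bump the sequence."""
    last_ms, last_seq = last_id
    if now_ms > last_ms:
        return (now_ms, 0)
    # Same millisecond (or a clock that went backwards): stay strictly increasing.
    return (last_ms, last_seq + 1)


def check_explicit_id(last_id: StreamId, new_id: StreamId) -> StreamId:
    """Model of an explicit ID: reject anything not strictly greater than the last."""
    if new_id <= last_id:
        raise ValueError("ID must be greater than the last ID in the stream")
    return new_id


later = next_auto_id((100, 0), now_ms=101)       # clock moved on, sequence restarts
same_ms = next_auto_id((100, 3), now_ms=100)     # same millisecond, sequence bumps
```

This is also why explicit IDs only make sense if you have another definitive source of strictly increasing values: anything else will eventually hit the ValueError branch.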

The XADD operation returns the ID generated, for example:

At this point the XREAD operation, that has been patiently waiting for an entry to arrive in mystream, returns the following result

which is the name of the stream (remember you can be polling multiple streams at once), the ID, and the entry field/value pairs structure (The result is an array of arrays).

So it seems to be working so far. Let’s add another entry with a higher temperature:

To get further entries, we have to call XREAD again, being careful to specify the last ID we’ve seen as follows:

Note that if we forgot to do this and tried the original operation with “$” instead

it just blocks, waiting for new entries. For example if we now add an entry: 

XREAD returns:

Thereby missing out on the temperature 101 entry.

So, in summary, Redis Streams XREAD relies on consumers remembering the ID of the latest entry they received. If something goes wrong, the best they can do is start reading from the newest entries again, or if they support idempotent operations, reading from further back (e.g. some time back if they can estimate their recovery time), or even from the beginning all over again (ID 0-0 or 0)!
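The difference between remembering the last ID and asking for “$” on every poll can be sketched with a toy in-memory stream in Python. This is only a model: ToyStream is a made-up class, reads never block, IDs come from a fake counter instead of a clock, and the temperature values are placeholders.

```python
from typing import Dict, List, Optional, Tuple

StreamId = Tuple[int, int]
Entry = Tuple[StreamId, Dict[str, str]]


class ToyStream:
    """In-memory stand-in for one stream; not a Redis client."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self._clock = 0  # fake millisecond clock so IDs are predictable

    def xadd(self, fields: Dict[str, str]) -> StreamId:
        self._clock += 1
        entry_id = (self._clock, 0)
        self.entries.append((entry_id, fields))
        return entry_id

    def last_id(self) -> StreamId:
        """What '$' resolves to at the moment of the call."""
        return self.entries[-1][0] if self.entries else (0, 0)

    def xread(self, after: StreamId, count: Optional[int] = None) -> List[Entry]:
        """Non-blocking read of entries with IDs strictly greater than 'after'."""
        found = [entry for entry in self.entries if entry[0] > after]
        return found if count is None else found[:count]


stream = ToyStream()
stream.xadd({"temperature": "99"})

# Careful consumer: resolve "$" once, then always pass the last ID it has seen.
cursor = stream.last_id()
stream.xadd({"temperature": "100"})
stream.xadd({"temperature": "101"})
first = stream.xread(cursor, count=1)    # gets the "100" entry
cursor = first[-1][0]
second = stream.xread(cursor)            # gets the "101" entry

# Careless consumer: resolves "$" again, which is already past the "101" entry.
careless_cursor = stream.last_id()
stream.xadd({"temperature": "102"})
careless = stream.xread(careless_cursor)  # only sees "102"
```

The careful consumer sees every entry in order, while the careless one silently skips the entry that arrived between its polls.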

Note that there are a couple of other special characters for IDs: “-” means minimum possible ID, “+” means maximum possible value.

There’s also a related command called XRANGE. This is used for reading a range of IDs (remember IDs are just timestamps, so you can easily request all IDs in a specific time range), or even just fetch a single entry. And XREVRANGE returns entries in reverse order.
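As a rough illustration of a time-range query, the Python sketch below filters a list of entries by timestamp. It assumes the range is inclusive at both ends and that a bare timestamp covers every sequence number within that millisecond. The function names and sample readings are invented for the example.

```python
from typing import Dict, List, Tuple

StreamId = Tuple[int, int]
Entry = Tuple[StreamId, Dict[str, str]]

MAX_SEQ = 2**64 - 1  # largest sequence number within one millisecond


def xrange_by_time(entries: List[Entry], start_ms: int, end_ms: int) -> List[Entry]:
    """Entries whose timestamp lies in [start_ms, end_ms], oldest first."""
    lower, upper = (start_ms, 0), (end_ms, MAX_SEQ)
    return [entry for entry in entries if lower <= entry[0] <= upper]


def xrevrange_by_time(entries: List[Entry], start_ms: int, end_ms: int) -> List[Entry]:
    """Same range as xrange_by_time, newest first."""
    return list(reversed(xrange_by_time(entries, start_ms, end_ms)))


readings: List[Entry] = [
    ((1000, 0), {"temperature": "99"}),
    ((1000, 1), {"temperature": "100"}),
    ((1500, 0), {"temperature": "101"}),
    ((2000, 0), {"temperature": "102"}),
]
window = xrange_by_time(readings, 1000, 1500)
newest_first = xrevrange_by_time(readings, 1000, 1500)
```

Using the same start and end value narrows the range to a single millisecond, which is how a range query can fetch just one entry.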

So, Redis Streams with XREAD puts most of the effort of remembering IDs onto the consumer. In return, it supports disconnected delivery with at-least-once delivery semantics (per consumer), along with 1-1 (point-to-point), many-to-1 (fan-in), and 1-to-many and many-to-many (fan-out) delivery topologies (multiple consumers each receiving a copy of each message sent to a stream). What it doesn’t support is high concurrency for a single stream, i.e. multiple consumers sharing the messages between them.

The topologies are similar to Redis Pub/Sub, but the main enhancements for XREAD/XRANGE are at-least-once delivery semantics (cf. at-most-once for Pub/Sub), and performance that is independent of the number of consumers: XREAD is O(1) for a fixed COUNT, cf. the slower O(consumers + patterns) for Pub/Sub.

But Redis Streams supports an even more sophisticated type of consumer which we’ll look at next.

3. Reading Redis Streams With XREADGROUP Consumer Groups

Redis Consumer Groups enable sharing of work
(Source: Shutterstock)

Redis Streams have support for consumer groups, that is, the ability to share messages among more than one consumer subscribed to the same stream. This helps with latency (e.g. due to slow consumers) and throughput, by enabling higher concurrency (more consumers available to process the same amount of data) and providing a mechanism to share the messages among the available consumers. Using XREAD, the only way of doing this is to have multiple streams to increase the consumer concurrency.

The command XREADGROUP provides the consumer group functionality.

But first, you have to create a consumer group using the XGROUP command, for example:

This command creates a new consumer group associated with the specified stream. The ID argument sets where the group starts reading: $ means only messages added after the group is created, and 0 means from the start of the stream.

XGROUP also has other options for deleting a group, creating the stream if it doesn’t already exist, creating and deleting consumers, and forcing consumers to reprocess from a specified ID, etc.

Let’s run 2 consumers in this new group as follows:

Consumer 1: 

Consumer 2: 

The XREADGROUP operation requires the GROUP keyword followed by the group name (created with the XGROUP command) and the consumer name (every consumer in the group must have a unique string name). COUNT is optional (and limits the result to at most COUNT messages), as is BLOCK with a timeout. The STREAMS keyword is required and is followed by 1 or more streams to subscribe to, and the starting ID to read from for each stream. In the above example there’s another special ID character, this time “>”, which means “messages never delivered to any other consumer in the group”, i.e. new messages only.

Now we add four new messages to mystream:

Consumer worker-1 returns the 1st entry:

And consumer worker-2 returns the 2nd entry:

The XREADGROUP command can also take a specific ID, and will then return any entries in the “pending list” for this consumer (i.e. entries that have been delivered but not acknowledged). For example

just returns the entry that worker-1 has already seen (but not yet acknowledged).

If we now explicitly acknowledge this message ID with XACK

and call the same XREADGROUP command again

there are no longer any entries in the pending list so an empty result is returned.

Now let’s run the original XREADGROUP commands with the special ID character “>” again:

Consumer worker-1 returns the 3rd entry:

And consumer worker-2 returns the 4th entry:

This correctly gives the consumers the next available unseen entries. Note, however, that worker-2 hasn’t acknowledged any of its messages yet, so worker-2 calling XREADGROUP with ID=0 would return the 2 entries that it has already seen.

This demonstrates that:

  1. every message is delivered to some consumer
  2. the messages are shared among the available consumers in the group
  3. each message is only delivered to a single consumer
  4. each consumer only gets new messages, or messages that it has seen before (at-least-once delivery per consumer)
  5. each consumer gets the next available message every time it polls with the special “>” ID character
  6. otherwise, if it requests a specific ID, each consumer only gets its own unacknowledged entries (or none).

How does this all work? Fundamentally, the XREADGROUP command is a read and write operation (so it only works on Redis master nodes, unlike XREAD which works on all Redis nodes). The Redis server keeps track (per consumer group) of both the last message ID delivered and the entries that have been delivered to each consumer (the pending list), and only removes entries from the pending list when the consumer acknowledges that they were successfully processed (using XACK). If for some reason the consumer fails between getting the entry and acknowledging it, then the next time it polls it can explicitly request entries from the start of the pending list again, or just continue with the latest unseen messages if that’s more useful.
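The bookkeeping described above (a last-delivered ID per group plus a pending list) can be modelled in a few lines of Python. This is a toy sketch under simplifying assumptions, not how Redis implements it: ToyGroup and its method names are invented, reads never block, and the entries are created up front with placeholder temperatures.

```python
from typing import Dict, List, Tuple

StreamId = Tuple[int, int]
Entry = Tuple[StreamId, Dict[str, str]]


class ToyGroup:
    """One consumer group over a shared, ID-ordered list of entries."""

    def __init__(self, entries: List[Entry], start_after: StreamId = (0, 0)) -> None:
        self.entries = entries
        self.last_delivered = start_after      # tracked per group, not per consumer
        self.pending: Dict[StreamId, str] = {}  # delivered but unacknowledged: ID -> owner

    def read_new(self, consumer: str, count: int = 1) -> List[Entry]:
        """Like reading with '>': entries never delivered to anyone in the group."""
        batch = [e for e in self.entries if e[0] > self.last_delivered][:count]
        for entry_id, _ in batch:
            self.pending[entry_id] = consumer
            self.last_delivered = entry_id
        return batch

    def read_pending(self, consumer: str, after: StreamId = (0, 0)) -> List[Entry]:
        """Like reading with a specific ID: this consumer's unacknowledged entries."""
        return [e for e in self.entries
                if e[0] > after and self.pending.get(e[0]) == consumer]

    def ack(self, entry_id: StreamId) -> int:
        """Like XACK: remove the entry from the pending list; 1 if it was pending."""
        return 1 if self.pending.pop(entry_id, None) is not None else 0


entries: List[Entry] = [((n, 0), {"temperature": str(100 + n)}) for n in range(1, 5)]
group = ToyGroup(entries)

w1_first = group.read_new("worker-1")           # 1st entry
w2_first = group.read_new("worker-2")           # 2nd entry
w1_pending = group.read_pending("worker-1")     # 1st entry again, not yet acknowledged
group.ack(w1_first[0][0])
w1_after_ack = group.read_pending("worker-1")   # nothing left pending
w1_second = group.read_new("worker-1")          # 3rd entry
w2_second = group.read_new("worker-2")          # 4th entry
w2_pending = group.read_pending("worker-2")     # both entries worker-2 has seen
```

The sequence of calls follows the worker-1 and worker-2 walkthrough above. Every read_new call mutates group state, which is the reason a group read counts as a write.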

The performance of XREADGROUP is identical to XREAD, O(1). The main improvements that it offers over XREAD are consumer groups which automatically load balance the messages across multiple consumers to increase concurrency and throughput, and automatic tracking of delivered messages to ensure that disconnected delivery works without the consumers having to remember which ID they are up to.

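The downside of this is that, to correctly get messages that were delivered but not actually processed (the pending list), the consumers must explicitly acknowledge each ID (making the consumers read/write, so they can only use the Redis master nodes). However, this means that XREADGROUP can support exactly-once processing semantics, so it may be a good fit for more demanding use cases. Note that an entry that was processed but not yet acknowledged when a consumer failed is still in the pending list and will be read again, so the processing itself needs to tolerate repeats.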
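Whether you get exactly-once processing in practice depends on what happens when a consumer fails after processing an entry but before acknowledging it: the entry stays in the pending list and will be read again. One common way to cope is to make the handler idempotent, as in this Python sketch. IdempotentHandler is a made-up helper, and the in-memory set of processed IDs is an assumption for brevity. A real consumer would need to keep that record somewhere durable.

```python
from typing import Callable, Dict, List, Set, Tuple

StreamId = Tuple[int, int]


class IdempotentHandler:
    """Wraps a side effect so that redelivered entries are applied only once."""

    def __init__(self, apply: Callable[[Dict[str, str]], None]) -> None:
        self._apply = apply
        self._done: Set[StreamId] = set()  # assumed to survive consumer restarts

    def handle(self, entry_id: StreamId, fields: Dict[str, str]) -> bool:
        """Return True if the entry was processed now, False if it was a repeat."""
        if entry_id in self._done:
            return False
        self._apply(fields)
        self._done.add(entry_id)
        return True


totals: List[int] = []
handler = IdempotentHandler(lambda fields: totals.append(int(fields["temperature"])))

# First delivery, then (imagine) a crash before the acknowledgement,
# then the same entry read again from the pending list.
first_time = handler.handle((1, 0), {"temperature": "101"})
redelivered = handler.handle((1, 0), {"temperature": "101"})
```

The entry ID works well as the deduplication key here because it is unique within a stream.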

Finally, Redis Streams have support for concurrency, replication, and durability, as the Streams data type works with Redis clusters (for concurrency and replication) and can optionally be persisted to disk.

4. Fixing XREADGROUP Failures with XPENDING, XCLAIM, and XAUTOCLAIM

Whoops! Can we XAUTOCLAIM the automobile after it’s gone off the cliff? (Source: Shutterstock)

Now let’s look at some XREADGROUP failure modes.

We’ve already seen what happens if a consumer fails before acknowledging entries—once restarted it can get any unacknowledged entries before new ones (although it must be able to restart with the same consumer name).

But what happens if the failed consumer never restarts? Because the events are just load-balanced among current consumers, all new events will still be processed correctly by the remaining consumers, but there may be some entries still in the pending list for the failed consumer, and under normal circumstances, no other consumer can access them.

There are a couple of commands to manually fix this problem. XPENDING allows you to discover which consumers in a group have pending entries, and has a more detailed form that provides information per entry. Used in conjunction with the XCLAIM command, this allows you to reassign messages from one consumer to another. There’s also the XAUTOCLAIM command, which combines both operations.
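The idea behind inspecting and reassigning pending entries can be sketched with a toy pending list in Python. This is a loose model only: ToyPendingList and its methods are invented, time is passed in explicitly, and the rule "claim anything idle for at least a minimum time" is a simplification of what the real commands offer. The consumer name worker-3 is hypothetical.

```python
from typing import Dict, List, Tuple

StreamId = Tuple[int, int]


class ToyPendingList:
    """Pending entries of one consumer group: ID -> (owner, last delivery time in ms)."""

    def __init__(self) -> None:
        self.pending: Dict[StreamId, Tuple[str, int]] = {}

    def deliver(self, entry_id: StreamId, consumer: str, now_ms: int) -> None:
        self.pending[entry_id] = (consumer, now_ms)

    def summary(self) -> Dict[str, int]:
        """Loosely like the short form of XPENDING: pending count per consumer."""
        counts: Dict[str, int] = {}
        for owner, _ in self.pending.values():
            counts[owner] = counts.get(owner, 0) + 1
        return counts

    def autoclaim(self, new_owner: str, min_idle_ms: int, now_ms: int) -> List[StreamId]:
        """Loosely like XAUTOCLAIM: take over entries idle for at least min_idle_ms."""
        claimed: List[StreamId] = []
        for entry_id, (_, delivered_ms) in sorted(self.pending.items()):
            if now_ms - delivered_ms >= min_idle_ms:
                self.pending[entry_id] = (new_owner, now_ms)  # resets the idle time
                claimed.append(entry_id)
        return claimed


plist = ToyPendingList()
plist.deliver((1, 0), "worker-1", now_ms=0)       # worker-1 then fails for good
plist.deliver((2, 0), "worker-2", now_ms=9_000)   # worker-2 is still working on this

before = plist.summary()
claimed = plist.autoclaim("worker-3", min_idle_ms=5_000, now_ms=10_000)
after = plist.summary()
```

The minimum idle time stops a healthy but slow consumer from having its in-flight entries taken away.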

In practice, to ensure adequate concurrency and throughput, you would need to create new consumers to take over the load of failed consumers, rather than just allocating their entries to already running consumers.

5. Other Useful Streams Commands: XINFO, XTRIM

The only sort of snail you want in your garden! Topiary needs regular trimming to maintain its shape. (Source: Shutterstock)

The XINFO command is also very useful and has options for getting information for streams, groups, and consumers.

And if you discover that your streams are getting too big (remember that they do have to fit in memory!), or you want to trim them for some other reason, then you can use XTRIM (in different ways). Other reasons include removing old or expired entries, removing entries for compliance reasons, speeding up XRANGE queries, or persisting some of the entries to other storage and then removing them. The XTRIM command also has a special character, “~”, which means approximate length (typically only slightly more, in the low 10s, than the target length).

For example:

Note that there’s also an option on XADD to trim a stream at the same time as adding an entry.
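Two trimming strategies, by maximum length and by minimum ID, can be illustrated with plain Python lists. This is a sketch of the exact forms only. The approximate “~” behaviour is not modelled, and the function names and sample entries are invented.

```python
from typing import Dict, List, Tuple

StreamId = Tuple[int, int]
Entry = Tuple[StreamId, Dict[str, str]]


def trim_maxlen(entries: List[Entry], maxlen: int) -> List[Entry]:
    """Keep only the newest maxlen entries (entries are ordered oldest first)."""
    return entries[-maxlen:] if maxlen > 0 else []


def trim_minid(entries: List[Entry], min_id: StreamId) -> List[Entry]:
    """Drop every entry whose ID is lower than min_id."""
    return [entry for entry in entries if entry[0] >= min_id]


entries: List[Entry] = [((n, 0), {"temperature": str(100 + n)}) for n in range(1, 6)]

newest_two = trim_maxlen(entries, 2)
from_third = trim_minid(entries, (3, 0))
```

Since IDs start with a timestamp, trimming by minimum ID is a natural fit for "drop everything older than X", while trimming by length caps memory use.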

6. Redis Streams vs. Kafka

Redis Streams are similar to Kafka in some respects. XREAD acts like single Kafka consumers, and XREADGROUP acts like Kafka consumer groups. One noticeable difference is that Kafka topics have partitions, which enable load balancing over the consumers in the group, but Redis Streams don’t have partitions. In this respect, a Redis stream behaves like a Kafka topic whose messages have no key, with both using round-robin load balancing.

Kafka uses topic partitions for high concurrency and throughput, supporting multiple consumers sharing the message processing load and multi-threading/multiple nodes on the server-side as well.

Partitions also give Kafka the ability to process events in order within each partition. Redis Streams consumer groups can’t give this guarantee, as consecutive entries may go to different consumers, and the closest solution would be to use multiple streams/keys, with a single consumer per stream.
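One way to approximate keyed partitions with multiple streams is to derive the stream name from a stable hash of the message key, so that all events for one key land on the same stream and are read in order by that stream’s single consumer. Here is a minimal Python sketch of that routing. The naming scheme mystream:N, the shard count of 4, and the sensor keys are all assumptions for illustration.

```python
import zlib
from typing import Dict, List


def stream_for(key: str, base: str = "mystream", shards: int = 4) -> str:
    """Map a message key to one of a fixed number of stream names."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    shard = zlib.crc32(key.encode("utf-8")) % shards
    return f"{base}:{shard}"


events = [("sensor-a", "99"), ("sensor-b", "70"), ("sensor-a", "101")]

# Group the events by target stream, preserving arrival order within each stream.
routed: Dict[str, List[str]] = {}
for sensor, temperature in events:
    routed.setdefault(stream_for(sensor), []).append(f"{sensor}={temperature}")
```

The trade-off is that the number of streams caps the useful number of consumers, and changing it later remaps keys to different streams.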

Kafka has better automatic management of consumer groups, including automatic rebalancing when consumers come and go. Both Redis Streams XREADGROUP and Kafka support at-least-once delivery and exactly-once processing semantics (with explicit Redis Streams XACK and manual offset commits in Kafka, rather than the default auto-commit).

Apache Kafka vs. Redis Comparison Table

Here’s a table comparing some of the features across all 4 streaming options:

| Feature | Redis Pub/Sub | Redis Streams XREAD | Redis Streams XREADGROUP | Apache Kafka |
|---|---|---|---|---|
| Data type name | channel | stream | stream | topic |
| Value data type | String | Array | Array | Binary (anything supported by ser/des) |
| Explicit key | No | No | No | Yes (optional) |
| Messages must fit in memory | Yes | Yes | Yes | No |
| Delivered messages preserved | No | Yes | Yes | Yes |
| Connected delivery only | Yes | No | No | No |
| Delivery time dependent on the number of consumers | Yes | No | No | No |
| Latency | < 1ms | < 10ms | < 10ms | < 100ms |
| Disconnected delivery | No | Yes | Yes | Yes |
| At-most-once delivery | Yes | No | No | No |
| At-least-once delivery | No | Yes | Yes | Yes |
| Exactly-once processing | No | No | Yes (with XACK) | Yes (with manual commit) |
| Consumers must keep track of last ID seen (used in next poll) | No | Yes | No | No |
| Consumer groups for load sharing | No | No | Yes (round-robin only) | Yes (round-robin or partitions) |
| Consumer group rebalancing | N/A | N/A | Manual | Automatic |
| Partitions for server-side concurrency | No | No | No | Yes |
| Message persistence | No | Optional | Optional | Yes |
| Replication and failover | Yes | Yes | Yes | Yes |
| Pattern subscriptions | Yes | No | No | Yes |

To summarize, use Redis Pub/Sub if you want fast (potentially sub ms) connected-only delivery, which is good for messages that need to be delivered quickly (e.g. real-time, time-bounded messages that are simply irrelevant after tens of ms), and if it doesn’t matter if messages are not delivered to disconnected subscribers (e.g. they have expired, the subscribers aren’t interested in missed messages). 

Use Redis Streams if you need reliable disconnected delivery and can accept slightly slower (sub-10 ms) delivery times than Pub/Sub. Use Kafka if you are looking for reliability, fault tolerance, and high performance with large amounts of data.

Redis may also be a good choice if you want to use streaming data in conjunction with other data types supported by Redis. For example, the Anomalia Machina application I built using Kafka and Cassandra could be easily re-implemented in Redis for both streaming and querying. It would be fast, but have limited scalability.

Finally, use Kafka if you need unbounded throughput and reliable horizontally scalable clusters, unbounded persisted message retention and replaying, reasonably fast delivery times (< 100ms, potentially in the low 10s of ms), and highly scalable processing of single topics (enabled by Kafka partitions and consumer groups).

Note that this analysis was based on published information including benchmarks, but I didn’t conduct any original comparative benchmarking. It’s therefore only a generalization about performance, so, as usual, you should always benchmark against your specific use case and technology, resources, and configuration choices. Apache Kafka in particular is highly tunable across the producer, broker, and consumer components, for different latency and throughput use cases.

Further Information

This Redis Streams introduction has some nice visualizations.

The above examples used the Redis CLI, but in practice, you will need to use a Redis client for your programming language. Here’s a simple example of using Redis streams with Java (using Redisson, which we previously looked at in this blog). We also previously examined Jedis, which also supports the Redis stream commands.

Some previous blogs that discuss Apache Kafka performance tuning include:

 

If you have questions about Instaclustr’s managed Redis clusters get in touch today to discuss options.
