How to Use MongoDB Clients and FerretDB With Instaclustr PostgreSQL

1. What is FerretDB®?

When I first heard about FerretDB, my initial thought was what on earth is a ferret?!

From my childhood I vaguely recalled that ferrets, weasels, and stoats were the “baddies” from “The Wind in the Willows”, but that was about it (see endnote [1]). Departing the animal kingdom (for the time being), FerretDB is an open source (Apache 2.0 licensed) MongoDB® proxy built on PostgreSQL®.

Mole running from the “Terror of the Wild Wood” (Ferrets?)

(Source: https://commons.wikimedia.org/wiki/File:Wind_in_the_Willows_pg_65.png)

Now, PostgreSQL® I have heard of (and occasionally blogged about—PostgreSQL data types, PostgreSQL Boolean Three-Valued logic, PostgreSQL and JSON, and PostgreSQL and Debezium/CDC).

It’s a popular open source SQL database and is commonly described as an Object-Relational database. MongoDB is a JSON/Document oriented horizontally scalable database (which has a “copyleft” Server Side Public License). Basically, FerretDB translates the MongoDB JSON queries and protocol to and from SQL—that sounds clever, but maybe tricky.

Mind the Gap

(Source: Shutterstock)

Given the potential gap between JSON and SQL, I wondered what makes PostgreSQL a suitable backend for FerretDB? PostgreSQL has been around since the 1980s (and evolved from an even earlier 1970s variant, INGRES), starting as an object database. Here’s a motivation from the original paper:

“The first goal is to support complex objects. Engineering data, in contrast to business data, is more complex and dynamic. Although the required data types can be simulated on a relational system, the performance of the applications is unacceptable.”

The authors used geographic data as a complex object example, and I was interested to read this recent paper from Geoinformatica (2021) that reveals that PostgreSQL is still fast for GIS queries (faster than MongoDB in fact, except for joins).

But how does FerretDB actually store/retrieve JSON in PostgreSQL? Not surprisingly, FerretDB uses a recent feature of PostgreSQL, JSONB columns. This is the same trick that I used to store JSON data from Kafka Connect Sink Connectors in PostgreSQL for my pipeline blog series (using GIN indexes), and query them again for visualization in Apache Superset.

However, FerretDB uses a few more tricks as well. First, in common with MongoDB, data is stored in BSON—a binary representation of JSON for speed and type flexibility. Second, FerretDB uses a custom mapping called PJSON which translates from BSON to/from JSONB. Finally, FerretDB uses SQL query pushdowns to ensure compatibility with MongoDB and good performance. So, PostgreSQL was obviously a good choice for the FerretDB backend.

2. Down the Ferret Hole With Instaclustr for PostgreSQL

(Source: Shutterstock)

Next, let’s go “down the ferret hole” and “ferret out” how to use FerretDB with Instaclustr’s Managed PostgreSQL service.

To get up to speed, I downloaded and ran the “all in one” FerretDB docker installation on my Mac. This worked as advertised, and I was able to connect to it with a trial version of the “Studio 3T” MongoDB GUI (one of many options).

Next, I spun up a small developer PostgreSQL instance using the Instaclustr console.

To connect to an external PostgreSQL database, you must use a different version of FerretDB, the production image. I created a custom docker-compose.yml file to run this image and connect to the Instaclustr PostgreSQL server that I’d already created as follows:

services: 

  ferretdb: 

    image: ghcr.io/ferretdb/ferretdb 

    restart: on-failure 

    ports: 

      - 27017:27017 

    environment: 

      - FERRETDB_POSTGRESQL_URL=postgres://postgresqluser:password @IP:5432/ferretdb 

 

networks: 

  default: 

    name: ferretdb

services:

ferretdb:

image: ghcr.io/ferretdb/ferretdb

restart: on-failure

ports:

- 27017:27017

environment:

- FERRETDB_POSTGRESQL_URL=postgres://postgresqluser:password @IP:5432/ferretdb

networks:

default:

name: ferretdb

You need to fill in the specific Instaclustr managed PostreSQL details (available once the server is running in the console under connection information, including user, password, and IP address).

You run FerretDB with the command “docker compose up -d”.

This appeared to connect to the PostgreSQL server successfully, but then complained that the “ferretdb” database didn’t exist. So, I created one manually, but then it failed with an error about “lc _ctype” (something to do with character sets I believe):

pgdb.checkConnection: "lc_ctype" is "C.UTF-8"' on server 127.0.0.1:27017

1	pgdb.checkConnection: "lc_ctype" is "C.UTF-8"' on server 127.0.0.1:27017

After some googling and timely help from the FerretDB people, I created the ferretdb database using this PostgreSQL command (this worked, but may not be perfect):

create database ferretdb WITH ENCODING='UTF8' LC_CTYPE='C' TEMPLATE=template0;

1	create database ferretdb WITH ENCODING='UTF8' LC_CTYPE='C' TEMPLATE=template0;

3. Putting FerretDB to Work

(Source: Shutterstock)

But how can you put FerretDB to work? What can you do with it? Well, probably most things that you can do with MongoDB, as it is compatible with MongoDB clients. I tried a simple Java example as follows.

Here’s the pom.xml dependency you need for the Java MongoDB client:

<dependency> 

    <groupId>org.mongodb</groupId> 

    <artifactId>mongodb-driver-sync</artifactId> 

    <version>4.9.1</version> 

</dependency>

<groupId>org.mongodb</groupId>

<artifactId>mongodb-driver-sync</artifactId>

</dependency>

Here’s some simple code to insert a JSON document into PostgreSQL via FerretDB, and retrieve it by matching the value of one of the nested array elements.

package ferretdb; 

 

import java.util.Arrays; 

 

import org.bson.Document; 

 

import com.mongodb.client.FindIterable; 

import com.mongodb.client.MongoClient; 

import com.mongodb.client.MongoClients; 

import com.mongodb.client.MongoCollection; 

import com.mongodb.client.MongoCursor; 

import com.mongodb.client.MongoDatabase; 

 

public class TestFerretDB { 

 

public static void main(String[] args) { 

 

MongoClient mongoClient = MongoClients.create("mongodb://127.0.0.1:27017/"); 

 

 

MongoDatabase database = mongoClient.getDatabase("ferretdb"); 

mongoClient.listDatabaseNames().forEach(System.out::println); 

 

 

database.createCollection("test"); 

database.listCollectionNames().forEach(System.out::println); 

 

MongoCollection<Document> collection = database.getCollection("test"); 

 

int documents = 1; 

 

Document document = new Document(); 

document.put("name", "thing"+documents); 

document.put("length", documents); 

document.append("things", Arrays.asList(1, 2, 3)); 

 

collection.insertOne(document); 

 

Document searchQuery = new Document(); 

// match a value in the nested array “things” 

searchQuery.put("things", "1"); 

 

FindIterable<Document> cursor = collection.find(searchQuery); 

 

try (final MongoCursor<Document> cursorIterator = cursor.cursor()) { 

    while (cursorIterator.hasNext()) { 

        System.out.println(cursorIterator.next()); 

    } 

} 

} 

}

100

101

package ferretdb;

import java.util.Arrays;

import org.bson.Document;

import com.mongodb.client.FindIterable;

import com.mongodb.client.MongoClient;

import com.mongodb.client.MongoClients;

import com.mongodb.client.MongoCollection;

import com.mongodb.client.MongoCursor;

import com.mongodb.client.MongoDatabase;

public class TestFerretDB {

public static void main(String[] args) {

MongoClient mongoClient = MongoClients.create("mongodb://127.0.0.1:27017/");

MongoDatabase database = mongoClient.getDatabase("ferretdb");

mongoClient.listDatabaseNames().forEach(System.out::println);

database.createCollection("test");

database.listCollectionNames().forEach(System.out::println);

MongoCollection<Document> collection = database.getCollection("test");

int documents = 1;

Document document = new Document();

document.put("name", "thing"+documents);

document.put("length", documents);

document.append("things", Arrays.asList(1, 2, 3));

collection.insertOne(document);

Document searchQuery = new Document();

// match a value in the nested array “things”

searchQuery.put("things", "1");

FindIterable<Document> cursor = collection.find(searchQuery);

try (final MongoCursor<Document> cursorIterator = cursor.cursor()) {

while (cursorIterator.hasNext()) {

System.out.println(cursorIterator.next());

}

What’s interesting about this code? For a start, it’s just connecting to FerretDB running locally, so there’s no security required. FerretDB handles the security between itself and the PostgreSQL database. As I discovered, FerretDB requires a database to have been correctly created, and you also need a collection (similar to a table) before you can start inserting and querying JSON documents.

For a production deployment of FerretDB, you would probably need to deploy the FerretDB proxy on some cloud-based resources near the PostgreSQL database (e.g., an AWS EC2 instance) and configure security. Being a proxy, FerretDB will introduce some latency between the MongoDB client and PostgreSQL. On the other hand, PostgreSQL is fast so this may not matter, depending on the location of the clients (e.g., if there is latency due to the client location anyway), and your use case. Database proxies are common in practice (e.g., Java Database Connectivity, JDBC), and our own database proxy, Shotover, designed for controlling, managing, and modifying the flow of database requests in transit (currently supporting Cassandra and Redis sources and sinks).

PostgreSQL is vertically scalable (scale-up) (Source: Shutterstock)

Scalability is another factor that you will need to consider. MongoDB is known to be a horizontally scalable distributed database (scale-out), you just need to add nodes to increase write and read scalability. In contrast, PostgreSQL isn’t intrinsically distributed or horizontally scalable. But it can scale vertically (scale-up) for reads and writes (we support in-place resizing for PostgreSQL), and horizontally for high availability or reads. We have not yet investigated the scalability characteristics of FerretDB for production, so some investigation and benchmarking to find the optimal production deployment configuration may be required for your specific use case and throughput/latency/SLA requirements.

Some exciting recent news is that Instaclustr now offers PostgreSQL on ANF (Azure NetApp Files) storage (in public preview), which offers a significant boost in throughput, which would likely be automatically leveraged by FerretDB with MongoDB workloads.

4. FerretDB Compared With Apache Cassandra®, OpenSearch®, Apache Kafka®, Redis™, and PostgreSQL®?

Given that FerretDB paired with PostgreSQL is a great solution for JSON/documented oriented storage and queries, I wondered how our other managed open source services compared in terms of their JSON-specific functionality. Here’s a brief summary.

Apache Cassandra and JSON

Cassandra is horizontally scalable, and has support for JSON, including nested JSON. However, you can’t perform joins in Cassandra, and that’s a restriction for JSON as well (instead you must design your tables with your queries in mind).

OpenSearch and JSON

OpenSearch is horizontally scalable, and supports JSON/Documents natively, but is focused on search. But it does support a join field type, and the SQL plugin supports complex queries including joins.

Apache Kafka and JSON

Kafka is also horizontally scalable, and doesn’t care about the message format or content, so it handles JSON along with any other message format you can think of. Kafka provides serializers/deserializers for JSON by default. Kafka also provides default JSON SerDes for Kafka Streams. Check out my Kafka Connect pipeline series for examples of building pipelines for JSON data, and the Karapace series which explores using an open source Kafka schema registry with Avro/JSON schemas.

Redis and JSON

Redis is an in-memory database but given that Redis was designed to support a rich variety of datatypes, I was disappointed to see that it doesn’t support JSON in the open source version—you have to use a copy-left plugin (RedisJSON).

PostgreSQL and JSON

As we mentioned above, PostgreSQL has good support for JSON, you can definitely use PostgreSQL to store and retrieve JSON using JSONB/GIN, and it works very well with Apache Superset for visualization as well. But it’s not horizontally scalable for writes.

However, the PostgreSQL vs. MongoDB benchmark we referenced above does give MongoDB the edge for joins, so if you really need JSON joins then it’s worth checking out FerretDB/PostgreSQL. FerretDB is adding new functionality all the time, and I noticed that version 1.1.0 has support for pipeline aggregation which is evidently more efficient than joins.

(Source: Shutterstock)

I initially thought this was a photo of two rocks. But it contains Ferrets (on the rock, in the foreground) with an out-of-focus Elephant (not a rock) in the background.

Based on the nice synergy between FerretDB and PostgreSQL (which has an elephant logo), let’s conclude with a ferret joke:

Question: What do you call a ferret that can pick up an elephant? Answer: Sir!

Endnote

[1] Australia does have more than its fair share of weird and wonderful animals (e.g., the Platypus—a venomous, egg-laying mammal with a duck-like bill, so weird in fact, that English zoologists assumed it was a hoax), but not (natively) Ferrets. Ferrets were introduced in the 1880s as the “goodies” in an attempt to control rabbits, which were introduced earlier for hunting—one of the first of many, inevitably disastrous introduced pests, e.g., Cane Toads, not to be confused with Mr Toad of Toad Hall from “The Wind in the Willows”, who was obsessed with fast cars.

(Source: Shutterstock)

Get in touch to discuss Instaclustr’s managed PostgreSQL service for your application

Read the Q&A with FerretDB

Paul Brebner

Technology Evangelist, Instaclustr

Alexander Fashakin

Technical Writer, FerretDB

Marcin Gwóźdź

Director of Strategic Alliances, FerretDB

Peter Farkas

CEO, FerretDB

How to Use MongoDB® Clients and FerretDB® With Instaclustr for PostgreSQL®

1. What is FerretDB®?

Mind the Gap

2. Down the Ferret Hole With Instaclustr for PostgreSQL

3. Putting FerretDB to Work

4. FerretDB Compared With Apache Cassandra®, OpenSearch®, Apache Kafka®, Redis™, and PostgreSQL®?

Apache Cassandra and JSON

OpenSearch and JSON

Apache Kafka and JSON

Redis and JSON

PostgreSQL and JSON

Endnote

Read the Q&A with FerretDB

About the author

How to Use MongoDB® Clients and FerretDB® With Instaclustr for PostgreSQL®

1. What is FerretDB®?

Mind the Gap

2. Down the Ferret Hole With Instaclustr for PostgreSQL

3. Putting FerretDB to Work

4. FerretDB Compared With Apache Cassandra®, OpenSearch®, Apache Kafka®, Redis™, and PostgreSQL®?

Apache Cassandra and JSON

OpenSearch and JSON

Apache Kafka and JSON

Redis and JSON

PostgreSQL and JSON

Endnote

Read the Q&A with FerretDB

About the author

Get the latest articles for open sourceIn your inbox

Sign upto ourNewsletter

Get the latest articles for open source
In your inbox

Sign up
to our
Newsletter