Q&A with FerretDB

Recently I managed to track down some of the key people behind FerretDB, and they kindly offered to answer some questions I had about FerretDB, the software and the project.

Below is the Q&A I conducted with Alexander Fashakin (Technical Writer, FerretDB), Peter Farkas (CEO, FerretDB) and Marcin Gwóźdź (Director of Strategic Alliances, FerretDB).

A Ferret in the Wild (Source: Shutterstock)

Why the name “Ferret”?

There is a long-standing connection between open source data products and animals. Think of Slonik, the PostgreSQL elephant; the MySQL dolphin; Hadoop's yellow elephant; the MariaDB seal; and, of course, CockroachDB! “To ferret out” means to find something after careful searching. Also, ferrets are fun!

The FerretDB Logo features a stylized Ferret.

How does FerretDB work?

FerretDB is an open source proxy that translates MongoDB wire protocol queries to SQL, with PostgreSQL as the database engine. With FerretDB, users can run the same MongoDB protocol queries without learning a new language or command.
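To make the translation idea concrete, here is a minimal sketch of how a simple MongoDB-style filter could be turned into a parameterized SQL query over a JSONB column. This is my own illustration, not FerretDB's actual code: the `filter_to_sql` function, the `doc` column name, the one-table-per-collection layout, and the psycopg-style `%s` placeholders are all assumptions, and only a handful of comparison operators are handled.

```python
import json

# Hypothetical mapping from a few MongoDB comparison operators to SQL operators.
OPERATORS = {"$eq": "=", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


def filter_to_sql(collection, query):
    """Translate a flat MongoDB-style filter into (sql, params).

    Assumes each document is stored in a JSONB column named `doc` (a made-up
    name) in a table named after the collection. Sub-document equality and
    logical operators such as $or are not handled in this sketch.
    """
    if not collection.isidentifier():
        raise ValueError(f"unsafe collection name: {collection!r}")

    clauses = []
    params = []
    for field, condition in query.items():
        # {"age": 30} is shorthand for {"age": {"$eq": 30}}.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op, value in condition.items():
            if op not in OPERATORS:
                raise NotImplementedError(f"operator {op} is not handled here")
            # `doc -> key` extracts a JSONB value; the right-hand side is sent
            # as JSON text and cast to JSONB so both sides have the same type.
            clauses.append(f"doc -> %s {OPERATORS[op]} %s::jsonb")
            params.extend([field, json.dumps(value)])

    sql = f"SELECT doc FROM {collection}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = filter_to_sql("users", {"age": {"$gte": 30}, "name": "Ada"})
print(sql)
print(params)
```

The query and its parameters are kept separate so that values never get spliced into the SQL text. A real proxy also has to decode the wire protocol message and deal with type and ordering differences, which this sketch ignores.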

Where did you get the idea/motivation for developing FerretDB?

The motivation was that MongoDB moved away from open source when it relicensed under the SSPL. MongoDB was easy to use and had extensive documentation, which made it a top choice for developers looking for an open source database. With FerretDB, we wanted to fill that gap and still give the community an amazing tool.

However, the idea to launch the company came while our CEO, Peter Farkas, and Co-Founder, Peter Zaitsev, were hiking to the base camp of K2 in the Himalayas. After days of talking, they ran out of topics and, driven by the uplift of hiking at one of the planet’s highest peaks, realized that they needed to start a company to fill the gap that MongoDB left in the open source community.

K2 provided the inspiration for FerretDB! (Source: Shutterstock)

Who else is involved in FerretDB?

We currently have a distributed team of 10 people, mostly database engineers and open source experts. The founding team was Percona’s CEO, Peter Zaitsev, Percona’s senior executive, Alexey Palazchenko, and Percona’s service director, Peter Farkas.

What license does FerretDB have?

FerretDB is licensed under the Apache License 2.0—an open source license.

Why did you build a proxy for MongoDB specifically? MongoDB is a JSON/document database, I believe?

Correct. Document databases can collect, store, and retrieve data in a variety of data types. In FerretDB, data is stored as BSON (a binary representation of JSON), so you can store more data types than in regular JSON.

FerretDB is a proxy that uses PostgreSQL as a backend. The proxy translates MongoDB wire protocol commands into SQL queries and uses PostgreSQL as storage. This way, MongoDB drivers and even tools can be used with an application that would otherwise rely on MongoDB.

Why did you choose PostgreSQL for the backend SQL database engine? (SQL and JSON are somewhat different?)

We chose PostgreSQL because it is open source, has a large community of users, and has a massive amount of resources available. The difference, in this case, is that we actually convert from BSON (which stands for Binary JSON, a structure provided by MongoDB) to JSONB on PostgreSQL. Our blog post here explains how this works in practice.
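As a rough illustration of why this conversion needs care: BSON has types such as dates and binary data that plain JSON lacks, so something has to record the original type when a document is written to a JSON column. The sketch below shows one possible approach using tagged wrapper objects. The `"$t"`/`"$v"` tag names and the `encode_value`/`decode_value` helpers are made up for this example and are not FerretDB's actual mapping.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_value(value):
    """Wrap values that plain JSON cannot represent in a tagged object."""
    if isinstance(value, datetime):
        # BSON dates are milliseconds since the Unix epoch; integer
        # arithmetic avoids float rounding. Expects a timezone-aware value.
        millis = (value - EPOCH) // timedelta(milliseconds=1)
        return {"$t": "date", "$v": millis}
    if isinstance(value, bytes):
        return {"$t": "binary", "$v": base64.b64encode(value).decode("ascii")}
    if isinstance(value, dict):
        return {key: encode_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [encode_value(item) for item in value]
    return value  # str, int, float, bool and None map directly to JSON


def decode_value(value):
    """Reverse encode_value(), restoring dates and binary data."""
    if isinstance(value, dict):
        if set(value) == {"$t", "$v"}:
            if value["$t"] == "date":
                return EPOCH + timedelta(milliseconds=value["$v"])
            if value["$t"] == "binary":
                return base64.b64decode(value["$v"])
        return {key: decode_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_value(item) for item in value]
    return value


doc = {
    "name": "Ada",
    "joined": datetime(2023, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "avatar": b"\x00\x01",
}
stored = json.dumps(encode_value(doc))  # text that could go into a JSON column
assert decode_value(json.loads(stored)) == doc
```

The trade-off of tagging values inline is that every query touching those fields has to know about the wrappers, which is part of why a mapping layer is needed at all.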

Who owns the MongoDB wire protocol/queries?

The MongoDB wire protocol and the query language were developed by MongoDB. Since then, many alternatives have been created, such as Amazon DocumentDB, Azure Cosmos DB, or FerretDB. For now, these alternatives need to follow MongoDB’s direction in terms of features and compatibility.

However, we are working with the industry to make the query language an open standard, which will definitely change this dynamic. SQL was IBM’s invention, and later it became an open standard utilized by all relational databases—a great analogy for what we think will happen with the MongoDB query language and document databases. This is what we are working on.

The Australian Overland Telegraph Line was completed in 1872 and connected Australia to Europe via thousands of km of overland wires and submarine cables using telegraphy – these original poles were repurposed for telephone (Source: https://commons.wikimedia.org/wiki/File:Adelaide-Darwin_Telegraph_Line.jpg)

What were some of the challenges and solutions around transformations from MongoDB to SQL?

There were quite a lot of challenges. Managing the logic differences between BSON and JSONB can be tricky: how data types are handled, data sorting, field ordering, and so on. The blog post explains much of how we use our in-house mapping system to handle this.
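To give a feel for the sorting problem: MongoDB defines an order between values of different types (for example, null sorts before numbers, numbers before strings, and strings before booleans), so a compatible layer has to reproduce that order rather than rely on whatever the backend does by default. Below is a minimal sketch of such a sort key for four scalar types only. The `sort_key` helper and its rank numbers are my own illustration, not FerretDB's implementation.

```python
def sort_key(value):
    """Return a key that orders scalars as null < numbers < strings < booleans.

    Only the relative order matters; the rank numbers are arbitrary.
    Documents, arrays, dates and other BSON types are left out of this sketch.
    """
    if value is None:
        return (1, 0)
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return (4, value)
    if isinstance(value, (int, float)):
        return (2, value)
    if isinstance(value, str):
        return (3, value)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


values = ["b", 3, None, True, 1.5, "a", False]
print(sorted(values, key=sort_key))
```

For comparison, PostgreSQL documents its own JSONB ordering as Object > Array > Boolean > Number > String > Null, so numbers and strings compare the other way round. Field ordering is a separate wrinkle: JSONB does not preserve the order of object keys, while BSON documents are ordered.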

Are there other clever things going on?

We are working on several updates to make FerretDB more accessible and useful for the community. One of them is the work toward supporting SQLite as a database backend. The potential SQLite implementation reflects our goal to provide an open source alternative to MongoDB and to support more database backends beyond PostgreSQL.

We are also working on several partnerships to add more tools and solutions to the mix we support.

What are some use cases for FerretDB?

FerretDB is ideal for the same document database use cases as MongoDB: applications that require high flexibility and scalability, and even users who want the more advanced features of a relational database. In short, it’s perfect for any application that currently uses MongoDB but requires an open source, compatible alternative, and for people with experience maintaining or managing PostgreSQL databases.

Does FerretDB introduce much extra latency between MongoDB clients and PostgreSQL?

There is some complexity and logic involved in mapping commands to Postgres, especially because of differences in data types, sorting, and many other things we need to consider when implementing them. So it’s understandable that there is a bit of latency, but for now our foremost focus is on enabling more compatibility.

However, we understand that we will need to depart from this approach to increase performance by creating our own extension or through other methods. To address that, we are continually enabling more query pushdown to the backend.

From what I know about MongoDB it’s designed to be horizontally scalable—PostgreSQL is only vertically scalable (for writes at least). So, what are the implications of using PostgreSQL as the backend for scalability?

For now, this is one of the questions we are researching, and we are looking at the current use cases too. While we are not focusing on scalability and performance for now, we understand it’s an important aspect of any database, and it’s currently lined up on our roadmap.

How would you typically deploy and scale FerretDB for production?

FerretDB is a stateless application, and a deployment has two components: PostgreSQL as the backend plus the FerretDB wire protocol implementation. So you mainly need to decide how you want to run PostgreSQL, since running FerretDB is like running any other stateless application.

I noticed that the “change stream” isn’t implemented yet (for watches, and maybe Kafka connect source connectors). Is this possible to support on PostgreSQL I wonder? How easy will it be to implement?

ChangeStreams (and also OpLog) are among the issues we’re actually looking at in-house, because they matter for compatibility with applications such as those built on Meteor.js. The challenge is that we still don’t fully understand how they work, and we need to understand that, along with the technical complexities involved, before we can implement them.

Meteors come from outer space. If they hit the ground, they are called meteorites! Meteor.js connects to MongoDB. (Source: Shutterstock)

In Meteor’s case and maybe some other applications, it may be possible to fall back to polling instead, which should still work almost the same, but it’s something we’re looking at resolving.
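The polling fallback mentioned here can be sketched as follows: fetch a snapshot of the collection at intervals and diff it against the previous one to derive change events. This is only an illustration of the general idea, not how Meteor or FerretDB implement it. The `fetch_snapshot` callable is hypothetical and stands in for whatever query returns the current documents keyed by `_id`.

```python
import time


def diff_snapshots(previous, current):
    """Compare two {_id: document} snapshots and return change events."""
    events = []
    for doc_id, doc in current.items():
        if doc_id not in previous:
            events.append(("insert", doc_id))
        elif previous[doc_id] != doc:
            events.append(("update", doc_id))
    for doc_id in previous:
        if doc_id not in current:
            events.append(("delete", doc_id))
    return events


def poll_changes(fetch_snapshot, rounds, interval_seconds=0.0):
    """Call fetch_snapshot() repeatedly and yield the changes between polls.

    fetch_snapshot is a hypothetical callable; it must return a fresh dict on
    each call, otherwise in-place modifications would go unnoticed.
    """
    previous = fetch_snapshot()
    for _ in range(rounds):
        time.sleep(interval_seconds)
        current = fetch_snapshot()
        yield from diff_snapshots(previous, current)
        previous = current


# Usage with canned snapshots standing in for real queries.
snapshots = iter([
    {1: {"n": 1}, 2: {"n": 2}},
    {1: {"n": 1}, 2: {"n": 3}, 3: {"n": 0}},
])
for event in poll_changes(lambda: next(snapshots), rounds=1):
    print(event)
```

Compared with a real change stream, polling trades latency and extra read load for simplicity, and changes that happen and are reverted between two polls are missed entirely.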

I’m always interested in performance and scalability—have you run any benchmarks on FerretDB? Are there any comparison benchmarks?

The aim of FerretDB, for now, has been to enable more compatibility, so we haven’t really focused on performance or scalability (this is currently on our roadmap, and we’ll soon have more information for you as we look into improving FerretDB). For now, though, you can have a look at David Murphy’s webinar, where FerretDB, DocumentDB, Cosmos DB, and MongoDB were compared across various benchmarks, including performance and cost. Here’s the video.

What else is new?

Here’s our roadmap and a blog post about the latest release.

Paul Brebner

Technology Evangelist, Instaclustr

Alexander Fashakin

Technical Writer, FerretDB

Marcin Gwóźdź

Director of Strategic Alliances, FerretDB

Peter Farkas

CEO, FerretDB