Technical Friday 1st October 2021

A Brief Guide to NoSQL Databases

By Wade Timmer

SQL vs. NoSQL

NoSQL (“Not only SQL”) databases are known for being “non-relational,” meaning they do not always use the schema of rows and columns found in traditional relational systems. Compared to SQL databases, NoSQL databases have dynamic schemas that suit unstructured data. Depending on the type of NoSQL database being used, large amounts of data can be retrieved with simple lookup queries. NoSQL databases use different data structures from SQL databases because they are built around the type of data they hold. This can make some operations much faster than they would be in a typical SQL database.

Why Use NoSQL?

NoSQL databases are most commonly known for their ease of development, performance at scale, and functionality. Flexible data models allow the user to make changes easily as the requirements of the database change. Because NoSQL databases allow users to structure data however they like, they are a good fit when quick changes and frequent code pushes are necessary. Scaling to meet traffic demands without downtime is a weakness of SQL databases that NoSQL largely addresses. Most NoSQL databases use a “scale-out” strategy: when more traffic is expected, capacity is added by spreading data across additional servers rather than by upgrading a single machine. By sharding the data and including a routing layer that can redirect each query to the correct shard, NoSQL databases are not only scalable but fast to query as well.
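As a rough illustration of sharding with a routing layer, here is a minimal Python sketch. The `ShardRouter` class and its in-memory "shards" are hypothetical stand-ins invented for this example, not the API of any particular database, and real systems typically use more elaborate placement schemes.

```python
import hashlib


class ShardRouter:
    """Hypothetical routing layer: maps each key to one of N shards."""

    def __init__(self, shard_count: int) -> None:
        # Each shard is modelled as an in-memory dict; in a real cluster
        # every shard would live on a separate server.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash (Python's built-in hash() is randomized per
        # process) so the same key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: dict) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str):
        # Only one shard is consulted, no matter how many shards exist.
        return self.shards[self.shard_for(key)].get(key)


router = ShardRouter(shard_count=4)
for user_id in ("alice", "bob", "carol", "dave"):
    router.put(user_id, {"name": user_id.title()})

print(router.get("carol"))  # {'name': 'Carol'}
```

With simple modulo placement like this, changing the number of shards remaps most keys, which is one reason production systems tend to prefer schemes such as consistent hashing.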

Handling data is a main selling point of NoSQL. Data can be laid out to match the queries that read it, so the desired data is retrieved with very little wasted computing power. It is common to store a copy of the related model’s data along with the foreign key, instead of storing only the foreign key. This means copies of the same data can live in several places, and every copy has to be updated when the data changes. This approach is therefore best used when reads are more common than writes in your system.
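The trade-off between storing only a foreign key and storing related data next to it can be sketched in plain Python. The `customers` and order records below are made-up sample data, and the helper names are hypothetical.

```python
# Normalized: the order stores only a foreign key, so showing the
# customer's name requires a second lookup.
customers = {"c1": {"name": "Ada", "city": "London"}}
order_normalized = {"order_id": "o1", "customer_id": "c1", "total": 42.0}


def read_normalized(order):
    customer = customers[order["customer_id"]]  # second lookup
    return order["order_id"], customer["name"]


# Denormalized: a copy of the customer's fields is stored alongside the
# foreign key, so a single read returns everything.
order_denormalized = {
    "order_id": "o1",
    "customer_id": "c1",
    "customer": {"name": "Ada", "city": "London"},
    "total": 42.0,
}


def read_denormalized(order):
    return order["order_id"], order["customer"]["name"]


def rename_customer(customer_id, new_name, orders):
    """Writes cost more: every embedded copy has to be updated too."""
    customers[customer_id]["name"] = new_name
    updated = 0
    for order in orders:
        if order["customer_id"] == customer_id:
            order["customer"]["name"] = new_name
            updated += 1
    return updated


print(read_normalized(order_normalized))      # ('o1', 'Ada')
print(read_denormalized(order_denormalized))  # ('o1', 'Ada')
```

Reads from the denormalized record touch one place, while `rename_customer` has to visit every copy, which is why the pattern favors read-heavy systems.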

A NoSQL database is able to handle both transactional and analytical workloads. This allows different parts of your application to be supported from one database. The most common types of NoSQL database are document, key-value, wide-column, and graph; some technologies combine several of these types.

Types of NoSQL Databases

Document

A document database stores its data in JSON, BSON, or XML documents, which can be nested so that related data is read together for faster querying. These databases are extremely flexible: the document structure can be shaped in whichever way the application calls for. This speeds up development as well as the queries run on the database. Use cases for document store databases include app development and trading/eCommerce applications.
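To make the idea of nested documents concrete, the sketch below models one product as a single JSON-style document and reads nested fields with a dotted path. The `product` document and the `get_path` helper are hypothetical illustrations, not the query syntax of any specific document database.

```python
import json

# One self-contained document: price and reviews are nested inside the
# product instead of living in separate tables.
product = {
    "_id": "sku-1001",
    "name": "Trail Shoe",
    "price": {"amount": 89.5, "currency": "USD"},
    "reviews": [
        {"user": "ana", "rating": 5},
        {"user": "raj", "rating": 4},
    ],
}


def get_path(document, dotted_path, default=None):
    """Follow a dotted path such as 'reviews.1.user' through dicts and lists."""
    current = document
    for part in dotted_path.split("."):
        if isinstance(current, list) and part.isdigit():
            index = int(part)
            if index >= len(current):
                return default
            current = current[index]
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current


print(get_path(product, "price.currency"))  # USD
print(get_path(product, "reviews.1.user"))  # raj

# The whole document serializes to JSON and back unchanged.
assert json.loads(json.dumps(product)) == product
```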

Examples of document database technologies include Azure Cosmos DB, Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, eXist-db, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, and RethinkDB.

Key-value store

This is one of the most basic types of NoSQL databases. Each element is stored as a key-value pair, where the key is a unique identifier (such as an attribute name) and the value is the data stored under that key. Key-value store examples include Couchbase, Dynamo, FoundationDB, Redis, Riak, SciDB, and Apache ZooKeeper.
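A key-value store can be pictured as a dictionary whose values are opaque to the database. The `KeyValueStore` class below is a hypothetical in-memory sketch of that interface and makes no claim about how any of the listed products are implemented.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory key-value store with opaque byte values."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()

# The store never inspects the value; the application decides how to
# encode it (JSON here) and how to decode it again.
store.put("session:42", json.dumps({"user": "ana", "cart": ["sku-1001"]}).encode("utf-8"))

session = json.loads(store.get("session:42"))
print(session["user"])  # ana
```

Because lookups go through the key only, there is no way to ask this store for "all sessions belonging to ana" without an extra index, which is the usual price of the model's simplicity.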

Wide column

Instead of reading data in rows like a relational database, a wide column store reads data as a set of columns. When running analytics on a set of columns, the user is able to read only those columns without consuming lots of memory. If the values in a column are the same type, efficient compression ensures faster read times. Use cases for wide column databases almost always include analytics. The main issue with these kinds of NoSQL databases is consistency in writes: writing all the columns of a record requires multiple write events on disk. Apache Cassandra is an example of a wide-column database technology.
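The difference between reading rows and reading columns, and why same-typed columns compress well, can be sketched as follows. The sample data and function names are hypothetical, and run-length encoding is used only as one simple example of a compression scheme.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user": "ana", "country": "BR", "spend": 120.0},
    {"user": "raj", "country": "IN", "spend": 75.5},
    {"user": "li", "country": "CN", "spend": 210.25},
]

# Column-oriented layout: all values of one field are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def average_spend_row_layout(records):
    # Every full record is visited even though only one field is needed.
    return sum(record["spend"] for record in records) / len(records)


def average_spend_column_layout(cols):
    # Only the "spend" column is read.
    spend = cols["spend"]
    return sum(spend) / len(spend)


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


print(average_spend_column_layout(columns))  # 135.25
print(run_length_encode(["BR", "BR", "IN", "IN", "IN", "CN"]))
# [('BR', 2), ('IN', 3), ('CN', 1)]
```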

Graph

The main focus of a graph database is the relationship between data elements. Elements are stored as nodes, and the links between them are stored as edges. Storing these connections directly avoids the overhead of joining multiple tables in SQL. A main use case of graph databases is social networks: creating connections between nodes (users) in the network is a main focus of social media, and a graph database handles much, though not all, of that work efficiently. Graph database technology examples include Azure Cosmos DB, AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4j, AgensGraph, OrientDB, and Virtuoso.
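For the social network use case, a graph query often amounts to walking a few hops out from one node. The sketch below uses a hypothetical `follows` adjacency map and a breadth-first search; it illustrates the traversal idea only and is not the query language of any graph database.

```python
from collections import deque

# Hypothetical social graph: each user maps to the set of users they follow.
follows = {
    "ana": {"raj", "li"},
    "raj": {"li", "sam"},
    "li": {"sam"},
    "sam": set(),
}


def within_hops(graph, start, max_hops):
    """Breadth-first search returning {node: hop count} up to max_hops away."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


def suggestions(graph, user):
    """Users exactly two hops away, i.e. followed by someone the user follows."""
    reachable = within_hops(graph, user, 2)
    return sorted(name for name, hops in reachable.items() if hops == 2)


print(suggestions(follows, "ana"))  # ['sam']
```

Each hop here is a direct lookup of a node's neighbours, whereas the equivalent relational query would typically join a relationship table to itself once per hop.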

History of NoSQL

The term NoSQL was coined in 1998 by Carlo Strozzi when he created one of the first open source relational databases that did not use SQL but was still relational. The name gained traction in its current sense only after Johan Oskarsson held an event in 2009 to discuss NoSQL databases and their recent popularity. Strozzi has since noted that, because these newer databases are non-relational, NoREL would have been a more fitting name for them. NoSQL was originally used as a response to web data and a need for faster processing (for which it is still used).

MongoDB

MongoDB is a NoSQL database that uses JSON-like documents with optional schemas. Its high query performance comes largely from keeping frequently accessed data in RAM instead of reading from disk for each query. Some users find the MongoDB query language simpler to understand than SQL. Like most NoSQL databases, MongoDB scales horizontally using sharding: the user chooses a shard key, which determines how a collection’s data will be distributed across shards.

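A long-standing criticism of MongoDB was that it did not support multi-document transactions. Support was added in version 4.0 for replica sets and extended to sharded clusters in version 4.2, but older deployments still lack it. Applications are requiring transactions less and less each year, but some still need them to update multiple documents or collections at the same time; without transactions, a partially applied update could lead to data corruption.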

DynamoDB

DynamoDB is schemaless, and only the primary key attributes need to be defined at table creation. The primary key can have only one attribute as the partition key and, optionally, one attribute as the sort key. Other NoSQL databases like Cassandra allow composite partition keys and multiple clustering columns.
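A two-part primary key of this kind, where one attribute selects the partition and a second attribute orders items within it, can be modelled in a few lines of Python. The `CompositeKeyTable` class is a hypothetical in-memory sketch written for this example; it is not the DynamoDB API and ignores details such as capacity and indexes.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class CompositeKeyTable:
    """Hypothetical table keyed by (partition key, sort key)."""

    def __init__(self) -> None:
        self._sort_keys = defaultdict(list)  # partition key -> sorted sort keys
        self._items = {}                     # (partition key, sort key) -> item

    def put_item(self, partition_key, sort_key, attributes) -> None:
        key = (partition_key, sort_key)
        if key not in self._items:
            insort(self._sort_keys[partition_key], sort_key)
        # Only the key is fixed; the other attributes are free-form.
        self._items[key] = dict(attributes)

    def get_item(self, partition_key, sort_key):
        return self._items.get((partition_key, sort_key))

    def query(self, partition_key, sort_from=None, sort_to=None):
        """Items in one partition, ordered by sort key, optionally range-limited."""
        keys = self._sort_keys.get(partition_key, [])
        lo = 0 if sort_from is None else bisect_left(keys, sort_from)
        hi = len(keys) if sort_to is None else bisect_right(keys, sort_to)
        return [self._items[(partition_key, k)] for k in keys[lo:hi]]


table = CompositeKeyTable()
table.put_item("user#ana", "2021-10-01", {"event": "purchase", "total": 42})
table.put_item("user#ana", "2021-09-30", {"event": "login"})
table.put_item("user#raj", "2021-10-01", {"event": "login"})

print(table.query("user#ana", sort_from="2021-10-01"))
# [{'event': 'purchase', 'total': 42}]
```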


DynamoDB, like most other NoSQL database systems, is simple yet powerful. However, there are some downsides when comparing it to a system like Cassandra. DynamoDB is only available through AWS, and Amazon makes a few decisions for you in the setup process. Data is located in a single region and is replicated across three Availability Zones in that region. Replication to multiple regions is an option, but DynamoDB Streams must be enabled.

Cassandra

Cassandra has schemas for structured data and, unlike some NoSQL databases, is fully decentralized. It is not always a great fit as an analytics database, so other options should be considered if that is a main priority. Because it is masterless and replicated, it is extremely reliable: no failover is required, and it allows for full bidirectional multi-data center support. Also, there are no practical limits to its scalability, and scaling can be done in real time.
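One way to picture why a decentralized, replicated design needs no failover step is a hash ring on which every key is owned by several nodes. The `Ring` class below is a simplified, hypothetical sketch (one token per node, SHA-256 as the hash); it is not Cassandra’s actual partitioner or replication strategy.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring using a stable hash."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Hypothetical masterless ring: every node can own and serve data."""

    def __init__(self, nodes, replication_factor=3) -> None:
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        # Walk clockwise from the key's position and take the next N nodes.
        start = bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]

    def read_from(self, key: str, down=frozenset()):
        # Any live replica can answer, so losing a node needs no promotion step.
        for node in self.replicas(key):
            if node not in down:
                return node
        raise RuntimeError("no live replica for key")


ring = Ring(["n1", "n2", "n3", "n4"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)                                        # three distinct node names
print(ring.read_from("user:42", down={owners[0]}))   # falls through to owners[1]
```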

NoSQL Technology Comparison Table

| | Cassandra | DynamoDB | MongoDB |
| --- | --- | --- | --- |
| Provider | Apache Foundation community maintained | Amazon Web Services only, not open source | Proprietary development released under AGPL open source |
| Primary Use Cases | Large-scale, structured data store for analytics engines | Data structures and key-valued cloud services through AWS | Flexible JSON database for rapid development |
| Data Model | Structured tables but allows for sparse value and multi-value fields | Uses hashing and B-trees to manage data. Data is distributed into different partitions by hashing on the partition key | Schema-less JSON |
| Scalability | No practical limits. Operational clusters in the multi-PB range | Auto-scaling, scaling policy created by the user | Scales well but requires sharding and is therefore less manageable at a very large scale |
| Reliability | Extreme reliability, masterless and replicated. No failover required. Full bi-directional multi-datacenter support. | High reliability, automatic replication across multiple Availability Zones in an AWS region | High availability with multiple replicas and automated failover |
| Read/write Latency | Typically 5-15 milliseconds for standard operations. Consistent as dataset grows | Single digit millisecond performance at any scale | Similar to Cassandra. More complex querying capability can lead to greater variability |

Drawbacks of NoSQL Databases

Many NoSQL databases do not provide full ACID transactions (atomicity, consistency, isolation, durability), which have been proven to maintain data consistency in SQL databases. Instead, NoSQL relies on “eventual consistency” to provide performance advantages, at the risk of a node being out of sync with the other nodes when data is read. Eventual consistency makes sure all nodes are updated eventually (typically within milliseconds) when there is a change. Thus, a query might not return updated data right away and might return a stale result. Some NoSQL technologies use write-ahead logging, which makes data loss uncommon.
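The stale-read behaviour of eventual consistency can be reproduced with a toy model. The `EventuallyConsistentStore` class below is a hypothetical simulation in which replication is delivered only when `sync()` is called; real systems propagate updates in the background and offer tunable consistency levels that this sketch leaves out.

```python
class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Hypothetical store: writes reach one replica first, the rest later."""

    def __init__(self, replica_names) -> None:
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = []  # (replica, key, value) updates not yet delivered

    def write(self, key, value) -> None:
        # The write is acknowledged once the first replica has applied it;
        # the remaining replicas are updated asynchronously.
        first, *others = self.replicas
        first.data[key] = value
        self.pending.extend((replica, key, value) for replica in others)

    def read(self, key, replica_index: int):
        return self.replicas[replica_index].data.get(key)

    def sync(self) -> int:
        """Deliver all queued updates in order; returns how many were applied."""
        delivered = len(self.pending)
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()
        return delivered


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("stock", 10)

print(store.read("stock", 0))  # 10   (replica that took the write)
print(store.read("stock", 2))  # None (stale: update not delivered yet)

store.sync()
print(store.read("stock", 2))  # 10   (replicas have converged)
```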

Another drawback of NoSQL databases is the lack of standardization across NoSQL technologies. This isn’t a design flaw, but it means that the learning curve for NoSQL can be more of a burden than for SQL databases. Expertise can be hard to come by, which is why a lot of organizations turn to companies like Instaclustr.

Conclusions

There is no “best” NoSQL database system, just different technologies for different use cases. If you want to compare one technology with another, you need a clear understanding of what it’s going to be used for. If you’re searching for the right solution for your data infrastructure, sign up for a free trial of Cassandra on the Instaclustr Managed Platform or reach out for a free consultation with one of our experts.