This paper describes the process that we follow at Instaclustr to design a Cassandra data model for our customers. While it is not a prescriptive, formal process, it does define the phases and steps that our team follows when designing a new data model:
- Phase 1: Understand the data
- Phase 2: Define the entities
- Phase 3: Review & tune
As well as defining the process, we also provide a worked example based on building a database to store and retrieve log messages from multiple servers.
We recently published a blog post on the most common data modelling mistakes that we see with Cassandra. This post was very popular and led me to think about what advice we could provide on how to approach designing your Cassandra data model so as to come up with a quality design that avoids the traps.
There are a number of good articles available with rules and patterns to fit your data model into (e.g. http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling and https://academy.datastax.com/demos/getting-started-time-series-data-modeling).
However, we haven’t found a step-by-step guide to analysing your data to determine how to fit it into these rules and patterns. This white paper is a quick attempt at filling that gap.