The Big Data Challenge

Avoiding the challenges and pitfalls of Apache Cassandra implementation

Executive summary

Big Data is a different problem that requires a different data storage and management solution. It is data that is too big, moves too fast, and does not fit standard relational data structures.

Apache Cassandra has emerged as a winning solution for handling Big Data.

However, deployment strategy and best practices can mean the difference between applications that are deployed on time, scale massively, are always available, and perform blazingly fast, and deployments that bring your applications to a crawl.

Major parts

The paper is divided into five major parts:

  • Part 1 discusses the new challenge of 21st-century computing: dealing with Big Data, and how Big Data differs from the data management challenges technologists have faced before. Companies not only have to deal with this new challenge; they have to deal with it in the face of a significant shortage of people skilled in handling Big Data.
  • Part 2 briefly discusses how and why Cassandra was born, and what makes Cassandra the preferred choice for handling Big Data challenges over alternatives such as key-value, extensible record, and scalable relational data stores. This part also briefly discusses a benchmark test conducted by a team of data scientists at the University of Toronto, whose findings showed that Cassandra was the clear winner for write-intensive applications.
  • Part 3 discusses four common pitfalls and mistakes that technologists make when implementing Cassandra, mistakes that arise because engineers approach Big Data from a relational database background.
  • Part 4 discusses the recommended best practices and strategies for deploying, configuring, monitoring, and maintaining Cassandra. The goal is to get Cassandra up and running in production in a matter of weeks rather than months, at less than 20% of your typical costs, while avoiding the very real risk that an incorrect deployment brings your applications to a crawl.
  • Part 5 provides a case study that illustrates the power of Cassandra when implemented correctly.