By Ben Slater Wednesday 19th October 2016

Apache Cassandra and Spark in Financial Services


Financial services industries such as banking, insurance and capital management have been built on data since they first began. From ancient money lenders with clay tablets to early 20th century clerks with rooms full of ledgers to modern high-frequency trading, efficiently recording, retrieving and analysing data has been a key competitive advantage for successful financial services organisations.


In the modern age, the pressure to gain a technological edge in data processing comes not only from the competition but also from consumers. Gaining a competitive edge requires systems that can collect and quickly analyse vast streams of data. Consumers expect that the systems they interact with will be instantly up to date, always available and, increasingly, aware of the context of all their previous interactions and related information.

Addressing these joint pressures, while containing technology costs, requires the adoption of a new generation of architectural patterns and technologies. Apache Cassandra and Apache Spark are two technologies that are ideally placed to form the core of such an architecture. The applicability of these technologies in financial services has been proven many times by leading organisations such as ING and UBS [1].

Benefits of Apache Cassandra

Apache Cassandra is a horizontally scalable, highly available open source database system. Its design, when properly implemented, allows effectively unlimited scalability in terms of the volume of data stored and the operations per second served, because capacity grows as nodes are added to the cluster. Cassandra’s masterless architecture and native support for replication within and across data centers allow organisations to achieve the highest levels of availability while minimising infrastructure and management costs. These characteristics make Cassandra ideally suited as a data store for modern financial applications that must deal with vast streams of data while being always on.

Some architects and developers who are familiar with Cassandra may have heard that it is an “eventually consistent” database (meaning updates may take some time to be applied to all nodes in a cluster) and be concerned about whether this is suitable in a financial services setting. However, it is more correct to describe Cassandra as “tuneably consistent”. That is, Cassandra allows you to balance the level of investment in redundant infrastructure, availability in the face of failures and consistency guarantees to reach the solution that is the best fit for your use case. Many, if not most, Cassandra applications run with guaranteed consistency.

That said, financial services processes such as banking are in fact the original model of eventual consistency. Consider a transaction which is made at a bank branch in one country and then eventually finds its way to the ledger of another bank overseas after a series of batch exchanges. So, financial services solution architects may be well placed to understand the trade-offs of eventual consistency and make good use of this feature in their solutions.
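The trade-off behind tuneable consistency can be made concrete with a little arithmetic. The sketch below is illustrative only and does not use any Cassandra driver; the function names are hypothetical. It relies on the commonly cited rule that a read is guaranteed to see the latest acknowledged write when the number of replicas contacted on read plus the number contacted on write exceeds the replication factor.

```python
# Illustrative arithmetic only: no Cassandra driver is involved, and the
# function names are hypothetical helpers invented for this sketch.

def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas (what the QUORUM level requires)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_failures(replication_factor: int, required_replicas: int) -> int:
    """Replicas that can be down while the operation can still succeed."""
    return replication_factor - required_replicas


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    choices = {
        "write ONE, read ONE": (1, 1),
        "write QUORUM, read QUORUM": (quorum(rf), quorum(rf)),
        "write ALL, read ONE": (rf, 1),
    }
    for name, (w, r) in choices.items():
        print(
            f"{name}: consistent reads={is_strongly_consistent(rf, w, r)}, "
            f"writes survive {tolerated_failures(rf, w)} node(s) down, "
            f"reads survive {tolerated_failures(rf, r)} node(s) down"
        )
```

With a replication factor of three, reading and writing at quorum gives overlapping replica sets while still tolerating the loss of one node for both operations, which is one reason that combination is a popular starting point.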

Benefits of Apache Spark

Apache Spark is a modern, in-memory, distributed analytics engine with an architecture informed by big data platforms such as Hadoop MapReduce but designed to build on the experience of those platforms and overcome their limitations. Apache Spark is currently the most active open source project in the big data world [2]. The Spark project reports that the engine can run workloads up to 100 times faster than MapReduce in memory and up to ten times faster from disk. Spark supports a broad range of analytics capabilities, from SQL queries over big data to graph analytics and machine learning. Spark can be operated in both batch processing and streaming modes, and it supports multiple languages for analytics job development.

Best of both worlds

Deploying Spark and Cassandra together gives you the best available analytics capability and the best available data store to store and retrieve your data. The Spark Cassandra connector ensures that the integration of the two technologies is seamless and optimised. The connector is aware of the distribution of data in the Cassandra cluster and so can ensure that analytic operations are performed as close as possible to where the data is stored, maximising performance.
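The idea of running analytics close to the data can be illustrated with a toy model. The sketch below is not the connector's implementation and uses no Spark or Cassandra libraries: it assumes a simplified ring with one evenly spaced token per node and an MD5-based hash, whereas a real cluster typically uses the Murmur3 partitioner, many token ranges per node and replication. All names (`build_ring`, `owner`, `group_by_owner`, the node and account names) are hypothetical.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict

RING_SIZE = 2 ** 32  # assumed token space for this toy model


def token(key: str) -> int:
    """Map a partition key to a position on the ring (toy hash, not Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def build_ring(nodes):
    """Give each node one evenly spaced token; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(nodes)
    return sorted(((i + 1) * step - 1, node) for i, node in enumerate(nodes))


def owner(ring, key: str) -> str:
    """The node owning a key is the first node whose token is >= the key's token."""
    tokens = [t for t, _ in ring]
    index = bisect_left(tokens, token(key))
    return ring[index % len(ring)][1]  # wrap around past the last token


def group_by_owner(ring, keys):
    """Bucket keys by owning node so each bucket can be processed on that node."""
    groups = defaultdict(list)
    for key in keys:
        groups[owner(ring, key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    ring = build_ring(["node-a", "node-b", "node-c"])
    accounts = [f"account-{i}" for i in range(12)]
    for node, keys in sorted(group_by_owner(ring, accounts).items()):
        # In a real deployment, the work for these keys would be scheduled
        # on an executor co-located with this node.
        print(node, keys)
```

Grouping work by owning node in this way is what allows a scheduler to send each task to a machine that already holds the relevant data, rather than pulling the data across the network.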


The combined capabilities of Apache Cassandra and Apache Spark provide the basis for financial services organisations to gain competitive advantage through development of a new class of financial services platforms that work with higher volumes and velocities of data than previously possible. This can be achieved while maintaining the highest levels of availability and managing licensing, operations and infrastructure costs.

Instaclustr is a company that specialises in the design, development and operation of big data solutions based on Apache Cassandra and Apache Spark.


