Financial services industries such as banking, insurance and capital management have been built on data since their earliest days. From ancient money lenders with clay tablets, to early 20th century clerks with rooms full of ledgers, to modern high-frequency trading, efficiently recording, retrieving and analysing data has been a key competitive advantage for successful financial services organisations.
In the modern age, the pressure to gain a technological edge in data processing comes not only from competitors but also from consumers. Gaining a competitive edge requires systems that can collect and quickly analyse vast streams of data. Consumers expect the systems they interact with to be instantly up to date, always available and, increasingly, aware of the context of all their previous interactions and related information.
Addressing these joint pressures while containing technology costs requires the adoption of a new generation of architectural patterns and technologies.
Cloud-based, open source solutions are increasingly being used to help financial services organisations meet these demands and break the constraints of legacy systems by:
- providing true always-on services;
- serving high-transaction-volume (particularly read-heavy) use cases at a reasonable cost;
- meeting extreme scalability and latency requirements; and
- enabling analytics and AI-driven innovation.
At Instaclustr, we are specialists in three open source technologies that are commonly used in these scenarios: Apache Cassandra, Apache Spark and Apache Kafka. This paper examines the advantages and typical use of these technologies in financial services architectures.
Benefits of Apache Cassandra
Apache Cassandra is a horizontally scalable, highly available open source database system. When properly implemented, its design lets it scale close to linearly as nodes are added, both in the volume of data stored and in the operations per second served. Cassandra’s masterless architecture and native support for replication within and across data centers allow organisations to achieve the highest levels of availability while minimising infrastructure and management costs. These characteristics make Cassandra ideally suited as a data store for modern financial applications that must handle vast streams of data while remaining always on.
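To illustrate how a masterless design still knows where each piece of data lives, here is a simplified sketch of replica placement on a token ring, in the style of Cassandra’s `SimpleStrategy`. It rests on stated assumptions: the node names and evenly spaced tokens are hypothetical, and MD5 stands in for the Murmur3 hash that Cassandra’s default `Murmur3Partitioner` actually uses.

```python
import bisect
import hashlib

# Hypothetical four-node cluster with evenly spaced tokens on a 0 .. 2**64 ring.
NODES = {"node-a": 0, "node-b": 2**62, "node-c": 2**63, "node-d": 3 * 2**62}


def token_for(partition_key: str) -> int:
    # Stand-in hash: Cassandra's default partitioner uses Murmur3, not MD5.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # value in 0 .. 2**64 - 1


def replicas_for(partition_key: str, replication_factor: int = 3) -> list[str]:
    """Pick replicas SimpleStrategy-style: owner first, then next nodes clockwise."""
    ring = sorted(NODES.items(), key=lambda item: item[1])
    tokens = [token for _, token in ring]
    # The owner is the first node whose token is >= the key's token,
    # wrapping back to the start of the ring past the highest token.
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))  # never list a node twice
    return [ring[(start + i) % len(ring)][0] for i in range(count)]
```

Because every node holds the same ring metadata, any node can compute this placement from the partition key alone and coordinate the request, so no single master is needed; if one replica fails, the others can still serve the data.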
Some architects and developers who are familiar with Cassandra may have heard that it is an “eventually consistent” database (meaning updates may take some time to be applied to all nodes in a cluster) and may be concerned about whether this is suitable in a financial services setting. However, it is more accurate to describe Cassandra as “tuneably consistent”: it lets you choose a consistency level for each read and write, balancing investment in redundant infrastructure, availability in the face of failures and consistency guarantees to reach the best fit for your use case. Many, if not most, Cassandra applications run with guaranteed consistency.
That said, financial services processes such as banking are in fact the original model of eventual consistency. Consider a transaction made at a bank branch in one country that eventually finds its way to the ledger of another bank overseas after a series of batch exchanges. Financial services solution architects may therefore be well placed to understand the trade-offs of eventual consistency and to make good use of it in their solutions.
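Tunable consistency is usually explained with an overlap rule: if the number of replicas that must acknowledge a read (R) plus the number that must acknowledge a write (W) exceeds the replication factor (RF), every read reaches at least one replica holding the latest acknowledged write. The minimal sketch below checks that rule for a few of Cassandra’s single-data-centre consistency level names. The helper functions are hypothetical, and multi-data-centre levels such as `LOCAL_QUORUM` are deliberately left out.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements a (single-data-centre) consistency level waits for."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def is_strongly_consistent(read_level: str, write_level: str, replication_factor: int) -> bool:
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    # R + W > RF forces the read and write replica sets to share at least one node.
    return r + w > replication_factor
```

For example, with RF = 3, reading and writing at `QUORUM` (2 + 2 > 3) gives strong consistency, while `ONE` for both (1 + 1) does not, trading that guarantee for lower latency and higher availability.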
Benefits of Apache Spark
Apache Spark is a modern, in-memory, distributed analytics engine whose architecture builds on the experience of big data platforms such as Hadoop MapReduce while overcoming their limitations. At the time of writing, Apache Spark is the most active open source project in the big data world². The Spark project reports that workloads can run up to 100 times faster than MapReduce in memory and ten times faster on disk. Spark supports a broad range of analytics capabilities, from SQL queries over big data to graph analytics and machine learning. It can operate in both batch and streaming modes, and it supports multiple languages for analytics job development, including Scala, Java, Python and R.
The Spark analytics engine can be deployed either as an interactive tool for data analysts (typically with a notebook-style UI such as Apache Zeppelin) or embedded in an application architecture for automated analytic tasks such as fraud detection.
Benefits of Apache Kafka
Apache Kafka is a queuing and streaming platform that follows architectural patterns similar to those of Cassandra and Spark, spreading data and work across a cluster of nodes, to provide levels of scalability and availability that cannot be achieved with traditional monolithic architectures.
Kafka can be used in your architecture:
- as a message bus, providing loose coupling between producers and consumers of messages;
- as a store of logical transactions for populating analytical data stores or edge caches;
- as a buffer to manage back pressure in systems subject to workload spikes; and
- (along with Kafka Streams or Spark Streaming) as the basis of a streaming architecture for real-time analytics.
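The message-bus and buffer roles both follow from Kafka’s core abstraction: producers append to a durable log, and each consumer group tracks its own offset in that log. The toy model below illustrates the idea in plain Python under loudly stated assumptions. It is not the Kafka client API; the class, method and group names are hypothetical; and, unlike a real consumer, it commits its offset automatically on every poll.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory model of one Kafka-style partition (not the Kafka API)."""

    def __init__(self):
        self.log = []                      # append-only list of messages
        self.committed = defaultdict(int)  # consumer group -> next offset to read

    def produce(self, message):
        # Producers only append; they know nothing about who will consume.
        self.log.append(message)
        return len(self.log) - 1           # offset assigned to the new message

    def poll(self, group, max_records):
        # Each group reads from its own position, so consumers are decoupled.
        start = self.committed[group]
        batch = self.log[start:start + max_records]
        self.committed[group] = start + len(batch)  # toy auto-commit on poll
        return batch

    def lag(self, group):
        # Messages written but not yet consumed by this group: the buffered backlog.
        return len(self.log) - self.committed[group]
```

During a workload spike, producers keep appending while a slower consumer, such as a fraud-detection job, falls behind; the growing lag is absorbed by the log rather than pushed back onto the producers, and a second group, such as an edge-cache loader, reads the same messages independently.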
Together or separately, these leading open source products can all help financial services companies to meet the growing data and analysis demands of their business while containing and even reducing costs. Key examples of areas where we see these open source technologies deployed in financial services include:
- data caches supporting consumer banking apps and open banking APIs;
- consumer insight applications;
- fraud detection analytics; and
- core architecture components for fintech companies (transaction database, enterprise message bus, etc.).
Instaclustr, through our Managed Service, Enterprise Support and Consulting services, has helped many companies in financial services and other industries achieve these benefits of open source.