On July 18th, 2018, Instaclustr co-hosted the NoSQL meetup with JPMorgan Chase in New York City. There was a fantastic turnout: over 125 attendees from various technical backgrounds and industries, including finance, retail and adtech.
To start the evening, Adam Carson, CTO of the Digital Group at JPMC, gave an introduction and overview of JPMC’s usage of Cassandra.
The main presentation, Cassandra and Kubernetes, presented by Instaclustr’s SVP of Engineering and co-founder Adam Zegelin, was well received and prompted a lengthy Q&A session from the attendees, followed by additional one-on-one discussions to close out the evening.
Adam began with a brief overview of Linux containers: their strengths and benefits, how they compare with equivalent process sandboxing technologies on other platforms, such as Solaris Zones and FreeBSD jails, and how they compare with complete system virtualization technologies such as KVM and Xen.
Containers on Linux are a combination of various kernel-level APIs and user-space applications that provide an extensible and pluggable framework for process sandboxing. This flexibility makes containers extremely powerful, but also complex to utilise without proper tooling. Solutions such as Kubernetes, Docker, rkt and containerd strive to streamline and simplify the act of running containers on Linux, and bring containers in line with the first-class sandboxing and virtualization technologies of other platforms.
Containers provide a separation of concerns. They allow you to package, install and run all the libraries, dependencies and userland tools for an application without interference or conflicts from other applications or components. This separation compares to that offered by virtual machines, but with additional flexibility and improved performance.
An introduction to Kubernetes followed, with Adam explaining the fundamentals and the benefits of using Kubernetes as a Linux container orchestration and management platform.
The Kubernetes API has become the de facto container orchestration API. It’s supported in the cloud, with first-class offerings from all the major providers, including Amazon EKS, Google Kubernetes Engine and Azure Kubernetes Service. For on-premises installations, Kubernetes can be run directly, or via the numerous third-party compatible implementations and extensions, including Pivotal Container Service and Red Hat OpenShift.
Combining containers, Kubernetes and Cassandra, Adam then followed with an in-depth look into Instaclustr’s cassandra-operator, a work-in-progress open source project that aims to ease the deployment and management of Apache Cassandra on Kubernetes clusters.
Kubernetes has no built-in understanding of the processes and procedures required to deploy, scale and manage a Cassandra cluster. The cassandra-operator extends the Kubernetes API via CustomResourceDefinitions to add first-class support for managing Cassandra clusters. At runtime, the operator coordinates with both the Kubernetes API and Cassandra (via JMX) to provide a seamless integration.
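To make the idea of a custom resource more concrete, here is a minimal Python sketch of how a declarative cluster description might be validated and turned into a desired state. Everything in it is an assumption made for illustration: the `example.com/v1alpha1` API group, the `CassandraCluster` kind, the spec fields and the `parse_manifest` helper are hypothetical and are not taken from the cassandra-operator project.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical manifest: the group, version, kind and spec fields below are
# illustrative only and are not the cassandra-operator's real schema.
EXAMPLE_MANIFEST: Dict[str, Any] = {
    "apiVersion": "example.com/v1alpha1",
    "kind": "CassandraCluster",
    "metadata": {"name": "demo"},
    "spec": {"replicas": 3, "cassandraVersion": "3.11.2"},
}


@dataclass(frozen=True)
class ClusterSpec:
    """The desired state declared by one (hypothetical) custom resource."""
    name: str
    replicas: int
    cassandra_version: str


def parse_manifest(manifest: Dict[str, Any]) -> ClusterSpec:
    """Validate a custom resource and return the desired state it declares."""
    if manifest.get("kind") != "CassandraCluster":
        raise ValueError(f"unexpected kind: {manifest.get('kind')!r}")
    spec = manifest["spec"]
    replicas = int(spec["replicas"])
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return ClusterSpec(
        name=manifest["metadata"]["name"],
        replicas=replicas,
        cassandra_version=str(spec["cassandraVersion"]),
    )
```

The point of the sketch is the division of labour: the user declares what the cluster should look like, and an operator is then responsible for working out how to get there.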
During an initial deployment, the cassandra-operator will automatically manage configuration, Cassandra seed node allocation and node placement (network topology). The cassandra-operator also handles scaling of Cassandra clusters in both directions, adding or decommissioning nodes as required.
In-progress and planned features include monitoring support (via direct integration with Prometheus), automatic backup and restore, scheduled repair, and more.
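The kind of decisions described above can be sketched as a few small functions. This is a simplified illustration under stated assumptions, not the cassandra-operator's actual logic: nodes are identified by integer ordinals, racks are assigned round-robin, the lowest ordinals act as seeds, and the cluster changes by one node per step. The names `pick_rack`, `pick_seeds` and `plan_step` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    kind: str               # "add", "decommission" or "none"
    ordinal: Optional[int]  # ordinal of the node affected, if any


def pick_rack(racks: List[str], ordinal: int) -> str:
    """Spread nodes evenly across racks by assigning them round-robin."""
    return racks[ordinal % len(racks)]


def pick_seeds(ordinals: List[int], seeds_wanted: int = 2) -> List[int]:
    """Use the lowest ordinals as seeds (an arbitrary but stable choice)."""
    return sorted(ordinals)[:seeds_wanted]


def plan_step(desired: int, ordinals: List[int]) -> Action:
    """Return the single next action that moves the cluster towards `desired` nodes."""
    if len(ordinals) < desired:
        # Scale up: the new node takes the next unused ordinal.
        next_ordinal = max(ordinals) + 1 if ordinals else 0
        return Action("add", next_ordinal)
    if len(ordinals) > desired:
        # Scale down: remove the highest ordinal first, so seeds go last.
        return Action("decommission", max(ordinals))
    return Action("none", None)
```

For example, `plan_step(3, [0, 1])` asks for node 2 to be added, while `plan_step(2, [0, 1, 2])` asks for node 2 to be decommissioned. Calling `plan_step` repeatedly until it returns `"none"` converges on the desired size one node at a time, which is a deliberately conservative choice in this sketch.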