Securing your Apache Cassandra cluster
The new year has begun, and with it has come a wave of database compromises. Cyber-criminals targeting “low hanging fruit” are using public search engines such as Shodan to discover and exploit unsecured MongoDB, Elasticsearch, Hadoop and Cassandra clusters.
The attack is simple but effective: the attacker logs into an unsecured database, exports the data (or simply deletes it), and leaves a ransom note behind demanding payment in Bitcoin in return for the stolen data.
While the majority of the attacks targeted other databases, a number of Cassandra clusters were also hit. For further information on the attacks themselves, there are some excellent writeups by BleepingComputer and Sophos.
How do I prevent my cluster from attack?
- Enable authentication. As authentication is not enabled by default, this is the single most important thing you can do to secure a Cassandra cluster. Simply set the authenticator option to PasswordAuthenticator and the authorizer option to CassandraAuthorizer in the cassandra.yaml file to enable password-based authentication. If you have a multi-datacenter configuration you must also change the replication class of the system_auth keyspace to NetworkTopologyStrategy. You should also change the default password (see step 2). If your cluster has datacenters spanning multiple regions, you should also enable SSL. If not, your password will be transmitted in plaintext during authentication and could potentially be intercepted.
- The second most important thing you must do is to stop using the default superuser account (the “cassandra” account). Create a new superuser account with a different name and a strong password (and ideally a non-superuser account as well), then set the password for the default superuser to a very long, random string and forget it or lock it away somewhere secure.
- Ensure your JMX ports are not publicly accessible. While it is possible to secure your JMX port, there is rarely a case where it needs to be accessible via a public address. Check your firewall rules and make sure that this port (7199) is not accessible outside of your private network. While you’re at it, check your other firewall rules. In most cases, the only port that you should need to make publicly accessible is your internode port, 7001 (use SSL!), and even then only if you have multiple datacenters in multiple regions.
- Don’t use public networks. In the vast majority of cases, if you do not have a multi-datacenter setup as described in point 3, then you do not need to make any Cassandra ports publicly accessible. You will need to modify the listen_address and broadcast_rpc_address configuration options in your cassandra.yaml file to allow your cluster to communicate using only private networks. Cloud provider features such as AWS’s VPC peering and Azure’s Virtual Networks provide flexible methods to connect your application to your cluster privately.
- Enable SSL. If you have a cluster spanning multiple regions or need to connect using public networks, then Cassandra’s SSL feature should be used to protect both your inter-node traffic and client connections. While it can be a little tricky to set up, there are some great step-by-step guides out there to help with this process. It is important to note that enabling SSL alone will not protect your cluster. It must be used in combination with password authentication & authorization. You can optionally enable require_client_auth (which authenticates the client’s SSL certificate when the SSL connection is established) for even more protection.
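Steps 1 and 2 above come down to a handful of CQL statements once the authenticator and authorizer are set in cassandra.yaml. The following is a minimal sketch that generates those statements together with two random passwords. It assumes a Cassandra version that supports the CREATE ROLE syntax (2.2 or later); the role name `dba_admin`, the datacenter names and the replication factors are hypothetical placeholders, and the helper functions are illustrative rather than part of any Cassandra tooling.

```python
import secrets


def cql_quote(value: str) -> str:
    """Quote a string as a CQL literal (embedded single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def auth_hardening_cql(new_superuser: str, datacenters: dict) -> tuple:
    """Return (statements, new_superuser_password) for steps 1 and 2.

    `datacenters` maps a datacenter name to the replication factor that the
    system_auth keyspace should use in that datacenter.
    """
    # Strong password for the new superuser; the caller must store it safely.
    new_password = secrets.token_urlsafe(24)
    # Very long random string for the default "cassandra" account.
    throwaway_password = secrets.token_urlsafe(64)

    dc_options = ", ".join(
        f"{cql_quote(dc)}: {int(rf)}" for dc, rf in datacenters.items()
    )
    statements = [
        # Step 1: replicate the auth data to every datacenter.
        "ALTER KEYSPACE system_auth WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_options}}};",
        # Step 2a: create the replacement superuser first.
        f"CREATE ROLE {new_superuser} WITH PASSWORD = {cql_quote(new_password)} "
        "AND SUPERUSER = true AND LOGIN = true;",
        # Step 2b: bury the default account behind a random password.
        f"ALTER ROLE cassandra WITH PASSWORD = {cql_quote(throwaway_password)};",
    ]
    return statements, new_password
```

Calling `auth_hardening_cql("dba_admin", {"dc1": 3, "dc2": 3})` returns the three statements in the order they should be run, plus the password to record for the new account. The statements would then be executed through cqlsh or a driver while logged in as a superuser. After changing the replication of system_auth, the keyspace typically needs a repair so that the new replicas actually receive the auth data.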
At Instaclustr, we support all of these configurations for our managed clusters.
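For self-managed clusters, steps 3 and 4 can be sanity-checked by probing a node's public address from a machine outside the private network. The sketch below uses only the Python standard library. It assumes the default Cassandra port numbers have not been changed, and the function names are illustrative. A port that accepts a TCP connection from the outside is a port your firewall is not blocking.

```python
import socket

# Default Cassandra ports (assumed unchanged in your configuration).
CASSANDRA_PORTS = {
    7000: "internode",
    7001: "internode (SSL)",
    7199: "JMX",
    9042: "CQL native transport",
    9160: "Thrift",
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def exposed_ports(host: str, ports=CASSANDRA_PORTS, timeout: float = 2.0) -> list:
    """List the ports from `ports` that accept connections on `host`."""
    return [port for port in sorted(ports) if is_port_open(host, port, timeout)]
```

Run `exposed_ports("<public address of a node>")` from outside your network. In a single-region cluster the result should be an empty list; in a multi-region cluster following the advice above, at most 7001 should appear. Only probe hosts that you operate yourself.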
That’s great, but what if my cluster has been hit?
By chance, Cassandra itself has some level of built-in protection against this style of attack: by default (the auto_snapshot option in cassandra.yaml) it will take a snapshot when a table is dropped. Unless the attacker also has access to Cassandra’s JMX interface or direct access to the OS, they cannot remove the snapshot. Restoring a snapshot is straightforward: copy the snapshot files to the table’s data directory and use nodetool’s refresh command to load the SSTables.
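The restore procedure can be sketched in a few lines of Python. This is a rough illustration, not an official tool: it assumes the dropped table has already been recreated with its original schema, that it runs on each node against that node's own snapshot, and that `nodetool refresh` is the command used to load the copied SSTables. The function names and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def copy_snapshot(snapshot_dir: Path, table_dir: Path) -> list:
    """Copy every file from a snapshot directory into the table's data directory.

    Returns the names of the files copied. Files that already exist in the
    table directory are left untouched.
    """
    copied = []
    for src in sorted(Path(snapshot_dir).iterdir()):
        dest = Path(table_dir) / src.name
        if src.is_file() and not dest.exists():
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(src.name)
    return copied


def refresh_command(keyspace: str, table: str) -> list:
    """Build the nodetool invocation that loads newly placed SSTables."""
    return ["nodetool", "refresh", keyspace, table]
```

A typical call would pass a snapshot directory such as `<data dir>/<keyspace>/<old table dir>/snapshots/<snapshot name>` and the data directory of the recreated table, then run the result of `refresh_command("<keyspace>", "<table>")` with `subprocess.run(..., check=True)`. Building the command separately from running it makes it easy to review what will be executed on a production node.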
I appreciate the recent focus on security by both Instaclustr and DataStax, but I think both of you guys missed the mark a bit. #1 should be — loud and clear — don’t make C* accessible to the external network under ANY circumstances, PERIOD. Don’t assign public IPs to cluster nodes, and don’t direct traffic in from your firewall. If you’re using AWS, then make sure you’re using a VPC and don’t assign public IPs. The second layer in AWS should be IP/port-based security groups.
If you have a multi-DC setup, then the solution is a VPN, not the public internet + OS level firewall rules + TLS + authentication.
Yes – authentication, TLS, OS-level firewalls, tightening JMX, etc are good additional layers of defense, but #1 priority should be to prevent external network traffic from hitting Cassandra and Linux in the first place. This message should be priority #1, not buried as #4.
Hi Max,
Thanks for your comment and I’d agree – using internal IPs only is the most robust way to avoid external attack issues, as it removes the risk should there turn out to be bugs in Cassandra’s implementation of authentication or TLS (including underlying libraries). That said, there are still some circumstances where this is impractical at best, and we believe it is possible to achieve a high level of security with public addressing using the range of measures we described.
Cheers
Ben
Thanks for your comment Max. As Ben mentioned, there are a couple of problems with using VPNs for inter-dc Cassandra traffic. Firstly, most VPNs are bandwidth-constrained and do not scale along with the Cassandra cluster. This will eventually lead to replication bottlenecks, which will cause data consistency problems. The second issue is that a VPN introduces a single point of failure to inter-dc replication. While a mesh VPN like tinc could potentially solve these issues, it is more difficult to configure and manage and also contributes more network overhead than Cassandra’s built-in SSL protection. A private network link between data centers would be another good (albeit much more costly) alternative.
While I do agree with you that private networks should be used where possible, properly configured inter-node SSL encryption will provide sufficient protection for most users with large multi-region clusters.
Edit: By the way, I absolutely recommend using AWS Security Groups/Azure firewall rules, or if not on the cloud, a dedicated firewall/gateway rather than relying on OS-level firewall rules.