Technical — Cassandra Monday 23rd August 2021

Full Query Logging With Apache Cassandra 4.0

By Shelby Carpenter

The release of Apache Cassandra 4.0 comes with a bunch of new features to improve stability and performance. It also comes with a number of valuable new tools for operators to get more out of their Cassandra deployment. In this blog post, we’ll take a brief look at full query logging, one of the new features that comes with the Cassandra 4.0 release.

What Are Full Query Logs?

First off, we need to understand what counts as a full query log (FQL) in Cassandra. Full query logs record all successful Cassandra Query Language (CQL) requests. Audit logs (also a new feature of Cassandra 4.0), on the other hand, contain both successful and unsuccessful CQL requests. (To learn about the different forms of logging and diagnostic events in Cassandra 4.0, check out this blog by Instaclustr Co-Founder and CTO Ben Bromhead.)

The FQL framework was designed to be lightweight from the very beginning, so its impact on performance is small. This is achieved with a library called Chronicle Queue, which is designed for low-latency, high-performance messaging in critical applications.

Use Cases for Full Query Logs

There are a number of exciting use cases for full query logs in Cassandra. Full query logs allow you to log, replay, and compare CQL requests live without affecting the performance of your production environment. This allows you to:

  • Examine traffic to individual nodes to help with debugging if you notice a performance issue in your environment 
  • Compare performance between two different versions of Cassandra in different environments
  • Compare performance between nodes with different settings to help with optimizing your cluster
  • Audit logs for security or compliance purposes

Configuring Full Query Logs in Cassandra

The settings for full query logs are adjustable either in the Cassandra configuration file, cassandra.yaml, or with nodetool.

See the following example from the Cassandra documentation for one way to approach configuration settings:
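The fragment below is a minimal sketch along those lines, limited to the settings this post discusses. The directory path is a placeholder and the scratch file name fql-options.yaml is a hypothetical choice; the snippet only stages the block in a scratch file so it can be reviewed before being merged into cassandra.yaml by hand.

```shell
# Stage the FQL settings in a scratch file (hypothetical name) for review.
# Nothing here touches cassandra.yaml itself.
cat > fql-options.yaml <<'EOF'
full_query_logging_options:
    log_dir: /tmp/cassandrafullquerylog   # placeholder path; the directory must already exist
    roll_cycle: HOURLY                    # DAILY and MINUTELY are the other choices
    max_queue_weight: 268435456           # 256 MiB
    max_log_size: 17179869184             # 16 GiB
EOF

# Print the staged fragment for inspection.
cat fql-options.yaml
```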

In this case, you would just need to set log_dir to an existing directory that has read, write, and execute permissions. The log segments here are rolled hourly, but can also be set to roll daily or every minute.

The max_queue_weight option, which sets the maximum weight for in-memory queue records waiting to be written prior to blocking or dropping, is set here to 268435456 bytes (equivalent to 256 MiB). This is also the default value.

The max_log_size option, which sets the maximum size of rolled files that can be retained on disk before the oldest file is deleted, is set here to 17179869184 bytes (equivalent to 16 GiB).
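Both settings are plain byte counts, so the values are easy to double-check, or to recompute for a different size, with shell arithmetic. The variable names here are only for illustration:

```shell
# 256 MiB and 16 GiB expressed in bytes (1 MiB = 1024 * 1024 bytes).
max_queue_weight=$((256 * 1024 * 1024))
max_log_size=$((16 * 1024 * 1024 * 1024))

echo "max_queue_weight=$max_queue_weight"   # prints max_queue_weight=268435456
echo "max_log_size=$max_log_size"           # prints max_log_size=17179869184
```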

After configuring the settings for your full query logs in cassandra.yaml, you can execute this command using nodetool to enable full query logging:
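The command in question is nodetool enablefullquerylog. A minimal sketch follows; the PATH check and the fql_state variable are illustrative additions, not part of Cassandra, and only keep the snippet from failing on a machine without the Cassandra tools installed.

```shell
# Enable full query logging on the local node.
# With no options given, the settings come from cassandra.yaml.
if command -v nodetool >/dev/null 2>&1; then
    if nodetool enablefullquerylog; then
        fql_state=enabled
    else
        fql_state=failed
    fi
else
    fql_state=skipped
    echo "nodetool not found on PATH; run this on the Cassandra node itself" >&2
fi

echo "full query logging: $fql_state"
```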

You must do this on a node-by-node basis for each node where you want to have full Cassandra query logs. 

If you prefer, you can also pass the configuration details for the full query logs as options to the nodetool enablefullquerylog command. To learn how to do so, check out the Cassandra documentation.
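As a rough sketch of that approach, the values discussed above for cassandra.yaml could be passed on the command line instead. The directory is a placeholder and the function name is hypothetical; check the option names against nodetool help enablefullquerylog for your version before relying on them.

```shell
FQL_DIR=/tmp/cassandrafullquerylog   # placeholder; must already exist on the node

# Hypothetical wrapper: enable FQL with explicit options instead of cassandra.yaml.
enable_fql_with_options() {
    nodetool enablefullquerylog \
        --path "$FQL_DIR" \
        --roll-cycle HOURLY \
        --max-queue-weight 268435456 \
        --max-log-size 17179869184
}

# Only call nodetool where it is actually installed.
if command -v nodetool >/dev/null 2>&1; then
    enable_fql_with_options
else
    echo "nodetool not found on PATH" >&2
fi
```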

With the great power of the FQL framework to log all your queries, you might wonder what happens when you log a statement that contains sensitive information. For example, if an operator creates a new role and specifies a password for the role, will the password be visible in the log? That would be a serious security issue.

The answer is no: passwords will not be visible. The Cassandra implementation is quite aggressive about obfuscating queries that contain passwords. When it finds a password in a statement, it obfuscates the remainder of the statement. It also obfuscates passwords when a query containing them is not successful, which might happen when an operator makes a mistake in CQL syntax.

For observability, the getfullquerylog nodetool subcommand reports the current status of FQL, and the disablefullquerylog subcommand turns FQL off at runtime. You can achieve the same by calling the respective JMX methods on the StorageService MBean.
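A short sketch of that check-and-disable cycle with nodetool; the function name fql_check_and_disable is hypothetical, and the PATH check is only there so the snippet does nothing on a machine without Cassandra tools.

```shell
# Hypothetical helper: show FQL status, turn it off, then confirm.
fql_check_and_disable() {
    nodetool getfullquerylog       # report whether FQL is enabled and with which settings
    nodetool disablefullquerylog   # stop logging at runtime, without a restart
    nodetool getfullquerylog       # confirm that FQL is now off
}

if command -v nodetool >/dev/null 2>&1; then
    fql_check_and_disable
else
    echo "nodetool not found on PATH" >&2
fi
```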

How to View Full Query Logs

Now that you have your full Cassandra query logs configured correctly and enabled for your chosen nodes, you probably want to view them sometimes. That’s where the fqltool command comes in. fqltool dump lets you view logs, converted from binary to a human-readable format. fqltool replay replays logs, and fqltool compare outputs any differences between the results of replayed queries.

Together, fqltool replay and fqltool compare let you revisit different sets of production traffic to help you analyze performance between different configurations or different nodes, or to help with debugging issues.
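The workflow might look roughly like the sketch below. All paths, host addresses, and the function name are hypothetical, and the compare step assumes that replay writes one result subdirectory per target under the --results directory; verify the actual layout and options with fqltool help before using it.

```shell
FQL_DIR=/tmp/cassandrafullquerylog   # placeholder: directory holding the FQL files
QUERIES=/tmp/fql-replay/queries      # hypothetical scratch directory for stored queries
RESULTS=/tmp/fql-replay/results      # hypothetical scratch directory for query results
TARGET_A=127.0.0.1                   # hypothetical nodes to replay against
TARGET_B=127.0.0.2

# Hypothetical wrapper around the three fqltool subcommands.
fql_dump_replay_compare() {
    # Print the logged queries in human-readable form.
    fqltool dump "$FQL_DIR"

    # Replay the logged queries against both targets, keeping queries and results.
    mkdir -p "$QUERIES" "$RESULTS"
    fqltool replay --target "$TARGET_A" --target "$TARGET_B" \
        --results "$RESULTS" --store-queries "$QUERIES" -- "$FQL_DIR"

    # Compare the two result sets (assumed layout: one subdirectory per target).
    fqltool compare --queries "$QUERIES" -- "$RESULTS/$TARGET_A" "$RESULTS/$TARGET_B"
}

if command -v fqltool >/dev/null 2>&1; then
    fql_dump_replay_compare
else
    echo "fqltool not found on PATH" >&2
fi
```

Replaying against a test cluster rather than production is the safer default, since replay re-executes every logged statement, including writes.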

Conclusion

With enhancements for stability and performance, along with cool new features like live full query logging, Cassandra 4.0 is a big step forward for the Cassandra community. To start using Cassandra 4.0, sign up for your free trial of Instaclustr Managed Cassandra and select the preview release of Cassandra 4.0 as your software version.