Apache Cassandra 4.0 brings about a long awaited feature for tracking and logging database user activity. Primarily aimed at providing a robust set of audit capabilities allowing operators of Cassandra to meet external compliance obligations, it brings yet another enterprise feature into the database. Combining work for the full query log capability, the audit log capability provides operators with the ability to audit all DML DDL and DCL changes to either a binary file or a user configurable source (including the new Diagnostics notification changes).
This capability will go a long way toward helping Cassandra operators meet their SOX and PCI requirements. If you are interested in reading about the development of the feature you can follow along here: https://issues.apache.org/jira/browse/CASSANDRA-12151
From a performance perspective the changes appear to only have a fairly minor hit on throughput and latency when enabled, and no discernible impact when disabled. Expect to see 10% to 15% impact on mixed workload throughput and p99 latency.
By default audit logs are written in the BinLog format and Cassandra comes with tools for parsing and processing them to human readable formats. Cassandra also supports executing an archive command for simple processing of audit logs. Audited keyspaces, users, and command categories can be whitelisted and blacklisted. Audit logging can be enabled in cassandra.yaml.
What’s the Difference Between Audit Logging, Full Query Logging and Diagnostic Events?
Both Audit logging (BinAuditLogger) and Full Query logging are managed internally by Apache Cassandra’s AuditLogManager. Both implement IAuditLogger, but are predefined in Apache Cassandra. The main difference is that the full query log receives AuditLogEntries before being processed by the AuditLogFilter. Both the FQL and BAL leverage the same BinLog format and share a common implementation of it.
Diagnostic events are effectively a queue of internal events that happen in the node. There is an IAuditLogger implementation that publishes filtered LogEntries to the Diagnostics queue if users choose to consume audit records this way.
So think of it this way: Cassandra has an audit facility that enables both configurable audit on actions as well as a full query log, you can have as many AuditLoggers enabled as you want. Diagnostic events is a way for pushing events to client drivers using the CQL protocol and you can pipe AuditEvents to the Diagnostics system!
How Is This Different From Cassandra’s Change Data Capture(CDC) Mechanism?
Apache Cassandra has supported CDC on tables for some time now, however the implementation has always been a fairly low level and hard to consume mechanism. CDC in Cassandra is largely just an index into commitlog files that point to data relevant to the table with CDC enabled. It was then up to the consumer to read the commitlog format and do something with it. It also only just captured mutations that were persisted to disk.
Audit logging capability will log all reads, writes, login attempts, schema changes etc. Both features could be leveraged to build a proper CDC stream. I would hazard a guess that it’s probably easier to do with the IAuditLogger interface than consuming the CDC files!