Apache Cassandra Materialized Views
Materialized views are a feature, first released in Cassandra 3.0, which provide automatic maintenance of a shadow table (the materialized view) to a base table with a different partition key thus allowing efficient select for data with different keys.
Many Cassandra users will be aware that the Apache Cassandra project recently made the decision to mark materialized views as experimental beginning from Cassandra 3.0.16 and 3.11.2 (for further details see https://mail-archives.apache.org/mod_mbox/cassandra-user/201710.mbox/%3CetPan.59f24f38.438f4e99.74dc%40apple.com%3E and https://issues.apache.org/jira/browse/CASSANDRA-13959).
As this move may cause concern to users who are already using materialized views, this post provides our recommendations for those users and clarifies our position on materialized views for Instaclustr managed service and support customers.
Instaclustr and Materialized Views
Materialized views have been around for some time and, in our observation, are reasonably widely deployed in recently developed Cassandra applications. Quite a number of issues have been found through these initial deployments, many of which have been fixed in recent releases of Apache Cassandra.
However, these deployments have also highlighted some fundamental issues with materialized views which were highlighted in the decision to move them to experimental status:
- There is no in-built method for reconciling the materialized view with the base table (which should not matter if everything functions as expected but, in a complex distributed system, would be a valuable safety net).
- If you do find differences between the materialized view and base table, there is no in-built method for re-synchronizing the view with the base table other than dropping the materialized view and recreating.
- There are no strong guarantees on the time for updates to the base table to be reflected in materialized views (which is inherited from the logged batch mechanism that materialized views are build on).
Users with a need to retain copies of their data with an alternate partition key structure are therefore left with basically two choices:
- adopt MVs with these known limitations and develop their own work-arounds (i.e. reconciliation processes) or accept the associated risks; or
- fall back to using application code to maintain multiple views of the data (which will likely still require the development of reconciliation tools).
The move of materialized view to an experimental state does highlight the risk (that exists with any software) that there are other, currently unknown issues.
However, in recent versions many of the known issues have been fixed, and with some care materialized views are being used successfully without major issues. The section “Recent Fixes and Specific Considerations” below sets out these fixes, some remaining known edge cases and also considerations around repairs.
What We Think About Materialized Views
Instaclustr’s position on support of materialized view for our managed service and support customers is as follows:
- We will support materialized views within the known functional limitations set out in this post.
- We recommend that you explicitly test the correctness of materialized views for your application scenarios, including under load (do not assume correctness).
- We recommend that you develop reconciliation checking tools to check the correctness of your materialized views against your base tables and run these regularly in production. (Any identified issues can likely be manually fixed by upserting to the base table, tools may be developed for this if required.)
- Avoid using incremental repairs with materialized views.
- Ensure you follow Cassandra data modelling best practice and consider partition sizes for both the base table and materialized view.
- Do not create a materialized views with filtering on a non-primary key column (now disabled by default).
We appreciate that it is undesirable for functions to be released like this when they are not production ready. We have been heartened to see the Cassandra project move to a higher bar for quality and a greater focus on stability in recent times and see this clarification of the status of materialized views as a positive move in that regard.
In addition to the Cassandra project’s moves, Instaclustr has commenced steps to develop a certification process for versions of Cassandra that we support which will provide a documented level of testing and results in addition to the project’s testing as well as a guidance on the maturity and level of support for versions and new features. We expect to release this process in Q1 2018.
Should you have any questions regarding this material please contact [email protected].
Recent Fixes and Specific Considerations
Recent Fixes
In 3.11.1 a number of cases were fixed that resulted in inconsistent data between the base and the materialized view. These consisted of issues relating to TTL’s, the use of TIMESTAMP, using an additional non-primary key column in the primary key of the materialized view, deletions, and filtering on non-partition key columns in the view. Following is a list of issues fixed, note that most of these were fixed together in CASSANDRA-11500.
- Range tombstones created prior to the data they shadow will not delete the data in the materialized view – CASSANDRA-13787
- DELETE of unselected column/collection should not affect ordered updates – CASSANDRA-13127
- Unselected columns should keep the materialized view row alive when other columns expire – CASSANDRA-13127
Specifically affecting materialized views with an extra non-PK column in the view PK
- View row should expire when view PK column expires in base – CASSANDRA-13657
- Commutative row deletion – CASSANDRA-13409
- Out of order updates to extra column on view PK – CASSANDRA-11500
Edge Case Issues
There were also consistency issues related to filtering in the materialized view against non-primary key columns (e.g: CREATE MATERIALIZED VIEW AS SELECT * WHERE enabled = True) that could result in inconsistent data between base and the materialized view. This case was unable to be fixed without a large storage re-write which cannot happen until 4.0, so has been blocked by default in 3.11.1. There is a JVM parameter you can pass in to re-enable this functionality, however you should understand potential implications of using materialized views in this way (-Dcassandra.mv.allow_filtering_nonkey_columns_unsafe). More information can be found in CASSANDRA-13798 and CASSANDRA-13547.
We recommend against creating a materialized view with filtering on a non-primary key column. If you have already started with this use case or absolutely need to do it, you should continue only if you intend to stick to a write-once pattern for the base table. Updating non-primary key columns with a filter on a non-PK base column will inevitably lead to inconsistent data between materialized view and base. The typical scenario is that after multiple updates to the filtered column the materialized view row will disappear. The following example provides a better idea of the problem. The simplest way to avoid this problem is with a write-once pattern to the base table, with no updates or manual deletions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
CREATE TABLE test.base (k int PRIMARY KEY, a int, b int, c int); CREATE MATERIALIZED VIEW vw AS SELECT * FROM test.base WHERE k IS NOT NULL AND a IS NOT NULL AND c = 1 PRIMARY KEY (k, a) INSERT INTO test.base (k, a, b, c) VALUES (1,1,1,1) SELECT * FROM test.vw; k | a | b | c ---+---+---+--- 1 | 1 | 1 | 1 (1 rows) UPDATE test.base SET c=2 WHERE k=1; SELECT * FROM test.base; k | a | b | c ---+---+---+--- 1 | 1 | 1 | 2 (1 rows) SELECT * FROM test.vw; k | a | b | c ---+---+---+--- (0 rows) UPDATE test.base SET c=1 WHERE k=1; UPDATE test.base SET c=1 WHERE k=1; SELECT * FROM test.base; k | a | b | c ---+---+---+--- 1 | 1 | 1 | 1 (1 rows) SELECT * FROM test.vw; k | a | b | c ---+---+---+--- |
(0 rows)
The view row is now dead but should be alive.
Another specific case to be aware of is the deletion of columns not selected in the materialized view. This scenario may result in cases where the deletion is not properly reflected in the view. At the moment the only proven case of this is when deletions pre-3.11.1 are propagated after upgrading to 3.11.1 using repairs or hints. This is low risk but still a possibility, and in which case we recommend avoiding deletions on columns not included in the select clause of the view. For example, the following queries should be avoided in the given base table below:
1 2 3 4 5 6 7 8 9 |
CREATE TABLE test.base (k int PRIMARY KEY, a int, b int, c int); CREATE MATERIALIZED VIEW vw AS SELECT k, a, b FROM test.base WHERE k IS NOT NULL AND a IS NOT NULL AND c IS NOT NULL PRIMARY KEY (k, a) INSERT INTO test.base (k, a, b, c) VALUES (1,1,1,1); DELETE c FROM test.base where k=1; |
Other Considerations
Other existing issues exist that mostly revolve around poor data models that result in very large partitions. The batchlog and write path are currently incapable of handling views with very large partitions.
Partition deletions that will affect a large number of view primary keys will generate a single mutation (write) which may exceed limits such as max_mutation_size (default 16MB) or the max_value_size (default 256MB). If you hit one of these errors you may not effectively delete the relevant rows in the view.
The easiest way to avoid this issue is to avoid poor view data models that would result in very large partitions or wide rows.
You should also be aware of some issues with repairs. Firstly you should avoid incremental repairs against MV’s, and stick to full repairs only (CASSANDRA-12888).
Secondly, to avoid inconsistencies created in the view you should ensure you repair the base table first, and then follow up by repairing the view, as certain combinations of inconsistencies across the nodes could result in a repair bringing back data in the view (CASSANDRA-13073).
As always, we recommend testing your views in the same way you would test a normal table. Ensure you’ve tested and verified all your operations before using in production. Be sure to test repair as well and ensure your repairing strategy will work with materialized views.