I see this question get asked about once a week so here are my thoughts on the subject.
Background
Central to data modelling in Cassandra is denormalizing data into separate tables so that each application query maps to a table for optimized reads. Back in 2015, Cassandra 3.0 introduced materialized views as an automated way of denormalization so you didn’t have to design and maintain tables manually.
Example
Let’s use the video-sharing site killrvideo.com as an example where we have a table comments_by_video
that stores all the comments posted by users for each video. The table looks like this:
CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);
On the site, users can also list all the comments they have posted on various videos. The usual way of modelling for this query is to create a new table called comments_by_user
:
CREATE TABLE comments_by_user ( userid uuid, commentid timeuuid, videoid uuid, comment text, PRIMARY KEY (userid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);
We would batch the updates to both tables to keep them synchronized.
Alternatively, we could create a materialized view based on the table comments_by_video
so that new comments also get posted in the comments_by_user
table automatically. Here is how we do that in CQL:
CREATE MATERIALIZED VIEW comments_by_user AS SELECT videoid, commentid, comment, userid FROM killrvideo.comments_by_video WHERE videoid IS NOT NULL AND commentid IS NOT NULL PRIMARY KEY (videoid, commentid) WITH CLUSTERING ORDER BY (commentid DESC);
Caveat
But there is a catch -- the updates take place asynchronously. In some instances, the view misses the updates and becomes out-of-sync with the base table. In the last few years, we have come to understand that either the design of materialized views, or implementation or both may be flawed. There are conflicting views on the subject but one thing is certain: there is no quick fix if you run into this issue. The only workaround is to drop the view which means an outage to parts of your application until you recreate it.
Recommendation
Personally, I think materialized views are still experimental and you should treat it in the same way. There are some improvements and fixes coming in Cassandra 4.0 but it’s not released yet. If you are already using materialized views successfully then feel free to continue using it. But if you’re just about to jump in, don’t. Stick with designing separate tables.
References
- Docs - Using materialized views @ DataStax Docs
- JIRA - CASSANDRA-13959 Add flag to disable materialized views, and warnings on creation
- Tutorial - Practical Application Data Modeling@ DataStax Academy
- Blog post - New in Cassandra 3.0: Materialized Views @ DataStax Blogs