article

Erick Ramirez avatar image
Erick Ramirez posted Erick Ramirez edited

FAQ - Should we reconsider using materialized views?

I see this question get asked about once a week so here are my thoughts on the subject.

Background

Central to data modelling in Cassandra is denormalizing data into separate tables so that each application query maps to a table for optimized reads. Back in 2015, Cassandra 3.0 introduced materialized views as an automated way of denormalization so you didn’t have to design and maintain tables manually.

Example

Let’s use the video-sharing site killrvideo.com as an example where we have a table comments_by_video that stores all the comments posted by users for each video. The table looks like this:

CREATE TABLE comments_by_video (
    videoid uuid,
    commentid timeuuid,
    userid uuid,
    comment text,
    PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);

On the site, users can also list all the comments they have posted on various videos. The usual way of modelling for this query is to create a new table called comments_by_user:

CREATE TABLE comments_by_user (
    userid uuid,
    commentid timeuuid,
    videoid uuid,
    comment text,
    PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);

We would batch the updates to both tables to keep them synchronized.

Alternatively, we could create a materialized view based on the table comments_by_video so that new comments also get posted in the comments_by_user table automatically. Here is how we do that in CQL:

CREATE MATERIALIZED VIEW comments_by_user AS
    SELECT videoid, commentid, comment, userid
    FROM killrvideo.comments_by_video
    WHERE videoid IS NOT NULL AND commentid IS NOT NULL
    PRIMARY KEY (videoid, commentid)
    WITH CLUSTERING ORDER BY (commentid DESC);

Caveat

But there is a catch -- the updates take place asynchronously. In some instances, the view misses the updates and becomes out-of-sync with the base table. In the last few years, we have come to understand that either the design of materialized views, or implementation or both may be flawed. There are conflicting views on the subject but one thing is certain: there is no quick fix if you run into this issue. The only workaround is to drop the view which means an outage to parts of your application until you recreate it.

Recommendation

Personally, I think materialized views are still experimental and you should treat it in the same way. There are some improvements and fixes coming in Cassandra 4.0 but it’s not released yet. If you are already using materialized views successfully then feel free to continue using it. But if you’re just about to jump in, don’t. Stick with designing separate tables.

References

materialized views
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Article

Contributors

Erick Ramirez contributed to this article

Related Articles