Bringing together the Apache Cassandra experts from the community and DataStax.


question

ashok.dcosta_187920 asked ·

Is querying multiple partitions or multiple tables an anti-pattern in Cassandra?

Hi,

The DataStax documentation is a bit unclear about what counts as an anti-pattern when writing queries. Could you please clarify the following:

Is querying multiple partitions considered an anti-pattern, or just an inefficient query pattern?

Also, is a multi-table access pattern considered an anti-pattern in logical data modelling for Cassandra? Please advise.

Thanks


data modeling, anti-pattern
Erick Ramirez answered ·

Cassandra anti-patterns are implementations and/or design decisions which are either ineffective (they do not perform well or do not scale) or counter-productive (contrary to the philosophies of C*, or contrary to distributed architectures).

Reading multiple partitions isn't necessarily an anti-pattern but it can affect the performance of the cluster. Consider a query that uses the IN() operator with multiple partition keys:

SELECT * FROM community.users WHERE pk IN ( 'tom', 'dick', 'harry' )

The coordinator can get overloaded since it needs to fire multiple requests to get the results for just 1 read request. This might be "OK" for retrieving 2 to 5 partitions (still not performant compared to a single read request) but any more than that is bad for the application and bad for the cluster.
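As a rough sketch of the usual alternative, assuming the same community.users table with pk as its partition key, the application can send one single-partition query per key (typically asynchronously through the driver) instead of a single IN() query. Each request is then small and independent, so the fan-out is not concentrated on one coordinator:

```cql
-- One request per partition key instead of a single IN() query.
-- The application issues these separately and merges the results itself.
SELECT * FROM community.users WHERE pk = 'tom';
SELECT * FROM community.users WHERE pk = 'dick';
SELECT * FROM community.users WHERE pk = 'harry';
```

This does not make reading many partitions cheap. It only spreads the work, so the caveat about keeping the number of partitions small still applies.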

In our experience, multi-partition reads do not scale and aren't viable long-term. In most cases, this type of query is not transactional (OLTP) in nature and is indicative of an analytics workload (OLAP). OLAP queries are fine, but understand that they don't match the performance requirements of OLTP workloads.

Multi-table reads are an anti-pattern in the sense that they imply the need for JOINs, which indicates that the data model needs to be redesigned.

One of the primary concepts in C* data modeling is denormalizing. You need to materialize your view by duplicating data in multiple tables so at read time, you are only retrieving data from one table. JOINs are only really suitable for OLAP workloads. Cheers!
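To illustrate the denormalization point, here is a minimal sketch. The table and column names (users_by_username, users_by_email, city) are made up for the example and are not from the original question. The same user data is stored twice, once per query, so each read only needs one table:

```cql
-- Hypothetical tables: the same user data, stored once per query.
CREATE TABLE IF NOT EXISTS community.users_by_username (
    username text PRIMARY KEY,
    email    text,
    city     text
);

CREATE TABLE IF NOT EXISTS community.users_by_email (
    email    text PRIMARY KEY,
    username text,
    city     text
);

-- Every write goes to both tables.
INSERT INTO community.users_by_username (username, email, city)
VALUES ('tom', 'tom@example.com', 'Sydney');
INSERT INTO community.users_by_email (email, username, city)
VALUES ('tom@example.com', 'tom', 'Sydney');

-- Each read touches exactly one table and one partition, with no JOIN.
SELECT * FROM community.users_by_email WHERE email = 'tom@example.com';
```

The trade-off is that the application is responsible for writing to both tables and keeping them consistent.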

saravanan.chinnachamy_185977 answered ·

@ashok.dcosta_187920 The fundamental design principle of Cassandra is the query-first approach. Getting the data model right is the key to performance in Cassandra. Here are the high-level steps for approaching data modeling.

  • Identify the queries first. Then organize the data around the query to create your data model.
  • Identify the primary key. This defines the access pattern efficiencies and also the layout of data on disk. Try to distribute the data evenly across the cluster.
  • See if it is possible to satisfy your query with a single partition key. This is the most efficient and preferred way.
  • If the query needs more than the partition key to be answered, use clustering columns. Fewer is better. Data is grouped and ordered by the clustering columns at insertion time, so this is a design decision.
  • Data duplication is ok as we need different tables to satisfy different queries.
  • Joins are not supported, so there is no multi-table design. Instead, join the data upfront and create one table to satisfy each query.
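The steps above can be sketched for one example query, "show the most recent posts of a user". The table posts_by_user and its columns are hypothetical and only serve to show how the partition key and clustering column follow from the query:

```cql
-- Hypothetical table designed around a single query:
-- "get the latest posts written by a given user".
CREATE TABLE IF NOT EXISTS community.posts_by_user (
    username  text,       -- partition key: one partition per user
    posted_at timestamp,  -- clustering column: orders rows within the partition
    title     text,
    PRIMARY KEY ((username), posted_at)
) WITH CLUSTERING ORDER BY (posted_at DESC);

-- The query is answered from a single partition, already sorted newest first.
SELECT title, posted_at
FROM community.posts_by_user
WHERE username = 'tom'
LIMIT 10;
```

Because the ordering is fixed by the clustering column when the table is created, a different sort order or lookup key would normally mean another table rather than a different query.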

For more detailed information please refer to the following.

Data Modeling

Free Data Modeling Course


Hello,

Thanks for the information. However, I still have a question. I have gone through the data modelling course, but it is still not clear to me whether querying multiple partitions is an anti-pattern or not.

The presentation below seems to list querying multiple partitions as an anti-pattern:

https://www.slideshare.net/chbatey/webinar-cassandra-antipatterns-45996021

In your message above you said:

"There are no joins supported and hence no multi table design." Is multi-table an anti-pattern in Cassandra data modelling?


thanks

