Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

resolve123_170609 avatar image
resolve123_170609 asked Erick Ramirez answered

What is the performance impact of using IN() operator on clustering key?

I have table like

CREATE TABLE calculated (
    forecast_id int,
    month int,
    year int,
    day_of_the_month int,
    management_area_cd text,
    tc text,
    count int,
    value double,
    date text,
    resicount int,
    resiforecast double,
    xhcount int,
    xhforecastvalue double,
    PRIMARY KEY (forecast_id, month, year, day_of_the_month, management_area_cd, tc)
)

i have to query using forecast_id and list of month and year.

so i am creating query using

select * from calculated where where id=1 and month in (1,3,5,6,8) and year =2021 allowing filtering.

I am calling this query for couple of years in a loop.

Please let me know is it the perfect query first we are using IN statement but for same partition also we are using allow filtering.

performancecql
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

steve.lacerda avatar image
steve.lacerda answered

I don't see why there's an id=1 because that's not part of the schema. I think you mean to use:

select * from calculated where where forecast_id=1 and month in (1,3,5,6,8) and year =2021;

The above shouldn't require ALLOW FILTERING because you're supplying the partition key and the IN portion involves the primary key.

For instance:

CREATE TABLE keyspace1.list (
    id text,
    last text,
    first text,
    hobbies list<text>,
    middle text,
    PRIMARY KEY (id, last, first)
);
SELECT * FROM list WHERE id='1' AND last='steve' AND first IN ('lacerda');

That does not require me to use ALLOW FILTERING because I'm supplying the IN clause on a field that's part of the primary key. If you use a field outside of the primary key, then you're filtering, but that's not the case with the above example that you provided.

In any case, any time you have to use ALLOW FILTERING it's a bad idea and I wouldn't recommend it. Here's a good article on why that's so:

https://www.datastax.com/blog/allow-filtering-explained

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image
Erick Ramirez answered

Using the IN() operator on a clustering column where the query is limited to a single partition key is the only valid use of the operator since it is only scanning the rows within the partition. Any other use case is not recommended since it will not perform well.

As Steve Lacerda already mentioned, it is not necessary to specify ALLOW FILTERING in this case. Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.