Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

tarikh avatar image
tarikh asked Erick Ramirez edited

Is it possible to select data from a specific node and then filter by clustering key?

We have a table with below PK

    PRIMARY KEY ((customer_id),type)

we want to select all data related to specific type. select all and then filter by type in code is not efficient. If we have the ability to query specific node, then we can filter by clustering key.

is it a good idea to use allow filtering but combined with "PER PARTITION LIMIT 1"?

select [columns] from keyspace.customers where type='XXX' PER PARTITION LIMIT 1 allow filtering;
data modelingcql
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez edited

Unfortunately your query is inefficient since it requires a full table scan. With a very small data set (a few thousand partitions) and a small number of nodes, the query might complete but it would otherwise either timeout or overload your cluster.

For that query to work, you really need to model the data accordingly. For example:

CREATE TABLE customers_by_type (
    type text,
    customer_id text,
    ...
    PRIMARY KEY (type, customer_id)
)

Alternatively, you can create a secondary index on the type column to get the list of customers. Cheers!

2 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

thanks for the quick response! :)


Is there any advantage using index instead of new table?

in anyway creating index will create "hidden _idx table", not replicated but still.

I tend to avoid indexing in Cassandra as it goes against Cassandra's nature.

0 Likes 0 ·
You are correct, I personally don't recommend using secondary indexes and would always opt for designing a table that is modelled against the application query for optimum performance. However, I understand the convenience that indexes bring. If you're interested, the guide on When to use an index might be helpful. Cheers!
1 Like 1 ·