Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

XXSD avatar image
XXSD asked ·

How do I sort the result set based on clustering column?

Hi, I'm trying to order all tables based on the timestamp desc but it's not working properly. Here is my code for creating the table and the result I'm getting:

CREATE TABLE IF NOT EXISTS ${cassandraKeyspace}.${cassandraTableName} (
                    ${cassandraDateColumn} timestamp,
                    month_year_index text,
                    dataMap map<text, float>,
                    PRIMARY  KEY ((month_year_index), ${cassandraDateColumn})
                ) WITH CLUSTERING ORDER BY (${cassandraDateColumn} DESC)`

1620515707582.png

These rows that I inserted to test should be at the bottom of the table and the most recent ones should be at the top. What do I need to change to make this happen? It is ordering but not properly, maybe something to do with the primary key?

cassandra
1620515707582.png (40.0 KiB)
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image
Erick Ramirez answered ·

The clustering column identifies rows within a partition. The CLUSTERING ORDER BY directive defines how the rows will be sorted on disk.

Don't confuse the clustering order of rows in a partition to be the same ordering of data across all partitions.

The example output you posted indicates that you're doing a multi-partition read which is a bad idea since it will result in a full table scan. In any case, the clustering order doesn't apply to all partitions in the table -- it only applies to the clustering column of the table. Cheers!

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

smadhavan avatar image
smadhavan answered ·

@XXSD, since the image doesn't contain the column headers, I am assuming that the 2nd column is equivalent to your ${cassandraDateColumn} column.

By looking at the image, it is correctly ordering the data within the partition month_year_index with ${cassandraDateColumn} column in the way you've wanted it to order, i.e. you've created it as ${cassandraDateColumn} DESC in your table construct and the values are properly ordered. 2021-04-26 13:12:17 appears at the top when compared to 2021-04-26 13:12:07 in descending order.

Unsure if I am missing anything in here, but feel free to update you original question with elaborate details on what you're trying to achieve in here with your requirements in case you're looking for something else. Cheers!

2 comments Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

I wanted the full table to be ordered by the timestamp but I recently learned that the clustering order only applies to a partition and not the full table but please correct me if I'm wrong

0 Likes 0 ·

@XXSD, you're correct. Clustering is a storage engine process that sorts data within each partition based on the definition of the clustering columns

1 Like 1 ·