Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

singhavnishsingh avatar image
singhavnishsingh asked Erick Ramirez edited

Why can't I select the last n rows from a table?

I have a table in cassandra now i cannot select the last 200 rows in the table. The clustering order by clause was supposed to enforce sorting on disk.

CREATE TABLE t1(id int , 
event text, 
receivetime timestamp ,
PRIMARY KEY (event, id)
) WITH CLUSTERING ORDER BY (id DESC)
;

The output is unsorted by id:

event   | id | receivetime
---------+----+---------------------------------
  event1 |  1 | 2021-07-12 08:11:57.702000+0000
  event7 |  7 | 2021-05-22 05:30:00.000000+0000
  event5 |  5 | 2021-05-25 05:30:00.000000+0000
  event9 |  9 | 2021-05-22 05:30:00.000000+0000
  event2 |  2 | 2021-05-21 05:30:00.000000+0000
 event10 | 10 | 2021-05-23 05:30:00.000000+0000
  event4 |  4 | 2021-05-24 05:30:00.000000+0000
  event6 |  6 | 2021-05-27 05:30:00.000000+0000
  event3 |  3 | 2021-05-22 05:30:00.000000+0000
  event8 |  8 | 2021-05-21 05:30:00.000000+0000

The keyspace has a replication factor of 1

Please help

Thanks.

cql
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez edited

The rows within a partition are sorted based on the order of the clustering column, not the partition key.

In your case, the table's primary key is defined as:

    PRIMARY KEY (event, id)

This means that each partition key can have one or more rows, with each row identified by the id column. Since there is only one row in each partition, the sorting order is not evident. But if you had multiple rows in each partition, you'd be able to see that they would be sorted. For example:

 event   | id | receivetime
---------+----+---------------------------------
  event1 |  7 | 2021-05-22 05:30:00.000000+0000
  event1 |  5 | 2021-05-25 05:30:00.000000+0000
  event1 |  1 | 2021-07-12 08:11:57.702000+0000

In the example above, the partition event1 has 3 rows sorted by the ID column in reverse order.

In addition, running unbounded queries (no WHERE clause filter) is an anti-pattern in Cassandra because it requires a full table scan. If you consider a cluster which has 500 nodes, an unbounded query has to request all the partitions (records) from all 500 nodes to return the result. It will not perform well and does not scale. Cheers!

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.