question

haripriyan.mr_191315 avatar image
haripriyan.mr_191315 asked Erick Ramirez answered

Why are rows not sorted in descending order for the added_date clustering column?

I am shifting from RDBMS world to NoSQL world and hence have few challenges. My requirement is to create a table as

Modify the following command to create a new videos_by_tag table partitioned based on the tag. The table should also store the rows of each partition so that the newest videos are listed first within the partition>.

so I have created the table in the cassandra database as below hope this is correct

CREATE TABLE videos_by_tag (
    tag text,
    video_id timeuuid,
    added_date timestamp,
    title text,
    PRIMARY KEY ((tag), video_id, added_date)
) WITH CLUSTERING ORDER BY (video_id ASC, added_date DESC);

and loaded the data. Select * from videos_by_tag output is

tag       | video_id                             | added_date                      | title -----------+--------------------------------------+---------------------------------+--------------------datastax | 4845ed97-14bd-11e5-8a40-8338255b7e33 | 2013-10-16 00:00:00.00000 |    DataStax Studio  
datastax | 5645f8bd-14bd-11e5-af1a-8638355b8e3a | 2013-04-16 00:00:00.00000 | What is DataStax Enterprise? 
cassandra | 1645ea59-14bd-11e5-a993-8138354b7e31 | 2014-01-29 00:00:00.00000 | Cassandra History 
cassandra | 245e8024-14bd-11e5-9743-8238356b7e32 | 2012-04-03 00:00:00.0000 |  Cassandra & SSDs 
cassandra | 3452f7de-14bd-11e5-855e-8738355b7e3a | 2013-03-17 00:00:00.0000 |  Cassandra Intro

My requirements are:

  1. Is there a way to sort out the column added_date in DESC, though I specified in the DDL as added_date DESC, I could still see the 2nd row under the tag cassandra's added date is older than the 3rd row.
  2. [Posted as question #9333]
  3. [Posted as question #9326]
data modeling
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered

The rows are in fact sorted in the correct order. If you look closely, data is stored based on the video_id sorted in ascending order then added_date in descending order.

To reiterate, the video_id takes precedence over the added_date.

In the example output you provided:

cassandra | 1645ea59-14bd-11e5-a993-8138354b7e31 | 2014-01-29 00:00:00.00000 | Cassandra History 
cassandra | 245e8024-14bd-11e5-9743-8238356b7e32 | 2012-04-03 00:00:00.0000 |  Cassandra & SSDs 
cassandra | 3452f7de-14bd-11e5-855e-8738355b7e3a | 2013-03-17 00:00:00.0000 |  Cassandra Intro

The video ID from the third row (3452f7de-14bd-11e5-855e-8738355b7e3a) is "older" than the video ID from the second row (245e8024-14bd-11e5-9743-8238356b7e32).

The video_id column is CQL type timeuuid which means they're a type 1 UUID meaning they're UUIDs which include the time they were generated and are sorted by timestamp. The following table shows the equivalent human-readable value of the UUIDs in GMT:

Row video_id Date/time (GMT)
2 245e8024-14bd-11e5-9743-8238356b7e32
June 17, 2015 at 6:50:29 AM
3 3452f7de-14bd-11e5-855e-8738355b7e3a
June 17, 2015 at 6:50:56 AM

Hopefully from this table, you should see that row 2 has a creation time of 6:50:29 which is earlier than 6:50:56 in row 3. For this reason, the rows in the partition are sorted in the correct order. Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.