question

patrickjp93 avatar image
patrickjp93 asked Erick Ramirez commented

Is selecting the nth element in a CQL list supported?

I know this seems like a "Read the Manual" Question, but I've read the manual and searched with Google to no avail.

Does cassandra support selecting the nth row from a collection, ie without matching an inner key?

So if I have multiple users under an account in my app, and if I want to open a detailed view of ONE user, is there a proper way to write the following query?

SELECT userList[n] FROM Accounts WHERE Id = 123456...; 

Or do I just need to bite the bullet and include a 'uid' column in my User type?

SELECT * FROM Accounts.userList WHERE Id = 123456 and userList CONTAINS { uid : n }
collections
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez commented

It isn't possible to select an element from a CQL list because collections are not indexed internally. The entire collection is read in order to access a single element.

For clarification, an element in a collection does not map to a row in a partition.

If you want to store rows of users in a partition, you need to define the users column as a clustering column instead of storing them in a collection. For example:

CREATE TABLE accounts (
  account_id int,
  user text,
  ...
  PRIMARY KEY (account_id, user)
)

With this table definition, you can store rows of users for each account and be able to run a query similar to the second one you posted:

SELECT * FROM accounts WHERE account_id = ? AND user = ?

Cheers!

2 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

patrickjp93 avatar image patrickjp93 commented ·

Hello Erick,

Thank you for the reply. Bear in mind I haven't dived very deep on the data storage model. I do understand every update is a full rewrite, optimization toward sequential operations rather than random, and probably the bare basics. This seems like a strange quirk/oversight.

The List data is all stored sequentially between the data of the previous and next column, correct? And since the number of fields in the list is fixed (or can be), shouldn't an array accessory (logically, implementation always being the hard part) come for free?


select userList[1] from Accounts

seems trivial to support because on-disk you already have the list elements/rows separated by special bit sequences, or the start of each element is a bit-level address stored somewhere else?

Given my use case is usually a single user or family of 2 or 4 living on one account, doing the search on CONTAINS KEY is cheap, but it still seems clunky.

0 Likes 0 ·
Erick Ramirez avatar image Erick Ramirez ♦♦ patrickjp93 commented ·

The data is stored as a blob so the entire collection needs to be retrieved in order to access the individual elements. There's a serialisation/deserialisation that takes place to access the contents of collections so it's not free.

For example to add an element at a particular position, Cassandra has to read the entire collection in order to rewrite the part of the list which needs to be shifted to new positions.

It's strange to me that you are retrieving a user at a specific index in the list. This indicates that you already know which position the user is in. If you already know the user, why do you have to retrieve it? This doesn't make sense.

As a side note, the elements in a list are not unique (you can have duplicates) and are stored in the order they were inserted. It seems like you require uniqueness so you should use a set collection instead. Cheers!

0 Likes 0 ·