
phnx16 asked

Is it a bad practice to have a single row partition in a Cassandra table?

Hello community,

Let's say I have a table like this:

CREATE TABLE request(
    transaction_id text,
    request_date timestamp,
    data text,
    PRIMARY KEY (transaction_id)
);

The transaction_id is unique, so as far as I understand each partition in this table would contain only one row. I'm not sure whether this causes a performance issue at the OS level. My concern is that Cassandra might create a file for each partition, leaving the host OS with a very large number of files to manage. Note that I don't know how Cassandra actually creates the files for its tables.

In this scenario I can find a request by its transaction_id like this:

select data from request where transaction_id = 'abc';

If the previous assumption is correct, would the following be a better approach?

CREATE TABLE request(
    the_date date,
    transaction_id text,
    request_date timestamp,
    data text,
    PRIMARY KEY ((the_date), transaction_id)
);

The value of the_date would change each day, so the table would get a new partition every day.

In this scenario the client would always need to have the_date available, so that I can find a request using the following query:

select data from request where the_date = '2020-09-23' and transaction_id = 'abc';

Thank you in advance for your kind help!

data modeling

1 Answer

Erick Ramirez answered

A simple primary key is perfectly fine in Cassandra.

There is definitely no rule that dictates you must have a composite partition key or a compound primary key. If you need a refresher about these terms, I've previously explained them in #6171.

Let me respond to other points you raised.

#1 - This statement is incorrect:

... as far as I understand each partition in this table would have one row only and I'm not sure if this situation causes a performance issue in the OS ...

Single row partitions do not have any inherent performance issues. These are just regular partitions in Cassandra.

#2 - This is also incorrect:

Cassandra creates a file for each partition causing lots of files to manage for its hosting OS

Cassandra does not create a file per partition. Data is stored on disk in SSTables, and a single SSTable can contain many partitions.

#3 - The second table you posted is irrelevant. You need to design a table for each application query. You do not need to create artificial compound primary keys. Cheers!


phnx16 commented

Thank you @Erick Ramirez for the clarifications provided, they really helped.

If I had a lot of single-row partitions across many tables on each node, could this increase the size of the Bloom filter? Does each partition have a separate entry in it? Is some kind of memory or Bloom filter tuning needed?


In my particular case I just want to insert records temporarily (24 hours) in the "request" table. To make these records temporary, I would use Cassandra's native TTL when inserting each row.

After a short while (less than 24 hours), I will SELECT from the table by transaction_id, only once.

Erick Ramirez commented

It's not really going to be an issue. The Bloom filter on a node is about 1-2 GB per billion partitions. So even 100M partitions on one node will only be about 100-200MB. Cheers!
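
As a rough cross-check of those figures, here is a minimal sketch that applies the textbook formula for an optimally sized Bloom filter, bits per key = -ln(p) / (ln 2)^2. The function name `estimate_bloom_filter_bytes` is hypothetical, and the false-positive chance of 0.01 is an assumption made for illustration. This is not Cassandra's own accounting: actual memory use depends on the table's configured false-positive chance and on how many SSTables a partition appears in, since each SSTable has its own filter.

```python
import math


def estimate_bloom_filter_bytes(partitions: int, fp_chance: float = 0.01) -> float:
    """Estimate the size in bytes of an optimally sized Bloom filter.

    Textbook formula: bits per key = -ln(p) / (ln 2)^2.
    fp_chance=0.01 is an assumed value, not read from any cluster.
    """
    if partitions < 0:
        raise ValueError("partitions must be non-negative")
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be between 0 and 1 (exclusive)")
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return partitions * bits_per_key / 8  # 8 bits per byte


if __name__ == "__main__":
    for partitions in (100_000_000, 1_000_000_000):
        size_gb = estimate_bloom_filter_bytes(partitions) / 1e9
        print(f"{partitions:,} partitions: ~{size_gb:.2f} GB")
```

Under these assumptions the estimate comes to roughly 1.2 GB per billion partitions, which sits inside the 1-2 GB range quoted above. A higher false-positive chance would shrink the filter at the cost of more unnecessary disk reads.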

phnx16 commented

Alright, thank you for your help Erick! I appreciate it
