question

jtdelato asked

What is the definition of hot partition in terms of frequency of reads/writes?

I understand the basic definition of a "hot partition" to be a partition that receives a lot of reads and writes, usually due to poor data modeling.

What is considered to be "a lot" in terms of how frequently that partition receives read or write requests? Would it need to be on the order of several transactions per second?

Also, what is a bigger performance problem: a read-heavy or write-heavy workload? I know Cassandra is generally more optimized for writes, but I read recently that 3.11 has better read performance than previous versions.

Thanks so much for any insight you can provide!

data modeling

1 Answer

Erick Ramirez answered

There's no hard rule for it, but it's generally just a partition (or a subset of partitions) that is accessed more frequently than the other partitions in the table.

We generally refer to hot partitions in the context of unbalanced load in the cluster, where some nodes are accessed significantly more than others.

Think of a situation where an application keyspace has a replication factor of 3 in a data centre which has 50 nodes. It has a table of users where one of the users is a service account that is used to connect multiple times a day for monitoring.

The 3 replicas out of the 50 nodes which hold this service account will have significantly higher utilisation than the rest of the nodes because that partition is hot. Cheers!
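To make the imbalance concrete, here is a rough Python sketch of the scenario above. It is only a toy model, not how Cassandra actually places data: the MD5-based placement, the `replicas_for` and `simulate` helpers, the `svc_monitor` key and the request counts are all made up for illustration (Cassandra uses its own partitioner and token ring), and it assumes every request touches all 3 replicas, which holds for writes but depends on the consistency level for reads.

```python
import hashlib
from collections import Counter

NODES = 50               # nodes in the data centre (from the example above)
REPLICATION_FACTOR = 3   # replicas per partition (from the example above)


def replicas_for(partition_key: str) -> list[int]:
    """Toy placement: hash the key to a node, then take the next RF-1 nodes."""
    digest = hashlib.md5(partition_key.encode()).digest()
    first = int.from_bytes(digest[:8], "big") % NODES
    return [(first + i) % NODES for i in range(REPLICATION_FACTOR)]


def simulate(requests: dict[str, int]) -> Counter:
    """Count requests per node, assuming each request touches every replica."""
    load = Counter({node: 0 for node in range(NODES)})
    for key, count in requests.items():
        for node in replicas_for(key):
            load[node] += count
    return load


# Hypothetical workload: 10,000 ordinary users with 1 request each,
# plus one service account that issues 50,000 requests.
workload = {f"user{i}": 1 for i in range(10_000)}
workload["svc_monitor"] = 50_000

load = simulate(workload)
hot_nodes = replicas_for("svc_monitor")
busiest = [node for node, _ in load.most_common(REPLICATION_FACTOR)]

print("replicas of the hot partition:", sorted(hot_nodes))
print("busiest nodes:", sorted(busiest))
```

With these made-up numbers the three busiest nodes are exactly the replicas of the service account, while the other 47 nodes share the ordinary traffic roughly evenly. If the service account only issued a handful of requests, the same three nodes would be indistinguishable from the rest.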


jtdelato commented ·

Thank you Erick!

Just a follow-up: is there a general rule of thumb for when a hot partition, such as the one you described, would start to affect the performance of the cluster?

The load on the cluster would obviously be unbalanced, but if the service account you described only made a handful of requests a day, would that necessarily impact the cluster's performance?

Thanks again! I really appreciate all your help.

Erick Ramirez commented (in reply to jtdelato) ·

No rule of thumb. The service account in my example is a problem because it is getting accessed a lot. If it weren't, then it wouldn't be a hot partition. Cheers!
