Question

Gangadhara M.B asked:

What is the maximum size of a table in Cassandra?

What can be the maximum size of a table in Apache Cassandra 3.11.6? One of our customers has a table of around 1.5 TB on a 9-node cluster, with each node having a 1 TB data mount point. End users/application users are constantly complaining about read performance.

When I ask the application team to split the large table into a set of smaller tables, for example dividing the 1.5 TB table into multiple smaller tables of around 100 GB each, they ask why Cassandra cannot handle a large volume of data in a single table when reads are always by primary key (a single partition column).


1 Answer

Erick Ramirez answered:

There is no maximum limit on the size of a table in Cassandra, since the contents of a cluster are distributed across the nodes in the ring. It is only limited by the storage size of each node multiplied by the number of nodes in the cluster.
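To put rough numbers on that for this cluster (assuming a replication factor of 3, which the question doesn't state): 9 nodes × 1 TB per node ≈ 9 TB of raw capacity, or roughly 3 TB of unique data, before leaving headroom for compaction and snapshots.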

Put simply, it is theoretically unlimited since you can just keep adding nodes to increase the storage capacity of the cluster. Cheers!


Gangadhara M.B commented:

You mean to say add more nodes to the existing 9-node cluster, so that each node holds a smaller chunk/volume of data while keeping the same 1 TB data volume on each node?

In theory it is unlimited, but in practice is there any rule of thumb that says a table should be <= some size to get optimal performance?

We have only a few tables (3 tables) with larger sizes, around 1 TB, 1.1 TB and up to 1.5 TB; the rest of the tables in the cluster are smaller. We have only one keyspace, with fewer than 50 tables.

Should we keep adding more nodes to the existing 9-node cluster just to handle the performance of only a few tables?


Erick Ramirez commented:
"In theory it is unlimited, but in practice is there any rule of thumb that says a table should be <= some size to get optimal performance?"

There is no "rule of thumb" for the table size. The size of the table has no bearing on the performance of a cluster. Factors other than the size of the table impact the performance of the cluster. Cheers!

Gangadhara M.B commented:

The question is why add more nodes just to handle a few large tables. The problem is only with the larger tables, in the range of >1 TB and <1.5 TB; there are no issues with read operations on any of the smaller tables.

The larger tables have STCS compaction set, so most of the SSTables are very large (> 50 GB), which also looks like a problem. We will consider changing compaction to LCS to see if there is any benefit.
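For reference, a minimal sketch of that change in CQL (the table name ks.big_table is hypothetical; 160 MB is the LCS default target SSTable size):

-- Hypothetical table; switch compaction from STCS to LCS.
ALTER TABLE ks.big_table
  WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': '160'
  };

Note that changing the strategy causes the existing SSTables to be recompacted into levels over time, which is I/O-intensive on tables of this size.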


Erick Ramirez commented:

I think you misunderstood my answer. I wasn't telling you to add nodes to deal with your specific scenario. The point that I made is that you can increase the storage capacity of the cluster by adding nodes. Cheers!

P.S. I've converted your posts to comments since they're not answers. :)

Gangadhara M.B commented:

In our 9-node cluster none of the nodes has a space problem; all nodes have sufficient space, so we don't have a storage capacity problem. The problem is only with the few tables that are in the TB range.

Our cluster is on AWS EC2, r5.2xlarge instances with SSD disks.

My question is: would breaking only the larger, TB-sized tables into a larger number of smaller tables help to gain performance?

Erick Ramirez commented:
"In our 9-node cluster none of the nodes has a space problem; all nodes have sufficient space, so we don't have a storage capacity problem."

Again, I didn't say you had a capacity issue. I think you've misunderstood my point. :)

"My question is: would breaking only the larger, TB-sized tables into a larger number of smaller tables help to gain performance?"

No, it doesn't. As I said previously, the size of a table does not impact the cluster's performance. Cheers!

Gangadhara M.B commented:


I am just trying to compare small and big table access to catching a fish in a bucket versus a pond or ocean. Is that not the same analogy for small and big table access in Cassandra?

Is catching a fish in a bucket full of water not different from trying to catch the same fish in a pond or ocean?

In a bucket it's easy to catch because the area/volume is small, whereas in a pond or ocean it may take longer, or sometimes you cannot catch it at all, because of the larger volume.



Erick Ramirez commented:

Not in Cassandra. It takes the same amount of time to locate a partition in a small table as in a large table, regardless of the table's size. This is because Cassandra doesn't have to scan the table to locate the partition. Instead, it uses a consistent hashing algorithm to determine which node the partition is stored on. C* also uses in-memory data structures, including a partition key cache and a partition index, to quickly locate the SSTable where the partition is stored.
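To illustrate with CQL (the keyspace, table and key below are hypothetical, not from this thread):

-- Cassandra hashes 'pk' (Murmur3 by default) to find the replica nodes;
-- it never scans the table, so the overall table size doesn't matter here.
SELECT * FROM my_ks.my_table WHERE pk = 'some-key';

-- token() exposes the hash value that drives that placement.
SELECT token(pk) FROM my_ks.my_table WHERE pk = 'some-key';

What does affect read latency is the size of the individual partition being read, not the total size of the table.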

The size of a table really doesn't affect the performance of read requests in C*. Cheers!
