
Gangadhara M.B asked:

What is the maximum size of a table in Cassandra?

What can the maximum size of a table be in Apache Cassandra 3.11.6? One of our customers has a table of around 1.5 TB on a 9-node cluster, with each node having a 1 TB data mount point. End/application users are always complaining about read performance.

When I ask the application team to split the large table into a set of smaller tables, for example dividing the 1.5 TB table into multiple smaller tables of around 100 GB each, they ask why Cassandra cannot handle a large volume of data in a single table when reads are always by primary key (a single partition key column).

cassandra

1 Answer

Erick Ramirez answered:

There is no maximum limit for the size of a table in Cassandra, since the contents of a table are distributed across the nodes in the ring. It is only limited by the storage size of each node multiplied by the number of nodes in the cluster.

Put simply, it is theoretically unlimited since you can just keep adding nodes to increase the storage capacity of the cluster. Cheers!
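
As a rough illustration of that arithmetic, here is a minimal back-of-the-envelope sketch. The node count and disk size come from the question; the replication factor and the free-space headroom for compaction are assumptions, since neither is stated in the thread.

```python
# Back-of-the-envelope capacity estimate for the cluster described above.
# 9 nodes x 1 TB comes from the question; replication factor 3 and ~50%
# free-space headroom for compaction are ASSUMPTIONS for illustration only.
nodes = 9
disk_per_node_tb = 1.0
replication_factor = 3      # assumed, not stated in the thread
usable_fraction = 0.5       # assumed headroom for compaction/repairs

raw_capacity_tb = nodes * disk_per_node_tb
unique_data_tb = raw_capacity_tb * usable_fraction / replication_factor

print(f"raw capacity:       {raw_capacity_tb:.1f} TB")   # 9.0 TB
print(f"unique data (est.): {unique_data_tb:.1f} TB")    # 1.5 TB
# Adding nodes grows raw_capacity_tb linearly, which is the point of the answer.
```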

8 comments

You mean to say we should add more nodes to the existing 9-node cluster so that each node holds a smaller chunk/volume of data, while keeping the same 1 TB data volume on each node?

In theory it is unlimited, but practically speaking, is there any rule of thumb that says to keep the table size <= some value to get optimal performance?

We have only a few tables (3 tables) with larger sizes, around 1 TB, 1.1 TB and 1.5 TB; the rest of the tables in the cluster are smaller. We have only one keyspace with fewer than 50 tables.

Should we keep adding more nodes to the existing 9-node cluster just to handle the performance of a few tables?


In theory it is unlimited, but practically speaking, is there any rule of thumb that says to keep the table size <= some value to get optimal performance?

There is no "rule of thumb" for the table size. The size of the table has no bearing on the performance of a cluster. Factors other than the size of the table impact the performance of the cluster. Cheers!


The question is why we should add more nodes just to handle a few large tables. The problem is only with the larger tables, in the range of >1 TB to 1.5 TB; otherwise there are no issues with read operations on any of the smaller tables.

The larger tables use STCS compaction, so most of their SSTables are very large (>50 GB), which looks like it is also a problem. We will consider changing the compaction strategy to LCS to see if there is any benefit (see the sketch below).
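
For reference, switching a table's compaction strategy is a single ALTER TABLE statement. Below is a minimal sketch using the DataStax Python driver; the contact point, keyspace and table names are placeholders, and 160 MB is simply the LCS default SSTable target size.

```python
# Minimal sketch: switch a table from STCS to LCS (placeholder names throughout).
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])   # placeholder contact point
session = cluster.connect()

session.execute("""
    ALTER TABLE my_keyspace.my_large_table
    WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': '160'
    }
""")
cluster.shutdown()
# Note: after the change, compaction re-levels the existing SSTables in the
# background, which is a lot of I/O on a table of this size.
```

LCS generally favours read-heavy tables at the cost of higher write amplification, so it is worth trying on one table and measuring before rolling it out everywhere.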



I think you misunderstood my answer. I wasn't telling you to add nodes to deal with your specific scenario. The point that I made is that you can increase the storage capacity of the cluster by adding nodes. Cheers!

P.S. I've converted your posts to comments since they're not answers. :)


In our 9-node cluster none of the nodes has a space problem; all nodes have sufficient space, so we don't have a storage capacity problem. The problem is only with the few tables that are in the TB range.

Our cluster is on AWS EC2, r5.2xlarge instances with SSD disks.

My question is: does breaking only the larger, TB-scale tables into a larger number of smaller tables help to gain performance?

In our 9-node cluster none of the nodes has a space problem; all nodes have sufficient space, so we don't have a storage capacity problem

Again, I didn't say you had a capacity issue. I think you've misunderstood my point. :)

My question is: does breaking only the larger, TB-scale tables into a larger number of smaller tables help to gain performance?

No, it doesn't. As I said previously, the size of a table does not impact the cluster's performance. Cheers!

I am just trying to compare small and big table access to catching fish in a bucket versus a pond/ocean. Is that not the same analogy for small and big table access in Cassandra?

Is catching a fish in a bucket full of water not different from trying to catch the same fish in a pond or ocean?

In a bucket it's easy to catch because the area/volume is small, while in a pond or ocean it may take longer, or sometimes you cannot catch it at all, because of the larger volume.




Not in Cassandra -- it takes the same amount of time to locate a partition in a small table as in a large one, regardless of the table's size. This is because Cassandra doesn't have to scan the table to locate the partition. Instead, it uses a consistent hashing algorithm to determine which node the partition is stored on. C* also uses structures such as the partition key cache and the partition index to quickly locate where the partition sits within an SSTable.
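
To illustrate the lookup path, here is a toy token ring, not Cassandra's actual code. Cassandra 3.11 uses the Murmur3 partitioner; MD5 is used here only to keep the example dependency-free, and the node names and tokens are made up.

```python
# Toy sketch of how a partition key maps to a node on a token ring.
import hashlib
from bisect import bisect_left

def token(partition_key: str) -> int:
    # Stand-in hash; Cassandra 3.11 actually uses Murmur3.
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

# Hypothetical 3-node ring: each node owns the token range ending at its token.
ring = [(2**126, "node1"), (2**127, "node2"), (2**128, "node3")]
tokens = [t for t, _ in ring]

def owner(partition_key: str) -> str:
    t = token(partition_key)
    idx = bisect_left(tokens, t) % len(ring)   # wrap around the ring
    return ring[idx][1]

# Finding the owning node is a hash plus a ring lookup; the cost does not
# depend on how many rows or terabytes the table holds.
print(owner("customer-42"))
```

The per-request lookup work is the same whether the table holds a gigabyte or a terabyte, which is the point being made above.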

The size of a table really doesn't affect the performance of read requests in C*. Cheers!
