mohmmad.m.aburadeh_143075 asked Erick Ramirez edited

What are the effects of having too many keyspaces?

Hi

We have a development Cassandra cluster of 8 nodes. It's being used by more than 20 developers and every day we create many keyspaces and more than 10 tables in each keyspace.

Currently, the cluster has more than 300 keyspaces. The performance of the cluster is really bad, and I can't find anything in the Cassandra logs that shows what's wrong. Sometimes we get failures when creating new keyspaces.

Do you think this is caused by having too many keyspaces? If so, why?

Thanks in advance.

schema

starlord answered

Allowing too many keyspaces/tables to exist in a cluster can lead to poor performance because every table uses some heap space for tracking. Depending on your Cassandra heap size, you could reach a point where heap pressure hurts performance. You typically see slower and more frequent GC as a result. If you believe this is the case, my advice is to keep fewer than 200 total tables per cluster and to spin up a new cluster if you need to go well over that number. Having a large enough heap is also important, so size the Cassandra heap accordingly.

Erick Ramirez answered Erick Ramirez edited

Our general recommendation is to have no more than 200 tables in total, regardless of how many keyspaces there are. 300 keyspaces with ~20 tables each (~6,000 in total) is far too many.
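To check where a cluster stands against that figure, here is a minimal sketch in Python. It assumes you have already fetched (keyspace_name, table_name) pairs, for example with `SELECT keyspace_name, table_name FROM system_schema.tables`; the function names and the sample data are made up for illustration, and 200 is the guideline quoted above rather than a hard limit.

```python
from collections import Counter

# Guideline quoted in this thread; a rule of thumb, not an enforced limit.
TABLE_GUIDELINE = 200


def tables_per_keyspace(rows):
    """Count tables per keyspace from (keyspace_name, table_name) pairs."""
    return Counter(keyspace for keyspace, _table in rows)


def summarize(rows, guideline=TABLE_GUIDELINE):
    """Return total counts and how far the cluster is over the guideline."""
    counts = tables_per_keyspace(rows)
    total = sum(counts.values())
    return {
        "keyspaces": len(counts),
        "tables": total,
        "ratio": total / guideline,
        "over": total > guideline,
    }


# Illustrative data only: 300 keyspaces with 20 tables each.
rows = [(f"ks_{k}", f"table_{t}") for k in range(300) for t in range(20)]
summary = summarize(rows)
print(summary)
```

With the numbers from this thread the total comes to 6,000 tables, which is 30 times the guideline. Note that rows read from a real cluster would also include the system keyspaces, so you may want to filter those out first.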

Whenever an app instance connects to the cluster, the driver retrieves metadata from the cluster which includes the schema. Retrieving the schema of 200 tables is going to be much quicker than retrieving the schema of ~6,000 tables.

Every time the schema is updated with a new keyspace or table, each app instance needs to refresh its copy of the schema, so this will have a significant hit on the cluster's performance. There's a good chance that some of the app instances will even time out while doing this.
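As a rough back-of-the-envelope sketch of that fan-out, the Python below multiplies schema changes by connected app instances. The number of app instances and the "one keyspace per developer per day" figure are assumptions for illustration, and the model ignores any debouncing or batching of refreshes that a driver may do, so treat the result as an upper-bound style estimate rather than a measurement.

```python
def schema_refreshes(ddl_statements, app_instances):
    """Worst-case refresh count if every schema change reaches every instance."""
    if ddl_statements < 0 or app_instances < 0:
        raise ValueError("counts must be non-negative")
    return ddl_statements * app_instances


developers = 20            # from the question
ddl_per_developer = 1 + 10  # assumed: one CREATE KEYSPACE plus ten CREATE TABLE per day
app_instances = 20          # hypothetical: one connected app instance per developer

daily_ddl = developers * ddl_per_developer
daily_refreshes = schema_refreshes(daily_ddl, app_instances)
print(daily_ddl, daily_refreshes)
```

Under these assumed numbers that is 220 schema changes a day, each of which every connected instance has to pick up, and each refresh gets more expensive as the schema grows.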

You need to consider deploying multiple small clusters and only giving access to a small number of developers. You will also need to manage the schema somehow and make sure that any keyspaces/tables no longer required are dropped. Cheers!
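For the cleanup part, a minimal sketch in Python of generating the `DROP KEYSPACE` statements to review. It assumes you already have the list of keyspace names (for example from `system_schema.keyspaces`) and a list of keyspaces you want to keep; the function name and the example keyspace names are hypothetical. It only builds the CQL text and does not execute anything.

```python
# Built-in keyspaces that should never be dropped.
SYSTEM_KEYSPACES = frozenset({
    "system",
    "system_schema",
    "system_auth",
    "system_distributed",
    "system_traces",
    "system_views",
    "system_virtual_schema",
})


def drop_statements(existing, keep):
    """Build DROP KEYSPACE statements for every keyspace not being kept."""
    keep = set(keep)
    statements = []
    for name in sorted(existing):
        if name in SYSTEM_KEYSPACES or name in keep:
            continue
        # Double-quote the identifier so case-sensitive names are preserved.
        escaped = name.replace('"', '""')
        statements.append(f'DROP KEYSPACE IF EXISTS "{escaped}";')
    return statements


# Hypothetical keyspace names for illustration.
existing = ["system", "system_schema", "dev_alice", "dev_bob", "shared_app"]
statements = drop_statements(existing, keep={"shared_app"})
for statement in statements:
    print(statement)
```

Review the generated statements before running them in cqlsh, since dropping a keyspace removes all of its tables.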

2 comments

Thanks a lot, @Erick Ramirez!
I'm not sure how large companies use Cassandra when they have thousands of tables!
For example, our product was built on Oracle DBMS and has thousands of tables in the Oracle schema. We are moving to Big Data by migrating the tables/data to Cassandra, and we have moved more than 100 tables so far. If there is such a limit on the number of tables that can be in Cassandra (200), then what can we do with the thousands of tables?
What are your recommendations when we need to create more than 200 tables in a single keyspace?

I understand it's bad performance-wise, but it's a little confusing!

Erick Ramirez replied to mohmmad.m.aburadeh_143075

RDBMS vs NoSQL isn't a fair comparison. Cassandra is highly distributed, highly available and highly performant. You trade some conveniences for these attributes, which you don't get in Oracle.

Thousands of tables sounds like you have a multi-tenant use case and you should consider deploying multiple small clusters for each tenant. Cheers!
