Valuser asked Valuser edited

How can we analyse storage requirements before loading data?

Hi,

Let's say I have some data to write that includes blob data, and the total size may be on the order of gigabytes. If I start writing this data to our Cassandra instance, how can I make sure the instance has enough space for all the writes to complete successfully? I am looking for a way to find out whether we have sufficient disk space before I start writing the data (which can be large). I don't want the writes to stop partway through because the disk is full; I want to know this before the writes to that instance begin. Is this feature already available in Cassandra? If not, how can I achieve this, either through the Cassandra driver or with other tools?

storage
2 comments

@Valuser, please update your post with the version of tools that you're using for this. Cassandra version, application program version, etc.? Thank you!

I am using Apache Cassandra 3.11, DSE 5.1.

1 Answer

Erick Ramirez answered Valuser edited

Cassandra has no such feature. You will need to estimate the disk requirement manually, by trial and error.

For example, I recommend loading a subset of the data into a non-production cluster and extrapolating your requirements from there. I also suggest using a randomised sample instead of just picking the first N records, so that it is representative of the whole population. Cheers!
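
As a rough illustration of the sample-and-extrapolate approach, here is a minimal Python sketch. It is not part of Cassandra or any driver: the function names, the `replication_factor` parameter, and the 50% `headroom` default (extra space for compaction and similar overhead) are assumptions chosen for illustration. It also assumes the on-disk size of one replica of the sample has already been measured on the test cluster.

```python
import math
import random


def pick_random_sample(record_ids, sample_size, seed=None):
    """Return a uniformly random subset of record IDs, without replacement.

    This avoids the bias of simply taking the first N records.
    """
    rng = random.Random(seed)
    return rng.sample(list(record_ids), sample_size)


def extrapolate_disk_bytes(sample_rows, sample_bytes_on_disk, total_rows,
                           replication_factor=1, headroom=0.5):
    """Scale the measured size of a sample load up to the full data set.

    sample_bytes_on_disk: size of ONE replica of the sample after loading
                          (measured on the non-production cluster).
    replication_factor:   number of replicas the real keyspace will keep.
    headroom:             extra fraction to reserve; 0.5 is an illustrative
                          assumption, not a Cassandra rule.
    Returns the estimated total bytes across the whole cluster.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    bytes_per_row = sample_bytes_on_disk / sample_rows
    raw_total = bytes_per_row * total_rows * replication_factor
    return math.ceil(raw_total * (1 + headroom))
```

For instance, if 1,000 sampled rows occupy 2,000,000 bytes on disk, then `extrapolate_disk_bytes(1000, 2_000_000, 1_000_000, replication_factor=3)` scales that to one million rows with three replicas plus headroom. The result is a cluster-wide figure; divide by the number of nodes for a per-node estimate, assuming data is evenly distributed.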

2 comments

OK. Let's say I know the data size before the writes. What other factors should I consider for this disk size estimate?

Also, if we are writing to two databases on the same instance at the same time, how can we estimate the size for a single database before writing? I mean, before these writes the disk space may be sufficient from the perspective of a single database, but with simultaneous writes to both databases, there is a chance the disk will fill up, right?

@Erick Ramirez
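
The concern about simultaneous writes can be made concrete with a small pre-flight check that compares the combined estimate for all pending loads against the free space on the data volume. This is only a sketch under assumptions: the function names and the `reserve_bytes` parameter are hypothetical, the per-database estimates must come from your own measurements, and the check has to run on the Cassandra node itself, since it inspects the local filesystem.

```python
import shutil


def free_bytes_at(path):
    """Return the free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def has_room_for(estimates_bytes, free_bytes, reserve_bytes=0):
    """Check whether ALL pending loads fit together in the free space.

    estimates_bytes: mapping of database name -> estimated bytes to be written
                     on this node (your own estimates, not a Cassandra value).
    reserve_bytes:   space to keep free regardless (hypothetical safety margin).
    """
    combined = sum(estimates_bytes.values())
    return combined + reserve_bytes <= free_bytes
```

With 100 bytes free, two loads of 60 and 50 bytes each pass the check individually but fail it together, which is the situation described above. In practice the free space would come from something like `free_bytes_at("/var/lib/cassandra/data")`, where the path is an assumption about where the data directory lives on your installation.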

Also, I should consider the commitlog as a space estimation factor too, right? Will a record be the same size in the commitlog as in an SSTable?

@Erick Ramirez
