Valuser asked Valuser edited

How can we analyse storage requirements before loading data?

Hi,

Let's say I have some data to write that includes blob data, and the total size may be in the order of GBs. If I start writing this data to our Cassandra instance, how can I make sure the instance has enough space for all the writes to complete successfully? I am looking for a way to check whether we have sufficient disk space before I start writing (the data can be large). I don't want the writes to stop midway because the disk is full. Is this feature already available in Cassandra? If not, how can this be achieved, either through the Cassandra driver or with other tools?

storage
2 comments

smadhavan ♦ commented ·

@Valuser, please update your post with the versions of the tools you're using: Cassandra version, application version, etc. Thank you!

Valuser smadhavan ♦ commented ·

I am using Apache Cassandra 3.11, DSE 5.1.

1 Answer

Erick Ramirez answered Valuser edited

Cassandra has no built-in feature for this. You will need to estimate the required disk space manually, by trial and error.

For example, I recommend loading a subset of the data into a non-production cluster and extrapolating your requirements from there. I also suggest using a randomised sample instead of just picking the first N records, to ensure it is representative of the whole population. Cheers!
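A rough sketch of the sample-and-extrapolate approach in Python. None of this is a Cassandra feature: the function names, the `headroom` safety factor, and the choice of reservoir sampling are all assumptions made for illustration. The on-disk size of the sample has to be measured on the test cluster after loading, for example from the "Space used (live)" figure reported by `nodetool tablestats`.

```python
import random
import shutil


def reservoir_sample(records, k, seed=None):
    """Pick k records uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


def estimate_disk_bytes(sample_rows, sample_bytes_on_disk, total_rows, headroom=2.0):
    """Scale the on-disk size measured for the sample up to the full data set.

    headroom is an assumed safety multiplier (space for compaction,
    snapshots and so on); tune it for your own cluster.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    bytes_per_row = sample_bytes_on_disk / sample_rows
    return int(bytes_per_row * total_rows * headroom)


def has_enough_space(path, required_bytes):
    """Check the free space on the volume that holds `path`."""
    return shutil.disk_usage(path).free >= required_bytes
```

For instance, if a sample of 1,000 rows occupies 5,000,000 bytes on disk and the full load is 1,000,000 rows, `estimate_disk_bytes(1000, 5_000_000, 1_000_000)` gives 10,000,000,000 bytes with the assumed headroom of 2.0. `has_enough_space` would have to be run on each node against its data directory, and the estimate only holds if the sample is representative and the test cluster uses the same replication and compression settings as production.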

2 comments

Valuser commented ·

OK. Let's say I know the data size before the writes. What other factors should I consider for this disk size estimate?

Also, if we are writing to two databases on the same instance at the same time, how can we estimate the size for a single database before writing? I mean, before these writes the disk space may look sufficient from the perspective of a single database, but with simultaneous writes to both databases, the disk could still fill up, right?

@Erick Ramirez

Valuser commented ·

Also, I should consider the commitlog as a factor in the space estimate too, right? Will a record be the same size in the commitlog as in an SSTable?

@Erick Ramirez
