Build Cloud-Native apps with Apache Cassandra




oded.erner_188979 asked:

How can a single node handle 2-4 TB of data?

It was mentioned in the tutorial that a single Cassandra node can handle up to 2-4 TB of data.
My question is: how can a single node with a limited cache handle so much data?

Won't it eventually overload disk I/O by constantly refreshing the cache with the relevant data?
Or is most of the 2-4 TB replicated data?

What is the idea/mechanism behind this?

Thanks


Erick Ramirez answered:

There is a caveat when we talk about very dense nodes. In general terms, dense nodes are those with more than 1 TB of data.

Our general recommendation for high-throughput clusters is to target a density of around 500 GB per node. If performance is an important requirement, plan to add more nodes to the cluster as the data on each node grows beyond 500 GB and toward 1 TB.
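To put the 500 GB guideline into numbers, here is a back-of-envelope sketch. It assumes data is spread evenly across nodes and that the per-node figure counts every replica a node stores (each replica is a full copy, so on-disk data is roughly unique data multiplied by the replication factor). The function name and example figures are hypothetical, and compaction headroom and snapshots are ignored.

```python
import math


def nodes_needed(unique_data_gb: float, replication_factor: int,
                 target_gb_per_node: float = 500.0) -> int:
    """Rough node count for a target per-node density (hypothetical helper).

    Every replica is stored on disk, so the total on-disk footprint is the
    unique data multiplied by the replication factor. Assumes an even spread
    across nodes and ignores free space needed for compaction.
    """
    total_on_disk_gb = unique_data_gb * replication_factor
    return math.ceil(total_on_disk_gb / target_gb_per_node)


# Hypothetical example: 2 TB of unique data with replication factor 3
print(nodes_needed(2000, 3))        # 6000 GB / 500 GB per node -> 12
print(nodes_needed(2000, 3, 1000))  # 6000 GB / 1 TB per node   -> 6
```

The same arithmetic bears on whether a node's 2-4 TB is "replicated data": from the node's point of view, every replica it holds is part of its density.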

It is fine to have dense nodes if most of the data is "cold" and only a small portion is "hot", meaning most of the data is no longer read or updated and only the most recent data is actively accessed.

Where most of the data is not accessed ("cold" data), it has no impact on caches such as the row cache (disabled by default) or the partition key cache, since entries only enter those caches when data is read. Cheers!
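The hot/cold distinction matters because a memory cache only helps when requests keep returning to the same data. The sketch below is a generic least-recently-used (LRU) model, not Cassandra's own cache implementation (Cassandra's key and row caches, and the OS page cache it relies on for SSTable reads, have their own sizing and eviction policies). The partition counts, cache size, and 90%-to-1% skew are hypothetical.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache (a generic model, not Cassandra's)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, newest last

    def access(self, key: int) -> bool:
        """Return True on a hit; on a miss, insert the key and evict if full."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False


def hit_ratio(keys, capacity: int) -> float:
    """Fraction of accesses served from a cold-started LRU cache."""
    cache = LRUCache(capacity)
    hits = sum(cache.access(k) for k in keys)
    return hits / len(keys)


rng = random.Random(42)
NUM_PARTITIONS = 100_000  # hypothetical partitions on one node
CACHE_SIZE = 2_000        # cache holds 2% of them
REQUESTS = 200_000
HOT = 1_000               # the first 1% of partitions are "hot"

# Uniform: every partition equally likely (no hot spot)
uniform = [rng.randrange(NUM_PARTITIONS) for _ in range(REQUESTS)]
# Hot/cold: 90% of requests target the hot 1%, the rest go anywhere
skewed = [rng.randrange(HOT) if rng.random() < 0.9
          else rng.randrange(NUM_PARTITIONS) for _ in range(REQUESTS)]

print(f"uniform:  {hit_ratio(uniform, CACHE_SIZE):.1%}")
print(f"hot/cold: {hit_ratio(skewed, CACHE_SIZE):.1%}")
```

With the uniform pattern the hit ratio stays close to the cache-to-data ratio (about 2% here), while the skewed pattern keeps the small hot set resident and serves around 90% of reads from memory.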


Thanks for the clear answer, Erick!


Not a problem. Cheers!

Cedrick Lunven answered:

Hi,

Can you have a look at the read path in Cassandra and amend your question?

https://academy.datastax.com/#/online-courses/6167eee3-0575-4d88-9f80-f2270587ce23


Which cache are you talking about? Memtables? The key cache? There is no single component called "the cache" in Cassandra nodes.


Hi,

(Well, I couldn't access the link. I asked to reset my password, but the email never arrived.

I checked the spam folder as well; maybe it takes a while. I can reach the Academy from previous links.)

In the question I didn't refer to a specific cache
(I can see there is the memtable, the key cache, and the partition summary).

Suppose we are dealing with a single node that holds 2 TB.
It receives a very high rate of read requests (many thousands per second).

The read requests are spread across all 2 TB of data, with no hot spot.


In this case, won't searching for the data on disk (SSTables and so on) overload I/O?

Won't it hit the I/O limit?

What I'm trying to understand is whether a single Cassandra node can support this case of 2 TB of data and a very high read rate.

Or would this case be better served by some other solution (such as partitioning/sharding the data)?

Thanks much, Oded.
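One way to frame this I/O question is a back-of-envelope estimate: with reads spread uniformly over the data set, the fraction served from memory is roughly the fraction of the data that fits in memory, and every miss costs at least one disk read. The sketch below uses hypothetical figures (10,000 reads/s, 2 TB of data, 64 GB of memory available for caching, two disk reads per miss); these are assumptions, not measurements, and the result would then be compared against the sustained random-read IOPS the node's drives can deliver.

```python
def disk_reads_per_second(reads_per_second: float,
                          data_size_gb: float,
                          cache_size_gb: float,
                          disk_reads_per_miss: float = 1.0) -> float:
    """Estimate device reads/s when access is uniform over the data set.

    With no hot spot, the chance that a read is served from memory is roughly
    the fraction of data that fits in cache (cache_size / data_size).
    disk_reads_per_miss is a hypothetical average number of disk reads a
    cache miss costs (e.g. an index lookup plus a data read).
    """
    hit_fraction = min(1.0, cache_size_gb / data_size_gb)
    return reads_per_second * (1.0 - hit_fraction) * disk_reads_per_miss


# Hypothetical: 10,000 reads/s over 2 TB with 64 GB of memory for caching
iops = disk_reads_per_second(10_000, 2_000, 64, disk_reads_per_miss=2)
print(f"{iops:,.0f} disk reads/s")  # 10,000 * (1 - 0.032) * 2 = 19,360
```

Because a uniform pattern leaves almost every read to the disk, adding nodes shrinks both sides of the estimate: less data per node and fewer reads each node must serve.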
