
oded.erner_188979 asked

How can a single node handle 2-4 TB of data?

It was mentioned in the tutorial that a single Cassandra node can handle up to 2-4 TB of data.
My question is: how can a single node with a limited cache handle so much data?

Won't it eventually overload disk I/O, since the cache constantly has to be refreshed with the relevant data?
Or is most of that 2-4 TB replicated data?

What is the idea/mechanism behind this?

Thanks

cassandra

Erick Ramirez answered

There is a qualifier when we talk about very dense nodes. In general terms, dense nodes are those which have more than 1 TB of data.

Our general recommendation for high-throughput clusters is to target a density of around 500 GB per node. If performance is an important requirement, plan to add more nodes to the cluster as the data size on each node grows beyond 500 GB and closer to 1 TB.
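As a rough illustration of the 500 GB guideline, here is a minimal Python sketch that estimates how many nodes keep per-node density near a target. `nodes_needed` is a hypothetical helper, not part of any Cassandra tool. It assumes each replica stores a full copy of the data it owns (so on-disk data is the raw size times the replication factor) and it ignores compaction headroom, snapshots and multi-datacenter layouts.

```python
import math


def nodes_needed(raw_data_gb: float, replication_factor: int,
                 target_gb_per_node: float = 500.0) -> int:
    """Rough node count so that per-node load stays near a target density.

    Hypothetical helper: assumes on-disk data = raw size * replication factor
    and ignores compaction headroom, snapshots and multi-DC placement.
    """
    if raw_data_gb < 0 or replication_factor < 1 or target_gb_per_node <= 0:
        raise ValueError("invalid sizing inputs")
    total_on_disk_gb = raw_data_gb * replication_factor
    # Never fewer nodes than replicas, so each copy can land on a distinct node.
    return max(replication_factor, math.ceil(total_on_disk_gb / target_gb_per_node))


# Illustrative input: 2 TB of raw data, RF=3, ~500 GB per node.
print(nodes_needed(2000, 3))  # 12
```

Note that each node's ~500 GB in this estimate already includes its share of the replicas.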

It is fine to have dense nodes if most of the data is "cold" and only a small portion is "hot", meaning that most of the data is no longer accessed or updated and only the most recent data is actively read or written.

When most of the data is not accessed ("cold" data), it has no impact on things like the row cache (disabled by default) or the partition key cache. Cheers!
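To illustrate why cold data does not hurt caching, here is a small Python simulation. It is a sketch under stated assumptions, not a model of Cassandra's actual key cache (which is not a plain LRU): a hypothetical `LRUCache` covers only 1% of the keys, and the sketch compares a workload where 95% of reads hit a small hot subset against a uniform workload. All sizes are made up for illustration.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity least-recently-used cache that tracks its hit rate.

    Simplification: Cassandra's real key cache uses a different policy.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def access(self, key) -> None:
        self.lookups += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return
        self.entries[key] = True  # miss: load the entry into the cache
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


    @property
    def hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0


def simulate(hot_fraction_of_reads: float, seed: int = 42) -> float:
    """Hit rate of a cache holding 1% of keys under a hot/cold workload."""
    rng = random.Random(seed)
    total_keys = 100_000              # the whole (dense) dataset
    hot_keys = 500                    # small, recently written "hot" subset
    cache = LRUCache(capacity=1_000)  # cache covers only 1% of the keys
    for _ in range(200_000):
        if rng.random() < hot_fraction_of_reads:
            key = rng.randrange(hot_keys)               # hot read
        else:
            key = rng.randrange(hot_keys, total_keys)   # cold read
        cache.access(key)
    return cache.hit_rate


print(f"mostly hot reads: {simulate(0.95):.2f}")
# Hot keys get only their proportional share of reads, i.e. uniform access.
print(f"uniform reads:    {simulate(500 / 100_000):.2f}")
```

Because the hot set fits in the cache, the hit rate stays high even though the cache is tiny relative to the dataset. With uniform reads it drops to roughly the cache's share of the keys.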


oded.erner_188979 commented:

Thanks for the clear answer, Erick!

Erick Ramirez replied:

Not a problem. Cheers!

Cedrick Lunven answered

Hi,

Can you have a look at the read path in Cassandra and amend your question?

https://academy.datastax.com/#/online-courses/6167eee3-0575-4d88-9f80-f2270587ce23


Which cache are you talking about? Memtables? The key cache? There is no single thing called "the cache" in a Cassandra node.
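To make the read path concrete, here is a toy Python model of how many disk reads a single partition lookup might cost per SSTable. It roughly follows the Cassandra 3.x read path as described in the DataStax documentation: the bloom filter, partition key cache, partition summary and compression offset map live in memory, while the partition index and the data live on disk. The names `SSTableProbe`, `disk_reads_for_sstable` and `disk_reads_for_query` are hypothetical. Counting exactly one read per on-disk step and ignoring the OS page cache and the (disabled by default) row cache are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class SSTableProbe:
    """Outcome of checking one SSTable for a requested partition key."""
    bloom_filter_says_maybe: bool  # False => SSTable definitely lacks the key
    key_cache_hit: bool            # partition's index position already cached


def disk_reads_for_sstable(probe: SSTableProbe) -> int:
    """Approximate disk reads to fetch a partition from one SSTable.

    Assumes one read per on-disk step and no OS page-cache hits (worst case).
    """
    if not probe.bloom_filter_says_maybe:
        return 0  # SSTable skipped entirely, no I/O
    if probe.key_cache_hit:
        return 1  # key cache -> offset map (memory) -> data (1 disk read)
    return 2      # summary (memory) -> partition index (disk) -> data (disk)


def disk_reads_for_query(probes: list[SSTableProbe]) -> int:
    # The memtable (and row cache, if enabled) are checked in memory first
    # and are not modeled here; this sums the per-SSTable disk reads.
    return sum(disk_reads_for_sstable(p) for p in probes)


probes = [
    SSTableProbe(bloom_filter_says_maybe=True, key_cache_hit=True),
    SSTableProbe(bloom_filter_says_maybe=True, key_cache_hit=False),
    SSTableProbe(bloom_filter_says_maybe=False, key_cache_hit=False),
]
print(disk_reads_for_query(probes))  # 3
```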


oded.erner_188979 commented:

Hi,

(Well, I couldn't access the link. I requested a password reset, but the email never arrived.

I checked the spam folder as well; maybe it just takes a long time. I could reach the Academy from previous links.)

In the question I wasn't referring to a specific cache.
(I can see there are the memtable, the key cache and the partition summary.)

Suppose we are dealing with a single node that holds 2 TB of data.
It receives a very high rate of read requests (many thousands per second).

The reads are spread across the whole 2 TB, with no hot spot.


In this case, won't searching for the data on disk (SSTables and so on) overload the I/O?

Won't it hit the I/O limits?

What I'm trying to understand is whether a single Cassandra node can support this case of 2 TB of data and a very high read rate.

Or would this case require some other solution (such as partitioning/sharding the data), and would that be better?

Thanks much, Oded.
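The scenario in this follow-up can be estimated with simple arithmetic. The Python sketch below uses a hypothetical helper, `estimated_disk_reads_per_sec`, with purely illustrative inputs (not measurements). It assumes that under uniformly random reads, the chance of a read being served from memory is about the fraction of the data that fits in RAM. On a 2 TB node with far less memory, almost every read therefore goes to disk.

```python
def estimated_disk_reads_per_sec(reads_per_sec: float,
                                 data_bytes: float,
                                 memory_bytes: float,
                                 disk_reads_per_miss: float = 1.0) -> float:
    """Back-of-envelope disk read rate for uniformly random reads.

    Assumption: with no hot spot, the fraction of reads served from memory
    is roughly the fraction of the data that fits in memory.
    """
    if data_bytes <= 0:
        raise ValueError("data_bytes must be positive")
    cached_fraction = min(1.0, memory_bytes / data_bytes)
    miss_ratio = 1.0 - cached_fraction
    return reads_per_sec * miss_ratio * disk_reads_per_miss


TB, GB = 10**12, 10**9
# Hypothetical node: 2 TB of data, 64 GB of RAM usable for caching,
# 10,000 reads/s spread uniformly, ~2 disk reads per uncached read
# (partition index + data).
iops = estimated_disk_reads_per_sec(10_000, 2 * TB, 64 * GB, 2)
print(round(iops))  # 19360
```

Whether such a rate is sustainable depends on the disks' measured IOPS and latency. Spreading the same data and read load over more nodes divides both, which matches the advice above to add nodes as per-node density grows.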
