Bringing together the Apache Cassandra experts from the community and DataStax.


question

liuhl6 asked ·

Is 2TB the maximum capacity for each DSE 6.7 node?

https://docs.datastax.com/en/dse-planning/doc/planning/capacityPlanning.html#Minimumdiskspacerecommendations

Is the content of the above linked webpage saying that each node maintains a maximum of 2TB of data? (depending on the compression strategy, does 2TB data require a maximum of 4TB hard drive space?) If so, does it mean that my database server does not need to be equipped with hard drives exceeding 4TB?

If I have a 40-core CPU, 128GB memory and 12x1.8TB hard drives, can I build 3 nodes on this physical machine? Each node has 1 disk for commit log, 2 disks for data? Do I need to configure the limits of available CPU and memory resources for each node? Is it configured in cassandra-env.sh?

I know that is a lot of questions. Thank you very much to anyone who can answer!

density

Erick Ramirez answered ·

To reinforce Bettina's answer, the recommendation in the DSE Capacity planning document is that you should not exceed the 2 TB limit because of the known performance and operational issues that go with very dense nodes.

To answer your questions directly:

"Does it mean my server does not need to be equipped with drives exceeding 4TB?"

This is correct. It makes no sense to have lots of disks that you will not use.
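The 2TB-of-data versus 4TB-of-disk relationship from the question can be written as simple arithmetic. This is only a sketch: the function name is made up for illustration, and the 50% free-space fraction is an assumption chosen to match the 2TB to 4TB figure in the question, not a value taken from a DSE configuration file.

```python
def required_disk_tb(data_tb: float, free_fraction: float = 0.5) -> float:
    """Disk capacity needed so that `free_fraction` of the disk stays free.

    free_fraction=0.5 is an assumption matching the question's
    "2TB of data needs up to 4TB of disk" reading.
    """
    if not 0 <= free_fraction < 1:
        raise ValueError("free_fraction must be in [0, 1)")
    return data_tb / (1.0 - free_fraction)


print(required_disk_tb(2.0))  # 4.0 -> 2TB of data with half the disk kept free
print(required_disk_tb(1.0))  # 2.0 -> 1TB of data with the same headroom
```

With a lower data target per node, the disk needed per node shrinks proportionally.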

"Can I build 3 nodes on this physical machine?"

You can use the DSE Multi-Instance feature, which allows you to deploy multiple DSE nodes on a single physical server.

Note however that there are several caveats you need to be aware of:

  • You must configure the nodes hosted on a physical server so that they belong to the same Cassandra logical rack. This is needed to ensure replicas are not placed on nodes which are hosted on the same physical server. See the documentation for details.
  • The Multi-Instance feature is deprecated in DSE 6.8. As such, we don't recommend deploying new clusters using this feature.
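The rack caveat above can be checked against a planned layout before deploying. The following is a minimal sketch, not a DSE tool: the node list, host names, and rack names are all hypothetical, and the check only verifies that every node on the same physical server is assigned to one rack.

```python
from collections import defaultdict


def hosts_with_split_racks(nodes):
    """Return physical hosts whose nodes are spread over more than one rack.

    `nodes` is a list of dicts with hypothetical keys "name", "host", "rack".
    An empty result means each server's nodes share a single logical rack.
    """
    racks_by_host = defaultdict(set)
    for node in nodes:
        racks_by_host[node["host"]].add(node["rack"])
    return sorted(host for host, racks in racks_by_host.items() if len(racks) > 1)


# Hypothetical plan: two servers, one of them misconfigured.
plan = [
    {"name": "node1", "host": "server-a", "rack": "rack1"},
    {"name": "node2", "host": "server-a", "rack": "rack2"},  # same server, different rack
    {"name": "node3", "host": "server-b", "rack": "rack2"},
    {"name": "node4", "host": "server-b", "rack": "rack2"},
]

print(hosts_with_split_racks(plan))  # ['server-a']
```

Any host reported here would need its nodes moved into a single rack so that rack-aware replica placement keeps copies of the same data on different physical servers.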

"Will there be performance problems in data insertion, query, and deletion in addition to essential maintenance operations?"

Yes, there will be. The recommendation for Cassandra is to scale horizontally (add more nodes) instead of scaling vertically (bigger machines).

A cluster of 12 nodes with 16 cores + 64GB RAM will have better performance than a cluster of 6 nodes with 32 cores + 128GB RAM. This is true because there are more nodes to service the request -- Cassandra scales linearly. Cheers!
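To make the comparison concrete, here is a small sketch that totals the resources of both clusters and estimates how much data each node would hold. The dataset size (4TB) and replication factor (3) are assumptions picked for illustration, and the estimate assumes data is spread evenly across nodes.

```python
def cluster_summary(nodes, cores_per_node, ram_gb_per_node,
                    dataset_tb, replication_factor):
    """Total resources and approximate per-node data for a cluster.

    Assumes an even data distribution; dataset_tb and replication_factor
    are illustrative inputs, not values from the thread.
    """
    total_replicated_tb = dataset_tb * replication_factor
    return {
        "total_cores": nodes * cores_per_node,
        "total_ram_gb": nodes * ram_gb_per_node,
        "data_per_node_tb": total_replicated_tb / nodes,
    }


DATASET_TB = 4.0  # assumed unique data size
RF = 3            # assumed replication factor

print(cluster_summary(12, 16, 64, DATASET_TB, RF))
print(cluster_summary(6, 32, 128, DATASET_TB, RF))
```

Both clusters add up to the same 192 cores and 768GB of RAM, but under these assumptions the 12-node cluster holds 1TB per node while the 6-node cluster holds 2TB per node, so the smaller cluster runs much denser nodes.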

3 comments

Thank you, @Erick Ramirez! It's very helpful.


Hi, @Erick Ramirez. Can the OS use the same disk that is used by ‘data’ or ‘commitlog’?

Do I need RAID 1 for the OS disk?


You can use the same disk, but it is not recommended. This isn't specific to Cassandra. Industry best practice is to separate the operating system from your data so that disk I/O on one does not affect the other. Cheers!

bettina.swynnerton answered ·

Hi @liuhl6,

Even if the capacity planning document mentions that you can have up to 2TB of data with DSE 6.7.x, I would still recommend staying within 500GB - 1TB per node. The more data you hold on a node, the longer essential maintenance operations such as repairs, restores, bootstrapping, and decommissioning take, and the harder you are hit by node failures.

See also this post for more details.

Regarding your deployment question: I am not sure I understand you correctly, but it seems that you want to install a 3-node cluster onto the same physical machine? What is your aim?

Cassandra is a distributed database, designed to remain available during node failures. Installing all nodes on the same server reintroduces a single point of failure: all three nodes would be down during a server failure or maintenance restart.

In production, ideally, you want to deploy each node onto its own dedicated physical machine, with its own host name and IP address, two dedicated SSD drives (one for commit log, one for data), at least 32GB of memory per node and 16 cores.
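As a rough illustration, the per-node guidance above (16 cores, at least 32GB of memory, two dedicated drives) can be compared with the machine described in the question (40 cores, 128GB, 12 disks). This is a sketch with a hypothetical helper name; it only checks whether the resources would fit and does not change the advice to give each node its own physical machine.

```python
def max_nodes_per_machine(cores, ram_gb, disks,
                          cores_per_node=16, ram_gb_per_node=32,
                          disks_per_node=2):
    """How many nodes fit on one machine under the per-node guidance.

    Defaults mirror the figures in the answer above; the limiting
    resource determines the result.
    """
    return min(cores // cores_per_node,
               ram_gb // ram_gb_per_node,
               disks // disks_per_node)


# Machine from the question: 40 cores, 128GB RAM, 12 disks.
print(max_nodes_per_machine(40, 128, 12))  # 2 (cores are the limiting resource)
```

Under these figures, three nodes on that machine would need 48 cores, more than the 40 available.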

I hope this helps!

1 comment

Hi @bettina.swynnerton

I have three physical machines like that. Because each machine has too many disks, I am thinking about configuring more nodes on each machine to use those disks. Is this necessary? If the storage of each node exceeds the limit, will there be performance problems in data insertion, query, and deletion in addition to essential maintenance operations?
