
Question

liuhl6 asked:

Is my cluster's rack configuration causing a node to be overloaded?

[FOLLOW UP QUESTION TO #8512]

My DSE version is 6.7.5. The overload problems usually occur during long-running, continuous write operations. I suspect the cause may be that my rack settings are wrong.

My DSE cluster contains one data center with four nodes: three normal nodes and one search node. The snitch is GossipingPropertyFileSnitch. There are three nodes in rack1 and only one node in rack2. Is this the cause of the overload on the rack2 node? The node that occasionally reports errors is 137.1.

If I put 137.2 in rack3, will it improve the overload problem?

If I do that, do I need to decommission the node first, or can I just change the rack and restart?

topology

1 Answer

Erick Ramirez answered:

As a general rule, when you place nodes in Cassandra racks, the number of racks should match the replication factor. This ensures that data distribution stays balanced. Otherwise, a single-rack configuration is sufficient for most environments.

The recommendation in production environments is to have 3 replicas in each DC so you need to have 3 racks. If this is not possible, you should revert your configuration to just one rack for the whole DC.
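For reference, with GossipingPropertyFileSnitch each node advertises its DC and rack through its cassandra-rackdc.properties file. A minimal sketch is below; the dc and rack names (and the file location) are illustrative, so substitute your own values:

    # cassandra-rackdc.properties -- read at startup by GossipingPropertyFileSnitch
    # (for a DSE package install this typically lives under /etc/dse/cassandra/)
    dc=DC1
    rack=rack1

Nodes that should sit in the same rack must carry the same rack= value, and the file is only read at startup, so a restart is required for a change to take effect.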

To do this, you will need to (see the command sketch after the list):

  1. Decommission the node.
  2. Completely delete all the contents (including subdirectories) of data/, commitlog/ and saved_caches/.
  3. Update the node's rack configuration.
  4. Add the node back to the cluster.
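A rough command sketch of those steps, assuming a package install of DSE with the default data locations under /var/lib/cassandra (adjust the paths, service name, and properties file location to match your environment):

    # 1. On the node being moved, remove it from the ring
    nodetool decommission

    # 2. Stop DSE and wipe the old data so the node can bootstrap cleanly
    sudo service dse stop
    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/saved_caches/*

    # 3. Update rack= in cassandra-rackdc.properties, then
    # 4. start DSE so the node bootstraps back into the cluster
    sudo service dse start

Decommissioning first matters because changing the rack of a node that already owns data would change which nodes its replicas should live on without actually moving the data.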

As a side note, this statement is incorrect:

Overload problems usually occur when performing continuous write operations that last a long time.

If the commitlog/ disk can sustain 15K IOPS, it doesn't matter whether you write to it continuously for hours, days, or months, as long as you stay at or below 15K IOPS. But if you exceed 15K IOPS, even for one second, you will see write failures. Cheers!

2 comments

Thank you, @Erick Ramirez. I will correct the rack settings and try again.



Not a problem. Cheers!
