Tri asked Erick Ramirez commented

Why are 2 nodes on same physical machine slower with cassandra-stress?

DS210, Exercise: Add and Remove a Node to/from the Cluster

One of the goals of this exercise is to run cassandra-stress on a single-node cluster, then add a 2nd node, rerun the same stress test, and observe that the performance has improved.

I found the opposite, i.e. a degradation instead of an improvement in IO performance. There might be a plausible explanation. I hope you can help clarify.

My lab setup differs from that of the exercise, which uses separate machines in the cloud. In my case, everything (cassandra-stress and all the nodes) runs on the same physical machine (Ubuntu 20.04, 32GB, SSD). Each node is installed as in DS201:

  • dse-6.8.1-bin.tar.gz is uncompressed in ~/bin/DSE6/nodeX

  • configured by running script labwork/config_node X


In my case, the 2-node cluster has slightly lower performance than the single-node cluster. The results are not a fluke: repeated runs showed similar trends. I suppose the 2-node scenario is slower because the 2 nodes, running on the same machine, compete for some common resources (maybe disk IO or memory bandwidth?).

QUESTION: I am not entirely sure how to interpret the results, i.e. why 2 nodes are slower than 1 node, because the machine still had excess CPU/RAM available even when 2 nodes were running. Can you please help explain?

Below is the summary of cassandra-stress:

./node1/resources/cassandra/tools/bin/cassandra-stress \
  user profile=./DS210_Operations/labwork/TestProfileEdited.yaml \
  "ops(insert=1,get_user=3)" -node -port native=9041

#----single-node cluster
Op rate                   :   33,071 op/s  [get_user: 24,774 op/s, insert: 8,297 op/s]
Partition rate            :   32,975 pk/s  [get_user: 24,678 pk/s, insert: 8,297 pk/s]
Row rate                  :   32,975 row/s [get_user: 24,678 row/s, insert: 8,297 row/s]
Latency mean              : 12.170 ms [get_user: 12.3 ms, insert: 11.7 ms]
Latency median            :  9.486 ms [get_user: 9.6 ms, insert: 9.2 ms]

#---2-node cluster
Op rate                   :   30,747 op/s  [get_user: 23,063 op/s, insert: 7,684 op/s]
Partition rate            :   30,676 pk/s  [get_user: 22,992 pk/s, insert: 7,684 pk/s]
Row rate                  :   30,676 row/s [get_user: 22,992 row/s, insert: 7,684 row/s]
Latency mean              : 29.405 ms [get_user: 30.0 ms, insert: 27.8 ms]
Latency median            : 23.446 ms [get_user: 24.1 ms, insert: 21.6 ms]
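
For a quick read of the two summaries, the relative change can be worked out from the figures above. The sketch below is only arithmetic on the reported op rates and mean/median latencies; the helper names `pct_change` and `ratio` are made up for illustration.

```shell
# Hypothetical helpers; the inputs are the figures reported above.
pct_change() {  # percent change from $1 to $2
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}
ratio() {       # $2 expressed as a multiple of $1
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

echo "op rate change:       $(pct_change 33071 30747) %"
echo "mean latency ratio:   $(ratio 12.170 29.405) x"
echo "median latency ratio: $(ratio 9.486 23.446) x"
```

On these numbers throughput drops by about 7% while mean and median latency grow roughly 2.4x, so latency is where the two runs differ most.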

1 Answer

Erick Ramirez answered Erick Ramirez commented

This is a good question. A lot of users fall into this trap so this gives me the opportunity to explain.

One of Cassandra's best attributes is that there is no single point of failure, thanks to its shared-nothing architecture. For this reason, a SAN is an anti-pattern, since an outage in the storage layer would take all nodes down.

In your case, you have 2 nodes sharing the same infrastructure and competing for the same resources. But why is it slower? In a single-node configuration, there is only 1 process accepting requests, and that C* instance has the disk to itself for its reads and writes.

When you have 2 nodes sharing the same disk, they have to compete for the same IO queue and the single Linux IO scheduler for access to the disk. This contention adds coordination overhead in the kernel. It also effectively doubles the disk reads and writes, while the IO bandwidth of the disk stays the same.

Previously, you had 1 node accessing the disk, with its reads and writes competing with each other for IO. With 2 nodes, you have 2 streams of reads and 2 streams of writes (4 consumers) competing with each other for IO.

Even for a single-node configuration, we recommend separating the data and commitlog directories to their own disks for better performance. The only exception to this is when using NVMe SSDs because these disks perform exceptionally well and are hard to saturate.
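
One way to check whether two directories actually sit on separate devices is to compare the filesystem that backs each one. This is a minimal sketch using `df -P`; the function names and the example paths are hypothetical, and the real locations are whatever `data_file_directories` and `commitlog_directory` point to in each node's cassandra.yaml.

```shell
# Print the filesystem (device) backing a directory, via POSIX df output.
device_of() {
  df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Succeed (exit 0) when both directories are backed by the same filesystem.
same_device() {
  [ "$(device_of "$1")" = "$(device_of "$2")" ]
}

# Example paths are assumptions; substitute the directories from cassandra.yaml.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"
COMMITLOG_DIR="${COMMITLOG_DIR:-/var/lib/cassandra/commitlog}"
if [ -d "$DATA_DIR" ] && [ -d "$COMMITLOG_DIR" ]; then
  if same_device "$DATA_DIR" "$COMMITLOG_DIR"; then
    echo "data and commitlog share a device"
  else
    echo "data and commitlog are on separate devices"
  fi
fi
```

Note that two partitions of one physical disk show up as different filesystems here, so a mismatch does not by itself prove separate disks.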

Hopefully this answers your question. Cheers!

4 comments

Tri commented ·

Thank you very much for the technical details. So basically Cassandra is disk IO bound, before CPU and RAM.

Indeed, my disk is a SATA SSD. It is still surprising that the disk I/O bandwidth could be saturated by load testing a single node.

Erick Ramirez ♦♦ (replying to Tri) commented ·

That's really a function of cassandra-stress -- it tries to find the breaking point of your cluster, i.e. stress it to the max. Cheers!

Tri (replying to Erick Ramirez ♦♦) commented ·

Yes, indeed, cassandra-stress repeats batch runs with an increasing number of threads until there is no further improvement in IO performance. The run that produced the final cassandra-stress results in my initial post stopped at 916 threads.
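
As a rough cross-check on thread counts, Little's law (requests in flight ≈ throughput × mean latency) can be applied to the two summaries in the question. This is a back-of-the-envelope sketch, not something cassandra-stress reports; `in_flight` is a hypothetical helper name.

```shell
# Little's law: in-flight requests ~= op rate (op/s) * mean latency (s).
in_flight() {  # $1 = op rate in op/s, $2 = mean latency in ms
  LC_ALL=C awk -v rate="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", rate * ms / 1000 }'
}

echo "single-node: ~$(in_flight 33071 12.170) requests in flight"
echo "2-node:      ~$(in_flight 30747 29.405) requests in flight"
```

This gives roughly 402 and 904. The second is close to the 916 threads mentioned above, which would be consistent with the single-node summary coming from a run with fewer threads. If so, the latency figures of the two runs are not directly comparable.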

In practice, how can we determine what the limiting factor is (disk, CPU, RAM)?
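
One common starting point is to watch per-device utilization with `iostat` (from the sysstat package) while the stress test runs, alongside CPU and memory figures from tools such as `vmstat`. The sketch below assumes the extended `iostat -dx` layout with a `Device` header row and a `%util` column; `busiest_device` is a hypothetical helper that reports the device with the highest `%util` in the last report it sees.

```shell
# Read `iostat -dx` output on stdin and print "<device> <%util>" for the
# busiest device in the most recent report (each header row starts a report).
busiest_device() {
  LC_ALL=C awk '
    $1 ~ /^Device/ {
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      max = -1; dev = ""
      next
    }
    col && NF >= col && ($col + 0) > max { max = $col + 0; dev = $1 }
    END { if (dev != "") printf "%s %.1f\n", dev, max }
  '
}

# Usage while cassandra-stress is running (two reports, 5 seconds apart):
#   LC_ALL=C iostat -dx 5 2 | busiest_device
```

A `%util` near 100 suggests the disk is busy most of the time, though for SSDs that serve requests in parallel this figure can overstate saturation, so treat it as a hint rather than proof.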
