Bringing together the Apache Cassandra experts from the community and DataStax.

Asked by anshita333saxena_187432 · edited by Erick Ramirez

How can we reduce CPU utilization on ScyllaDB nodes when the spark-cassandra-connector is accessing the cluster?

I have been going through the configuration documentation, trying to find a way to access the DB nodes with lower CPU utilization. Can you please suggest which parameter should be used here to reduce the CPU utilization of the cluster?

Table size: 15 GB
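For a sense of scale (a rough sketch, assuming the connector 2.x default of 64 MB for `spark.cassandra.input.split.size_in_mb`; actual splits also depend on token distribution), the number of Spark partitions the connector creates for this table can be estimated with ceiling division:

```shell
# Estimate how many input splits (Spark partitions) the connector
# creates for a 15 GB table with a 64 MB split size (ceiling division).
table_size_mb=$((15 * 1024))
split_size_mb=64
echo $(( (table_size_mb + split_size_mb - 1) / split_size_mb ))  # -> 240
```

Fewer, larger splits mean fewer concurrent read tasks hitting each DB node at once.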


I tried these parameters to check whether the CPU utilization goes down, but none of them reduced the CPU utilization of the nodes while the spark-cassandra-connector is accessing the cluster.

Spark nodes: 64 GB memory
DB nodes: 128 GB memory
Parameters tried: spark.cassandra.input.split.size_in_mb, spark.cassandra.connection.connections_per_executor_max

Can you please help optimize the CPU utilization? It is currently at 50% on all the nodes while Spark jobs are running.
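As a sketch (assuming the connector 2.x snake_case property names listed above; the values are illustrative, not recommendations), these properties are typically passed at submit time so they take effect before the Spark context is created:

```shell
# Hypothetical spark-submit invocation; my_job.py is a placeholder.
# A larger split size means fewer concurrent read tasks per DB node,
# and capping connections per executor limits parallel requests.
spark-submit \
  --conf spark.cassandra.input.split.size_in_mb=256 \
  --conf spark.cassandra.connection.connections_per_executor_max=2 \
  my_job.py
```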


When I set up the minor Spark cluster with 8 cores per node, the CPU utilization of the DB nodes dropped from 50% to 20%.

Minor Spark cluster (2 nodes): 8 cores per node
Major Spark cluster (2 nodes): 32 cores per node

Going through the spark-cassandra-connector properties, I saw that the number of cores can be reduced with the --total-executor-cores parameter.

However, the Spark Web UI still shows the same 64 cores, and the CPU utilization on the major Spark cluster is still 50%. Can you please suggest if I am missing something here?
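One thing worth checking (a hedged note: --total-executor-cores applies only in standalone and Mesos coarse-grained deploy modes, and it must be supplied at submit time): if the cap is set after the SparkContext already exists, it has no effect, which would explain the Web UI still showing all 64 cores. A sketch with illustrative values:

```shell
# Hypothetical sketch: cap the application at 16 cores total across the
# cluster (standalone mode), with 8 cores per executor.
# my_job.py is a placeholder; values are illustrative.
spark-submit \
  --total-executor-cores 16 \
  --executor-cores 8 \
  my_job.py
```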


Found the parameters to reduce the CPU utilization of the cluster.

This parameter tuning actually needs to be done on the Spark cluster's cores: with a higher number of cores, Spark passes more CPU load to the DB cluster.

url used:
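If the goal is to enforce the core cap at the cluster level rather than per job, the configuration-file equivalent of --total-executor-cores is spark.cores.max (standalone mode), set in spark-defaults.conf. A hypothetical fragment with illustrative values and a placeholder path:

```shell
# Hypothetical sketch: append the config equivalents of
# --total-executor-cores / --executor-cores to spark-defaults.conf
# (path and values are illustrative, not recommendations).
cat >> conf/spark-defaults.conf <<'EOF'
spark.cores.max        16
spark.executor.cores   8
EOF
```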


1 Answer

Answered by Erick Ramirez · commented by anshita333saxena_187432

Just letting you know that we do not test against ScyllaDB, and we know users run into issues because it is a fork of Apache Cassandra whose internal implementation is different.

The folks at ScyllaDB have forked the Spark connector for this reason, and they support their own implementation. Cheers!


Sure Erick! Thanks for the reply.
