anshita333saxena_187432 asked:

How can we reduce CPU utilization on ScyllaDB nodes when the spark-cassandra-connector is accessing the cluster?

I am going through the configuration documentation:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md

I am trying to find a way to access the DB nodes with lower CPU utilization. Can you please suggest which parameter should be used here to reduce the CPU utilization of the cluster?
Table size: 15 GB

Tags: spark-cassandra-connector, spark-submit
4 comments

@Russell Spitzer Any suggestions please.


I tried the following parameters to check whether the CPU utilization would go down, but none of them reduced the CPU utilization of the nodes while the spark-cassandra-connector is accessing the cluster.

Spark nodes: 64 GB Memory
DB nodes: 128 GB Memory
Parameters tried: spark.cassandra.input.split.size_in_mb, spark.cassandra.connection.connections_per_executor_max
Can you please help optimize the CPU utilization? It is currently at 50% on all the nodes while the Spark jobs are running.
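
For reference, here is a minimal Scala sketch (mine, not from the original post) of how those two connector properties can be set on the SparkSession builder; the contact point, keyspace/table names, and the values are placeholders, and the snake_case property names assume a connector version that accepts them.

    import org.apache.spark.sql.SparkSession

    object TunedConnectorRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("tuned-connector-read")
          .config("spark.cassandra.connection.host", "10.0.0.1")                   // placeholder contact point
          .config("spark.cassandra.input.split.size_in_mb", "256")                 // bigger splits => fewer read tasks in flight
          .config("spark.cassandra.connection.connections_per_executor_max", "1")  // cap connections each executor opens
          .getOrCreate()

        val df = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))              // placeholder keyspace/table
          .load()

        println(df.count())
        spark.stop()
      }
    }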


When I set up the smaller Spark cluster with 8 cores per node, the CPU utilization on the DB nodes dropped from 50% to 20%.
Smaller Spark cluster (2 nodes): 8 cores per node
Larger Spark cluster (2 nodes): 32 cores per node
Going through the spark-cassandra-connector properties, I saw there is a parameter to reduce the number of cores: --total-executor-cores

However, the Spark Web UI still shows the same 64 cores, and the CPU utilization on the larger Spark cluster is still 50%.
Can you please suggest if I am missing something here?
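
One way to check this (a small diagnostic sketch I am adding, not from the original post): --total-executor-cores only applies to standalone and Mesos deployments and surfaces in the application as spark.cores.max, so you can print what the driver actually picked up.

    import org.apache.spark.sql.SparkSession

    object CheckEffectiveCores {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("check-effective-cores").getOrCreate()
        val conf  = spark.sparkContext.getConf

        // If spark.cores.max prints "<not set>", the --total-executor-cores flag was not applied,
        // which would explain the Web UI still showing all 64 cores.
        println("spark.cores.max      = " + conf.getOption("spark.cores.max").getOrElse("<not set>"))
        println("spark.executor.cores = " + conf.getOption("spark.executor.cores").getOrElse("<not set>"))
        println("defaultParallelism   = " + spark.sparkContext.defaultParallelism)

        spark.stop()
      }
    }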


I found the parameters to reduce the CPU utilization of the cluster.

Parameters:
spark.cores.max
spark.executor.cores

This configuration tuning actually needs to be done on the Spark cluster side: with a higher number of cores, Spark pushes more CPU load onto the DB cluster.
URLs used:
https://spark.apache.org/docs/latest/configuration.html
https://docs.datastax.com/en/dse/6.8/dse-dev/datastax_enterprise/spark/sparkCassandraProperties.html
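
As a sketch of how these two properties could be applied together with the connector (the values 16 and 8 and the contact point are placeholders mirroring the 8-cores-per-node comparison above, not recommendations, and spark.cores.max assumes a standalone or Mesos cluster manager); the same settings can also be passed to spark-submit with --conf.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    object CappedCoresJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .set("spark.cores.max", "16")                        // total cores the application may use across the cluster
          .set("spark.executor.cores", "8")                    // cores per executor
          .set("spark.cassandra.connection.host", "10.0.0.1")  // placeholder contact point

        val spark = SparkSession.builder()
          .appName("capped-cores-job")
          .config(conf)
          .getOrCreate()

        // Fewer cores means fewer concurrent connector read tasks, and therefore
        // less simultaneous CPU load pushed onto the DB nodes.
        val df = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))  // placeholder keyspace/table
          .load()

        println(df.count())
        spark.stop()
      }
    }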


1 Answer

Erick Ramirez answered:

Just letting you know that we do not test against ScyllaDB, and we know users run into issues with it because it is a fork of Apache Cassandra with different internal implementations.

The folks at ScyllaDB have forked the Spark connector for this reason and they support their own implementation. Cheers!

1 comment

Sure Erick! Thanks for the reply.
