ROBO asked Erick Ramirez edited

Will read performance improve if we deploy Spark and Cassandra on the same machine?

Will read performance improve if we deploy the Spark executors and the Cassandra nodes on the same machines?

What settings do you recommend? We are reading 10 million records on a 3-node cluster with 24 CPU cores and 16 GB of memory on each node, so 72 cores and 48 GB in total. The read is taking 32 seconds, which is much too high.

Any suggestions?

spark-cassandra-connector

1 Answer

Erick Ramirez answered Erick Ramirez edited

Co-locating Spark and Cassandra on the same machines will definitely improve performance.

The Spark connector will establish connections to the Cassandra node on the same machine where a task is running, so the executors request data from their local nodes, minimising data shuffling between nodes.
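As a rough illustration of the settings involved in local reads, here is a minimal sketch that builds the relevant configuration. The helper name `colocated_read_conf`, the contact address `10.0.0.1`, and the values shown are assumptions for illustration, not tuned recommendations. The property names are standard Spark and connector settings, but check them against the connector version you run.

```python
def colocated_read_conf(contact_host, split_size_mb=64, locality_wait="3s"):
    """Build Spark settings for reading Cassandra from co-located executors.

    Hypothetical helper: the defaults here are illustrative, not tuned values.
    """
    if split_size_mb <= 0:
        raise ValueError("split_size_mb must be positive")
    return {
        # Contact point; the connector discovers the rest of the cluster from it.
        "spark.cassandra.connection.host": contact_host,
        # Approximate amount of Cassandra data per Spark partition.
        "spark.cassandra.input.split.sizeInMB": str(split_size_mb),
        # How long the scheduler waits for a node-local slot before
        # running the task on another node.
        "spark.locality.wait": locality_wait,
    }


conf = colocated_read_conf("10.0.0.1")  # assumed address of a Cassandra node
for key, value in sorted(conf.items()):
    # Print the settings in the form accepted by spark-submit.
    print(f"--conf {key}={value}")
```

Smaller splits produce more Spark partitions, which may help keep all 72 cores busy on a 10-million-row read. The right value depends on the row size, so treat the number above as a starting point to measure against.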

Note that 16GB machines are too small for analytics workloads. We recommend a minimum of 16-24GB allocated to the Cassandra heap so you'll need more memory on the machines to allocate to the Spark workers/executors. Cheers!
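To make the memory arithmetic concrete, here is a small sketch. The function name `spark_memory_left_gb` and the 2 GB operating system reserve are assumptions for illustration. The 16 GB heap is the lower bound of the recommendation above.

```python
def spark_memory_left_gb(node_ram_gb, cassandra_heap_gb, os_reserve_gb=2.0):
    """Memory left for Spark after the Cassandra heap and an assumed OS reserve."""
    return node_ram_gb - cassandra_heap_gb - os_reserve_gb


# Compare the current 16 GB nodes with two hypothetical larger sizes.
for ram_gb in (16, 32, 64):
    left = spark_memory_left_gb(ram_gb, cassandra_heap_gb=16)
    status = "no room for Spark" if left <= 0 else f"{left:.0f} GB for Spark"
    print(f"{ram_gb} GB node: {status}")
```

On a 16 GB node the minimum heap alone uses all of the memory, which is why larger machines are needed before executors can be co-located.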
