srlsooriyagoda_185665 asked:

Is spark-cassandra-connector locality-aware if Spark and Cassandra are in different Docker containers?

The spark-cassandra-connector can achieve data locality by running tasks on the same nodes where the data actually lives. How does this work if Cassandra and Spark run as separate services in a Swarm cluster? In that case the Spark containers and the Cassandra containers have different IP addresses.

Tags: cassandra, spark, docker, pyspark, swarm

1 Answer

Erick Ramirez answered:

@srlsooriyagoda_185665 Data locality is achieved by establishing connections to Cassandra on the same nodes that the Spark tasks are running on (where possible).

In your case, where Spark is not running in the same containers as Cassandra, no data is local to the Spark executors, so data locality doesn't apply. The only guarantee you will get is that connections will be established in the same Cassandra DC as spark.cassandra.connection.host. Cheers!
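For reference, here is a minimal PySpark sketch of the kind of configuration this implies. The service name "cassandra", the DC name "DC1", and the keyspace/table names are placeholders for your own Swarm setup, and it assumes the spark-cassandra-connector package is already on the classpath (e.g. supplied via --packages when submitting the job):

```python
from pyspark.sql import SparkSession

# Minimal sketch: point the connector at the Cassandra service.
# "cassandra" is a hypothetical Swarm service name resolvable from the Spark containers.
spark = (
    SparkSession.builder
    .appName("cassandra-locality-example")
    .config("spark.cassandra.connection.host", "cassandra")
    # Optional: restrict connections to one Cassandra DC ("DC1" is a placeholder).
    .config("spark.cassandra.connection.localDC", "DC1")
    .getOrCreate()
)

# Read a table through the connector's data source. Without co-located
# executors there is no node-local data, but the read still works.
df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_ks", table="my_table")  # placeholder keyspace/table
    .load()
)
df.show()
```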
