Bringing together the Apache Cassandra experts from the community and DataStax.

srlsooriyagoda_185665 asked:

Is spark-cassandra-connector locality-aware if Spark and Cassandra are in different Docker containers?

spark-cassandra-connector can achieve data locality by running tasks on the same node where the data actually resides. How does this work if Cassandra and Spark run as separate services in a Swarm cluster? In that case the Spark containers and Cassandra containers have different IP addresses.

Tags: cassandra, spark, docker, pyspark, swarm

1 Answer

Erick Ramirez answered:

@srlsooriyagoda_185665 Data locality is achieved by establishing connections to Cassandra on the same nodes that the Spark tasks are running on (where possible).

In your case, where Spark is not running in the same containers as Cassandra, no data is local to the Spark executors, so locality doesn't apply. The only guarantee you will get is that connections will be established in the same Cassandra DC as the `spark.cassandra.connection.host` contact point. Cheers!
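As a rough sketch, the guarantee described above maps to a couple of connector settings: `spark.cassandra.connection.host` (the contact point) and, if you want to pin the target datacenter explicitly, `spark.cassandra.connection.localDC`. The service name `cassandra` and datacenter name `DC1` below are placeholders for illustration, not values from the original post:

```python
# Hypothetical connector settings matching the answer above. Locality only
# applies when executors share nodes with Cassandra; in a Swarm setup like
# the question's, the connector simply opens connections in the DC of the
# contact point.
connector_conf = {
    # Initial contact point; assumed to be the Swarm service name under
    # which the Cassandra containers are reachable on the overlay network.
    "spark.cassandra.connection.host": "cassandra",
    # Restrict connections to one Cassandra datacenter (key name per the
    # spark-cassandra-connector reference configuration; "DC1" is made up).
    "spark.cassandra.connection.localDC": "DC1",
}

# Applied to a PySpark session it would look like this (requires pyspark
# and the spark-cassandra-connector jar on the classpath), so it is left
# as a comment here:
#
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("swarm-example")
#   for key, value in connector_conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

The practical takeaway: across container boundaries you tune for DC-level routing rather than node-level locality.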
