question

srlsooriyagoda_185665 asked · Erick Ramirez answered

Is spark-cassandra-connector locality-aware if Spark and Cassandra are in different Docker containers?

The spark-cassandra-connector can achieve data locality by running tasks on the same node where the data actually resides. How does this work if Cassandra and Spark run as separate services in a Swarm cluster? In that case the Spark containers and the Cassandra containers have different IP addresses.

Tags: cassandra, spark, docker, pyspark, swarm

1 Answer

Erick Ramirez answered

@srlsooriyagoda_185665 Data locality is achieved by establishing connections to Cassandra on the same nodes that the Spark tasks are running on (where possible).

In your case, where Spark is not running in the same containers as Cassandra, no data is local to Spark, so locality doesn't apply. The only guarantee you get is that connections will be established in the same Cassandra DC as spark.cassandra.connection.host. Cheers!
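For reference, here is a minimal PySpark sketch of that configuration. The Swarm service name (cassandra), datacenter name (dc1), and keyspace/table names (my_keyspace/my_table) are placeholders, not values from the question; substitute your own.

```python
# Minimal sketch: point the connector at the Cassandra Swarm service and
# pin connections to one datacenter. Assumes the spark-cassandra-connector
# package is on the classpath (e.g. via spark-submit --packages).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("swarm-cassandra-example")
    # Contact point: the Cassandra service's address as seen from the
    # Spark containers (hypothetical Swarm service name).
    .config("spark.cassandra.connection.host", "cassandra")
    # Restrict connections to a single Cassandra DC. As noted above, this
    # is the only placement guarantee when Spark and Cassandra containers
    # don't share nodes (hypothetical DC name).
    .config("spark.cassandra.connection.localDC", "dc1")
    .getOrCreate()
)

# Read a table through the connector (hypothetical keyspace/table names).
df = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_keyspace", table="my_table")
    .load()
)
df.show()
```

When submitting, the connector would typically be pulled in with something like `--packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1` (adjust the version to match your Spark build).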
