srlsooriyagoda_185665 asked ·

Can the connector ensure data locality in a Spark cluster running on YARN?

Can the spark-cassandra-connector ensure data locality in a Spark cluster that runs in YARN cluster mode?

Tags: spark, connector, data-locality, yarn

1 Answer

Erick Ramirez answered ·

@srlsooriyagoda_185665 As with question 2323 that you asked a couple of weeks ago, data locality can only be achieved when the Spark tasks run on the same servers as the Cassandra nodes. If the Spark cluster and the Cassandra cluster are two separate clusters, then no data is local to Spark. Cheers!

3 comments

@Erick Ramirez Assume that Spark is running on the same nodes as the Cassandra cluster in this case. Yes, I know that the connector can achieve data locality by running the tasks where the relevant data is located. My question is: can the connector achieve this even when Spark is running in YARN cluster mode?

Thanks.

Erick Ramirez replied to srlsooriyagoda_185665 ·

FYI I've converted your post into a comment because it is not an "answer". Cheers!

Erick Ramirez replied to srlsooriyagoda_185665 ·

@srlsooriyagoda_185665 The answer is still the same -- if the executors are running on the same server as the Cassandra node, data will be local. The mode in which the Spark clients/driver/master are running isn't really relevant because it is the executors doing the work, not the requesting driver/app. Cheers!
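
To make the point concrete, here is a minimal Python sketch of the idea, not the connector's actual scheduling code. It assumes hypothetical host names (`node1`, `yarn-a`, and so on) and a hypothetical helper, `locality_report`, that labels a partition `NODE_LOCAL` when at least one executor runs on a host holding a replica, and `ANY` otherwise. Spark's real scheduler also considers rack locality and wait timeouts, which are left out here. Note that the helper takes no deploy-mode argument: only the executor hosts matter.

```python
from typing import Dict, List, Set


def locality_report(
    executor_hosts: Set[str],
    partition_replicas: Dict[str, List[str]],
) -> Dict[str, str]:
    """Label each partition by whether an executor shares a host with a replica.

    Simplified illustration only: the real scheduler has more locality levels.
    """
    report = {}
    for partition, replicas in partition_replicas.items():
        # A partition can be read locally only if some executor host
        # is also one of the hosts that stores a replica of its data.
        local_hosts = executor_hosts.intersection(replicas)
        report[partition] = "NODE_LOCAL" if local_hosts else "ANY"
    return report


# Hypothetical layout: Cassandra runs on node1..node3.
replicas = {
    "token-range-0": ["node1", "node2"],
    "token-range-1": ["node2", "node3"],
    "token-range-2": ["node3", "node1"],
}

# Case 1: YARN placed an executor on one of the Cassandra hosts.
colocated = locality_report({"node1"}, replicas)

# Case 2: YARN placed executors on hosts that run no Cassandra node.
separate = locality_report({"yarn-a", "yarn-b"}, replicas)

print(colocated)
print(separate)
```

In the first case only the token ranges with a replica on `node1` can be read locally; in the second case nothing is local, whichever mode the driver runs in.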
