Can spark-cassandra-connector ensure data locality in a Spark cluster which runs on Apache YARN cluster mode?
@srlsooriyagoda_185665 Similar to question 2323 you asked a couple of weeks ago, data locality can only be achieved when the Spark tasks are running on the same servers as the Cassandra nodes. If the Spark cluster and the Cassandra cluster are two separate clusters, then no data is local to Spark. Cheers!
@Erick Ramirez Assume that Spark is running on the same nodes as the Cassandra cluster in this case. Yes, I know that the connector can achieve data locality by running the tasks where the relevant data is located. My question is: can the connector achieve this even when Spark is running in YARN cluster mode?
Thanks.
FYI I've converted your post into a comment because it is not an "answer". Cheers!
@srlsooriyagoda_185665 The answer is still the same -- if the executors are running on the same server as the Cassandra node, data will be local. The mode in which the Spark clients/driver/master are running isn't really relevant because it is the executors doing the work, not the requesting driver/app. Cheers!
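To make the point about executors concrete, here is a rough sketch in plain Python. It is a toy model, not the connector's or Spark's actual scheduling code: it only assumes that each Spark partition comes with a list of Cassandra replica hosts as its preferred locations, and that a task can be node-local only if some executor runs on one of those hosts. The function name `locality_level` and all IP addresses are made up for illustration.

```python
# Toy model only: this is NOT the connector's or Spark's scheduling code.
# It illustrates that locality depends on where executors run,
# not on where the driver or the cluster manager runs.

def locality_level(replica_hosts, executor_hosts):
    """Return 'NODE_LOCAL' if at least one executor runs on a replica host, else 'ANY'."""
    return "NODE_LOCAL" if set(replica_hosts) & set(executor_hosts) else "ANY"

# Hypothetical hosts holding replicas of one token range.
replicas = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Executors co-located with Cassandra nodes (for example, YARN NodeManagers on the same servers).
colocated_executors = ["10.0.0.2", "10.0.0.4"]

# Executors on a separate cluster.
separate_executors = ["10.0.1.1", "10.0.1.2"]

print(locality_level(replicas, colocated_executors))  # NODE_LOCAL
print(locality_level(replicas, separate_executors))   # ANY
```

In this model the deploy mode never appears as an input, which is the same reasoning as above: only the placement of the executors relative to the Cassandra nodes matters.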