I was wondering if it's possible to have the Spark worker run on a Cassandra node and have the worker request only the token ranges that the node is responsible for, thereby ensuring data locality?
@rzilkha_129571 There's a bit of nuance here: DSE Spark executors (which do the actual processing work) use the native integration between Spark and Cassandra in DSE Analytics when accessing data, and they are locality-aware as a result. However, if your application doesn't have enough Spark partitions/parallelism, the locality benefit may not be significant.
This may not necessarily apply if you're using open-source Apache Spark with the OSS connector. Cheers!
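As a rough illustration of how the connector exposes locality, here is a sketch using the spark-cassandra-connector's Scala API. It assumes a co-located Spark/Cassandra cluster, the connector on the classpath, and example keyspace/table names (`my_keyspace`, `my_table`) and host (`127.0.0.1`) that you would replace with your own:

```scala
// Sketch only: requires a running Cassandra cluster and the
// spark-cassandra-connector dependency; names and host are placeholders.
import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("locality-demo")
  // Point the connector at a Cassandra contact point; the driver
  // discovers the rest of the ring (and its token ownership) from there.
  .config("spark.cassandra.connection.host", "127.0.0.1")
  // Smaller input splits => more Spark partitions => more opportunities
  // for the scheduler to place tasks node-local to the data.
  .config("spark.cassandra.input.split.sizeInMB", "64")
  .getOrCreate()

// Each partition of this RDD maps to a group of Cassandra token ranges,
// and the connector advertises the replica nodes as preferred locations.
val rdd = spark.sparkContext.cassandraTable("my_keyspace", "my_table")

// Inspect where the scheduler would prefer to run each partition's task.
rdd.partitions.foreach { p =>
  println(s"partition ${p.index}: " +
    spark.sparkContext.getPreferredLocs(rdd, p.index))
}
```

When a Spark worker runs on the same host as a Cassandra replica, the scheduler can honor those preferred locations and each task reads mostly from its local node's token ranges, which is the behavior the question asks about.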