Bringing together the Apache Cassandra experts from the community and DataStax.

rzilkha_129571 asked:

Spark Connector - how to ensure data locality?

Hi,

I was wondering if it's possible to run a Spark worker on each Cassandra node and have that worker request only the token ranges the local node is responsible for, thereby ensuring data locality?

Tags: spark-connector, batch

1 Answer

Erick Ramirez answered:

@rzilkha_129571 There's a bit of nuance here. DSE Spark executors (which do the actual processing work) use the native integration between Spark and Cassandra in DSE Analytics when accessing data, so they are locality-aware as a result. However, if your application doesn't have enough Spark partitions/parallelism, the locality benefit may not be significant.

This may not necessarily apply if you're using open-source Apache Spark with the OSS connector. Cheers!
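With the open-source Spark Cassandra Connector, locality generally comes from co-locating Spark workers with Cassandra nodes: the connector maps Cassandra token ranges to Spark partitions and reports the replica nodes as each partition's preferred locations, so the scheduler can place tasks node-local. A minimal configuration sketch under that assumption (the property names are real connector/Spark settings; the values shown are illustrative, not tuning advice):

```properties
# spark-defaults.conf — illustrative values, assuming Spark workers
# are co-located with Cassandra nodes

# Contact point; on a co-located deployment this is typically the local node
spark.cassandra.connection.host        127.0.0.1

# How long the Spark scheduler waits for a node-local slot before
# falling back to a less-local one; raising it favors locality
spark.locality.wait                    3s

# Approximate amount of Cassandra data per Spark partition; smaller
# splits give more partitions and more parallelism
spark.cassandra.input.split.sizeInMB   512
```

Note that locality is only a scheduling preference: if the node-local executor has no free cores within `spark.locality.wait`, the task runs elsewhere and the read goes over the network anyway.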
