Are DataFrames supported with `joinWithCassandraTable`? Is there any way I can use a DataFrame to join with a Cassandra table in the open-source version?
Yes, they are! There is now an OSS release of the Spark Cassandra Connector (SCC) that contains the DirectJoin implementation previously found in the DSE release of the SCC. Note that this is an automatic Catalyst optimization. Because of how DataFrames work, we cannot explicitly ask for a `joinWithCassandraTable`, but if a query presents a plan that would benefit from such an optimization, the connector provides a strategy that replaces the generic join with the Cassandra-specific one.
Adding Catalyst Rules
The 3.0-alpha release, which is now on Maven, supports this as well as all other features that used to exist only in the DSE version. We are working on clearer docs and tutorials, but until they are ready, please refer to our integration tests for examples.
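A minimal sketch of what "automatic" means in practice, assuming a 3.x connector on the classpath, a Cassandra node on `127.0.0.1`, and a hypothetical table `ks.kv` whose partition key is the single column `k` (all names here are illustrative, not from the thread). The extensions class is registered through `spark.sql.extensions`; after that, the code is an ordinary DataFrame equi-join on the full partition key, and the connector's planner strategy may (or may not, depending on its heuristics) swap it for a Direct Join.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DirectJoinSketch {
  // Class that registers the connector's Catalyst rules (including the Direct Join strategy)
  val ExtensionsClass = "com.datastax.spark.connector.CassandraSparkExtensions"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("direct-join-sketch")
      .master("local[*]")
      .config("spark.sql.extensions", ExtensionsClass)
      // Assumption: a Cassandra node reachable on localhost
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical keys to look up in the table
    val keys: DataFrame = Seq(1, 2, 3).toDF("k")

    // Hypothetical table ks.kv whose partition key is the single column `k`
    val kv: DataFrame = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "kv"))
      .load()

    // An ordinary equi-join on the full partition key; no connector-specific call here
    val joined = keys.join(kv, Seq("k"))

    // If the optimization applies, the physical plan should show a Cassandra Direct Join
    // node instead of a full scan of ks.kv followed by a generic join
    joined.explain()
    joined.show()

    spark.stop()
  }
}
```

Checking `explain()` is the practical way to confirm the rewrite happened, since nothing in the query itself asks for it.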
@gpatcham_37382 The quick answer is that DataFrames do not support the RDD `joinWithCassandraTable()` method. There is a Direct Join available in DataStax Enterprise 6.x, but I don't know if it's available in OSS Spark. Details are in this blog post.
I'll reach out internally to Russell Spitzer and get him to provide a response. Cheers!
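For comparison, a rough sketch of the RDD-level route mentioned above: drop from the DataFrame to a typed RDD and call `joinWithCassandraTable` from `com.datastax.spark.connector._`. It assumes the same hypothetical table `ks.kv` with partition key `k`, a local Cassandra node, and a hypothetical case class `KeyRow` whose field name matches that key column.

```scala
import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession

object RddJoinSketch {
  // Hypothetical key type; its field name must match the partition key column `k`
  case class KeyRow(k: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-join-sketch")
      .master("local[*]")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed local node
      .getOrCreate()
    import spark.implicits._

    val keysDf = Seq(1, 2, 3).toDF("k")

    // Leave the DataFrame API: convert to a typed RDD the connector can join with
    val keysRdd = keysDf.as[KeyRow].rdd

    // RDD-only method: looks up each key in the hypothetical table ks.kv
    // instead of scanning the whole table
    val joined = keysRdd.joinWithCassandraTable[CassandraRow]("ks", "kv")

    joined.collect().foreach { case (key, row) => println(s"${key.k} -> $row") }

    spark.stop()
  }
}
```

The trade-off is that the result is an RDD of `(KeyRow, CassandraRow)` pairs rather than a DataFrame, so any further SQL-style processing needs a conversion back.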