Are DataFrames supported with joinWithCassandraTable? Is there any way I can use a DataFrame to join with a Cassandra table in the open-source version?
Yes, they are! There is now an OSS release which contains the DirectJoin implementation found in the DSE release of the SCC. Note that this is an automatic Catalyst optimization: because of how DataFrames work, we cannot explicitly ask for a `joinWithCassandraTable`, but if we present a plan which would benefit from such an optimization, we can provide a strategy which replaces the generic join with the Cassandra-specific one.
Adding Catalyst Rules
The 3.0-alpha release, which is now on Maven, supports this as well as all other features that used to exist only in the DSE version. We are working on clearer docs and tutorials, but until they are ready, please refer to our integration tests for examples.
Join Examples
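For readers who want something to try before the docs land, here is a minimal sketch of what such a join might look like with connector 3.0 or later. It is not taken from the integration tests. The contact point `127.0.0.1`, the keyspace `ks`, the table `users`, and its single-column partition key `id` are all assumptions made for illustration. The `directJoinSetting` read option is used here to force the optimization on; by default the connector decides automatically based on the relative size of the two sides.

```scala
import org.apache.spark.sql.SparkSession

object DirectJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("direct-join-sketch")
      // Registers the connector's Catalyst rules, including the direct join strategy.
      .config("spark.sql.extensions",
        "com.datastax.spark.connector.CassandraSparkExtensions")
      // Assumed contact point; replace with your own cluster address.
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()
    import spark.implicits._

    // Small DataFrame holding the keys to look up.
    val keys = Seq(1, 2, 3).toDF("id")

    // Hypothetical table ks.users whose partition key is the int column `id`.
    val users = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "users"))
      // "on" forces the direct join; "auto" (the default) lets the connector decide.
      .option("directJoinSetting", "on")
      .load()

    // An ordinary DataFrame join; the join condition must cover the
    // full partition key for the strategy to be able to rewrite it.
    val joined = keys.join(users, keys("id") === users("id"))

    // Inspect the physical plan to check whether the join was replaced.
    joined.explain()
    joined.show()

    spark.stop()
  }
}
```

If the rewrite applies, the physical plan printed by `explain()` should show a Cassandra-specific join node in place of a generic Spark join. If it still shows a generic join, check that the extensions class is registered and that the join covers every partition key column.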
@gpatcham_37382 The quick answer is that DataFrames do not support the RDD `joinWithCassandraTable()` method. There is a Direct Join available from DataStax Enterprise 6.x, but I don't know if it's available in OSS Spark. Details are in this blog post.
I'll reach out internally to Russell Spitzer and get him to provide a response. Cheers!