Hello!
Is there a joinWithCassandraTable() method in the free open source version for pyspark?
Or is it implemented only in the DSE version?
Hi! SCC (Spark Cassandra Connector) 2.5.x and newer have Direct Join support (the equivalent of joinWithCassandraTable, but for DataFrames). I don't have Python samples, but here is a good Scala article about Direct Join: http://www.russellspitzer.com/2018/05/23/DSEDirectJoin/
Are you sure this feature is implemented in the open source version of the connector for pyspark?
I am using spark-cassandra-connector_2.12 and Spark 3.0.1 but I can't find the directJoin method.
A simple join of two dataframes (one very small, the other very large) results in a full table scan.
What am I doing wrong?
Thanks for the help, I figured out what the problem was.
For the direct join to work, I had to set the following options when creating the SparkSession:
from pyspark.sql import SparkSession

spark = SparkSession.builder.\
    config('directJoinSetting', 'on').\
    config("spark.sql.extensions", "com.datastax.spark.connector.CassandraSparkExtensions").\
    appName('directJoin').getOrCreate()
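For anyone landing here later, this is a rough sketch of how one might confirm that the join is actually planned as a direct join instead of a full table scan. It is not taken from the connector docs verbatim. The keyspace `ks`, the table `big_table`, and the partition key column `id` are hypothetical names. The helper functions are my own, and the "Cassandra Direct Join" text searched for in the plan is what I have seen the connector print, so treat it as an assumption and check your own explain() output. It also assumes the connector jar is on the classpath and spark.cassandra.connection.host points at your cluster.

```python
import contextlib
import io


def direct_join_conf(mode="on"):
    """Return the two Spark options from the answer above as a dict.

    The connector accepts 'on', 'off' and 'auto' for directJoinSetting.
    """
    if mode not in ("on", "off", "auto"):
        raise ValueError("directJoinSetting must be 'on', 'off' or 'auto'")
    return {
        "spark.sql.extensions": "com.datastax.spark.connector.CassandraSparkExtensions",
        "directJoinSetting": mode,
    }


def uses_direct_join(plan_text):
    """Assumption: the connector labels its physical plan node 'Cassandra Direct Join'."""
    return "Cassandra Direct Join" in plan_text


def run_example():
    """Needs a reachable Cassandra cluster; ks.big_table and id are hypothetical."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("directJoin")
    for key, value in direct_join_conf("on").items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # The large side of the join: a Cassandra table read through the connector.
    big = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="ks", table="big_table")
        .load()
    )

    # The small side: a handful of keys. The join has to cover the full
    # partition key of the Cassandra table for a direct join to be possible.
    keys = spark.createDataFrame([(1,), (2,), (3,)], ["id"])
    joined = keys.join(big, on="id")

    # DataFrame.explain() prints the plan, so capture stdout to inspect it.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        joined.explain()
    plan = buf.getvalue()

    print(plan)
    print("direct join planned:", uses_direct_join(plan))
    return joined
```

Call run_example() from a pyspark session or script once the connector is configured. If the last line prints False, check that the extensions setting was applied before the session was created, since getOrCreate() reuses an already running session.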