Hello!
Is there a joinWithCassandraTable() method in the free open source version for pyspark?
Or is it implemented only in the DSE version?
Hi! Spark Cassandra Connector (SCC) 2.5.x and newer have Direct Join support (the DataFrame equivalent of joinWithCassandraTable). I don't have Python samples, but here is a good Scala article about Direct Join: http://www.russellspitzer.com/2018/05/23/DSEDirectJoin/
Are you sure this feature is implemented in the open source version of the connector for pyspark?
I am using spark-cassandra-connector_2.12 and Spark 3.0.1 but I can't find the directJoin method.
A simple join of two dataframes (one very small, the other very large) results in a full table scan.
What am I doing wrong?
Thanks for the help, I figured out what the problem was.
For the direct join to work, I had to set the following options when creating the Spark session:
spark = SparkSession.builder.\
    config('directJoinSetting', 'on').\
    config("spark.sql.extensions", "com.datastax.spark.connector.CassandraSparkExtensions").\
    appName('directJoin').getOrCreate()
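For anyone who lands here later, this is a rough sketch of how the settings above might be used end to end and how to check whether the join was actually converted. It is not an official sample. The keyspace `my_keyspace`, the table `my_table`, and the partition key column `id` (assumed to be a bigint and the table's full partition key) are hypothetical, and the connector package must already be on the classpath. The check for the text "Direct Join" in the plan is an assumption about how the connector labels its plan node, so treat it as a heuristic.

```python
import contextlib
import io
from typing import Dict

# Settings quoted in the post above; keys and values are copied from the thread.
DIRECT_JOIN_CONF: Dict[str, str] = {
    "directJoinSetting": "on",
    "spark.sql.extensions": "com.datastax.spark.connector.CassandraSparkExtensions",
}


def plan_uses_direct_join(plan_text: str) -> bool:
    """Heuristic: assumes the connector's plan node name contains 'Direct Join'."""
    return "direct join" in plan_text.lower()


def build_session(app_name: str = "directJoin"):
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DIRECT_JOIN_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def main() -> None:
    spark = build_session()

    # Small DataFrame holding the keys to look up (Python ints become bigint).
    keys_df = spark.createDataFrame([(1,), (2,), (3,)], ["id"])

    # Large Cassandra table; keyspace and table names are hypothetical.
    table_df = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="my_keyspace", table="my_table")
        .load()
    )

    joined = keys_df.join(table_df, on="id")

    # explain() prints the physical plan; capture it so it can be inspected.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        joined.explain()
    plan_text = buffer.getvalue()

    print(plan_text)
    print("Direct join used:", plan_uses_direct_join(plan_text))


# Call main() on a machine where Spark, the connector, and Cassandra are available.
```

If the plan still shows a regular join with a full scan of the Cassandra table, it is worth checking that the join columns cover the whole partition key and that the column types match on both sides.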