Hi Everyone, happy to be part of the community.
I'm trying to read a Cassandra table into a Spark DataFrame, using PySpark on AWS EMR for testing. Here is the code:
import boto3
from pyspark.sql import SparkSession

# Download the truststore from S3 into the local working directory
s3 = boto3.client('s3', aws_access_key_id='$$$', aws_secret_access_key='$$$$')
s3.download_file('bucket', 'cassandra_truststore.jks', 'cassandra_truststore.jks')

# Configure the Cassandra connection (SSL on port 9142, with authentication)
spark = SparkSession.builder \
    .appName('SparkCassandraApp') \
    .config('spark.cassandra.connection.host', 'cassandra.us-east-1.amazonaws.com') \
    .config('spark.cassandra.connection.port', '9142') \
    .config('spark.cassandra.connection.ssl.enabled', 'true') \
    .config('spark.cassandra.connection.ssl.trustStore.path', 'cassandra_truststore.jks') \
    .config('spark.cassandra.connection.ssl.trustStore.password', 'amazon') \
    .config('spark.cassandra.auth.username', '$$$$$') \
    .config('spark.cassandra.auth.password', '$$$$$') \
    .getOrCreate()

# table and keyspace are variables holding the table and keyspace names
df = spark.read.format("org.apache.spark.sql.cassandra") \
    .options(table=table, keyspace=keyspace) \
    .load()
The error is:
py4j.protocol.Py4JJavaError: An error occurred while calling o120.load.
: java.lang.IllegalArgumentException: Unsupported partitioner: local
    at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forCassandraPartitioner(TokenFactory.scala:92)
How can I resolve this? I've been stuck on it for two days.
Thanks a lot for the help.