boris_187128 asked Erick Ramirez edited

Trying to connect to AWS Cassandra with datastax.spark.connector without success

Hi Everyone, happy to be part of the community.

I'm trying to read a Cassandra table into a Spark DataFrame. Here is the code (using PySpark on AWS EMR for testing):


import boto3
from pyspark.sql import SparkSession

# Download the truststore from S3 (credentials redacted)
s3 = boto3.client('s3', aws_access_key_id='$$$', aws_secret_access_key='$$$$')
s3.download_file('bucket', 'cassandra_truststore.jks', 'cassandra_truststore.jks')

# Configure the connector for the TLS endpoint on port 9142
spark = SparkSession.builder \
  .appName('SparkCassandraApp') \
  .config('spark.cassandra.connection.host', 'cassandra.us-east-1.amazonaws.com') \
  .config('spark.cassandra.connection.port', '9142') \
  .config('spark.cassandra.connection.ssl.enabled', 'true') \
  .config('spark.cassandra.connection.ssl.trustStore.path', 'cassandra_truststore.jks') \
  .config('spark.cassandra.connection.ssl.trustStore.password', 'amazon') \
  .config('spark.cassandra.auth.username', '$$$$$') \
  .config('spark.cassandra.auth.password', '$$$$$') \
  .getOrCreate()

# table and keyspace are variables holding the table and keyspace names
df = spark.read.format("org.apache.spark.sql.cassandra").options(table=table, keyspace=keyspace).load()

The error is:

py4j.protocol.Py4JJavaError: An error occurred while calling o120.load.
: java.lang.IllegalArgumentException: Unsupported partitioner: local
at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forCassandraPartitioner(TokenFactory.scala:92)

How can I resolve this? I've been stuck for 2 days.

Thanks a lot for the help

spark-cassandra-connector

1 Answer

Erick Ramirez answered Erick Ramirez commented

@boris_187128 I looked for documentation on how the partitioner is configured in Amazon Managed Apache Cassandra Service (MCS), but there doesn't seem to be any public information about it. The error you posted indicates that MCS reports a partitioner called local:

java.lang.IllegalArgumentException: Unsupported partitioner: local
    at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forCassandraPartitioner(TokenFactory.scala:92)

I can, however, confirm from TokenFactory.scala that the spark-cassandra-connector only has token factories for two partitioners, Murmur3Partitioner and RandomPartitioner:

  • Murmur3TokenFactory
  • RandomPartitionerTokenFactory

This means that MCS isn't supported by the connector at this stage. For more information on partitioners, see Cassandra Partitioners. I've logged SPARKC-587 on your behalf.
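
To illustrate why the load fails, here is a minimal Python sketch of the kind of lookup that produces the error. It is not the connector's actual Scala source: the dictionary and the function name token_factory_for are hypothetical, and the fully qualified class names are the standard Cassandra partitioner names, which I am assuming are what a regular cluster reports. Any name outside the two supported partitioners, such as local, is rejected.

```python
# Illustrative sketch only: NOT the connector's actual source code.
# Hypothetical mapping from Cassandra partitioner class names to the
# two token factories listed above.
SUPPORTED_PARTITIONERS = {
    "org.apache.cassandra.dht.Murmur3Partitioner": "Murmur3TokenFactory",
    "org.apache.cassandra.dht.RandomPartitioner": "RandomPartitionerTokenFactory",
}


def token_factory_for(partitioner_name: str) -> str:
    """Return the token factory name for a partitioner, or raise if unsupported."""
    try:
        return SUPPORTED_PARTITIONERS[partitioner_name]
    except KeyError:
        # Mirrors the message seen in the stack trace
        raise ValueError(f"Unsupported partitioner: {partitioner_name}") from None


if __name__ == "__main__":
    for name in ("org.apache.cassandra.dht.Murmur3Partitioner", "local"):
        try:
            print(name, "->", token_factory_for(name))
        except ValueError as err:
            print(err)
```

On a regular Cassandra cluster you can check which partitioner is in use with SELECT partitioner FROM system.local; in cqlsh. If the value is not one of the two above, the connector will most likely fail in the same way.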

In the meantime, I recommend you try DataStax Astra -- a cloud-native service built with the best distribution of Apache Cassandra. You can try it for FREE. Cheers!


boris_187128 commented:

Hi! Yes, I'm trying to connect to MCS. Thanks so much for the help.

Erick Ramirez replied to boris_187128:

@boris_187128 Welcome and thanks for being a part of the DataStax Community. A friendly note: I've converted your post into a comment since it is not an "answer". Cheers!

boris_187128 commented:

Got it! Thanks! Will use DynamoDB for now.

Erick Ramirez replied to boris_187128:

Not a problem. Good luck!
