randas asked Erick Ramirez edited

repartitionByCassandraReplica returns blank RDD on Spark 3.0.2

I'm running Spark 3.0.2 with the spark-cassandra-connector assembly 3.0.0 on a Spark/Cassandra cluster where each node is a Spark worker and also runs a Cassandra instance on the same machine.
When I do

val xRDD = X.repartitionByCassandraReplica(Keyspace, Table)

the resulting xRDD is empty.

Is this still supported in 3.0.2, or is there any reason why the RDD could be empty after calling repartitionByCassandraReplica?
Does it really affect performance if I call joinWithCassandraTable without first calling repartitionByCassandraReplica?
Exception in the log:

Could not load org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal with loader null: java.lang.ClassNotFoundException
Tags: spark-cassandra-connector, data locality

1 Answer

Erick Ramirez answered Erick Ramirez edited

repartitionByCassandraReplica() is definitely supported by the connector.

I don't know of any reason why it would result in an empty RDD. I am aware that if the Cassandra table doesn't have a lot of partitions, some of the resulting Spark partitions can be empty.
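To tell a completely empty RDD apart from one where only some Spark partitions are empty, it can help to count the records in each partition. The following is a minimal sketch using only the standard Spark RDD API; `partitionSizes` is a hypothetical helper name, not something the connector provides.

```scala
import org.apache.spark.rdd.RDD

object PartitionCheck {
  // Hypothetical helper: returns one (partitionIndex, recordCount) pair per Spark partition.
  def partitionSizes[T](rdd: RDD[T]): Array[(Int, Int)] =
    rdd.mapPartitionsWithIndex { (index, records) =>
      Iterator((index, records.size))
    }.collect()
}

// Example use with the RDD from the question (assumes xRDD is in scope):
// val sizes = PartitionCheck.partitionSizes(xRDD)
// println(s"non-empty partitions: ${sizes.count(_._2 > 0)} of ${sizes.length}")
// println(s"total records: ${sizes.map(_._2.toLong).sum}")
```

If the total is zero while the input RDD had records, the problem is more than a few empty partitions and is worth investigating together with the exception in the log.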

In the case when you don't have a lot of data, there is no advantage to using repartitionByCassandraReplica() and you might as well go straight to joinWithCassandraTable.
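For comparison, here is a rough sketch of both variants side by side. The keyspace `test`, the table `customer_info`, its partition key column `cust_id`, and the `CustomerId` case class are all assumptions made for illustration; substitute your own schema. It also assumes the `SparkContext` is already configured with `spark.cassandra.connection.host`.

```scala
import com.datastax.spark.connector._
import org.apache.spark.SparkContext

// Hypothetical key class: the field name must match the table's partition key column.
case class CustomerId(cust_id: Int)

object JoinSketch {
  def run(sc: SparkContext): Unit = {
    val keys = sc.parallelize(1 to 1000).map(CustomerId(_))

    // Variant 1: join directly, with no repartitioning first.
    val direct = keys.joinWithCassandraTable("test", "customer_info")

    // Variant 2: first move each key to a Spark partition on a node that owns
    // a replica of it (third argument: partitions per host), then join.
    val local = keys
      .repartitionByCassandraReplica("test", "customer_info", 10)
      .joinWithCassandraTable("test", "customer_info")

    // Both variants should return the same rows; only data locality differs.
    println(s"direct: ${direct.count()}, replica-local: ${local.count()}")
  }
}
```

Comparing the two counts is also a quick way to check whether the repartitioning step is the one losing data.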

On a slightly different topic, I noted the ClassNotFoundException for the GraphTraversal class. Are you connecting to a DataStax Enterprise cluster or just open-source Cassandra?

As a general rule, ClassNotFoundException is almost always caused by a version mismatch in the components you use in your app. For example, an incompatible Scala version used with the connector. You need to validate that all the dependencies in your configuration are valid and compatible. Cheers!
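As an illustration only, a `build.sbt` fragment with aligned versions might look like the sketch below. It assumes an sbt build, which the question does not state, and uses the Spark and connector versions mentioned in the question; Spark 3.0.x distributions are built against Scala 2.12, so the project's Scala version has to match.

```scala
// build.sbt (sketch; assumes an sbt project)
scalaVersion := "2.12.10" // must be a 2.12.x release to match Spark 3.0.x

libraryDependencies ++= Seq(
  // Spark is provided by the cluster at runtime
  "org.apache.spark" %% "spark-sql" % "3.0.2" % "provided",
  // %% appends the Scala binary version (_2.12) to the artifact name
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0"
)
```

If you pass the assembly jar to spark-submit instead, the same check applies: the Scala suffix in the jar name should match the Scala version of your Spark installation.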
