asanka_185837 asked (Aleks Volochnev answered): Error when trying to load a table from Cassandra using Spark


I'm trying to connect a 3-node Cassandra cluster with Spark. Right now I have one Spark master server and 3 workers, one installed on each Cassandra node.
Cassandra 3.11.6
Spark 2.4.5

Code that I am trying to run in the pyspark shell:

sql = SQLContext(spark)

def load_table(sql_context, keyspace, table):
    return sql_context.read \
        .format("org.apache.spark.sql.cassandra") \
        .options(table=table, keyspace=keyspace) \
        .load()

group = load_table(sql, 'keyspace', 'table')

When I try to load a table it gives an error saying:

Failed to open native connection to Cassandra at {,,}:904
        at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
        at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
        at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
        at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
        at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:79)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
        at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forSystemLocalPartitioner(
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:272)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:56)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(
        at py4j.reflection.MethodInvoker.invoke(
        at py4j.reflection.ReflectionEngine.invoke(
        at py4j.Gateway.invoke(
        at py4j.commands.AbstractCommand.invokeMethod(
        at py4j.commands.CallCommand.execute(

Any ideas?


Erick Ramirez answered

@asanka_185837 The error you posted is very generic, so it's hard to know what caused the problem. Can you confirm that the native client port 9042 is bound to the nodes' IP addresses that you listed? For example, if you check with netstat or lsof, does it show the nodes listening on port 9042?

Another thing to check is whether you can connect to the nodes via cqlsh. For example:

$ cqlsh

This will at least confirm whether or not you're connecting to the Cassandra nodes correctly from Spark. Cheers!
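To complement the netstat/lsof check above, a plain TCP probe from the Spark node tells you whether the native transport port is reachable at all. This is only a sketch; the node addresses below are placeholders, not from the original post:

```python
import socket

def port_open(host, port=9042, timeout=3):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical node addresses -- replace them with your Cassandra nodes' IPs.
for node in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    print(node, "open" if port_open(node, timeout=1) else "closed/unreachable")
```

If the probe fails from the Spark master but succeeds locally on the Cassandra node, the problem is a firewall or the listen address, not the connector.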



Aleks Volochnev answered

I agree with the previous answers; it looks like a network issue.

  • Make sure your Spark node can reach the Cassandra nodes
  • Check that port 9042 is open
  • Check that Cassandra is listening on that interface and port
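One more thing worth ruling out alongside the network checks: if the contact points aren't configured, the connector can fall back to localhost. A minimal configuration sketch; the addresses, keyspace, and table names are placeholders, not values from the original post:

```python
from pyspark.sql import SparkSession

# Hypothetical contact points -- replace with your Cassandra nodes' rpc_address values.
spark = SparkSession.builder \
    .appName("cassandra-load") \
    .config("spark.cassandra.connection.host", "10.0.0.1,10.0.0.2,10.0.0.3") \
    .config("spark.cassandra.connection.port", "9042") \
    .getOrCreate()

df = spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(keyspace="keyspace", table="table") \
    .load()
```

The hosts passed here must match an interface Cassandra actually listens on (its rpc_address), which ties back to the checklist above.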