question

asanka_185837 avatar image
asanka_185837 asked Aleks Volochnev answered

java.io.IOException: when trying to load a table from Cassandra using spark

Hi,

I'm trying to connect Cassandra 3 nodes cluster with the spark. Right now i have one master spark server and 3 workers installed in each Cassandra node.
Cassandra 3.11.6
Spark 2.4.5
datastax:spark-cassandra-connector:2.4.0-s_2.11


Code that I am trying to run in pyspark shell:

sql = SQLContext(spark)
def load_table(sql_context, keyspace, table):
    return sql_context.read.format("org.apache.spark.sql.cassandra").options(table=table, keyspace=keyspace).load()
group = load_table(sql,'keyspace', 'table')


When i try to load a table it gives an error saying :

java.io.IOException: Failed to open native connection to Cassandra at {10.240.0.50, 10.240.0.51, 10.240.0.52}:904
2
        at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector
$$createSession(CassandraConnector.scala:168)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
        at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
        at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
        at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
        at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:79)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
        at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forSystemLocalPartitioner(TokenFactory.sc
ala:98)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:272)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:56)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)


Any ideas?

cassandrapyspark
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image
Erick Ramirez answered

@asanka_185837 The error you posted is very generic so it's hard to know what caused the problem. Can you confirm that the native client port 9042 is bound to the nodes' IP address that you listed? For example if you check with netstat or lsof, does it show 10.240.0.50:9042?

Another thing to check is if you could connect to the nodes via cqlsh. For example:

$ cqlsh 10.240.0.51

This will at least confirm whether you're connecting to the Cassandra nodes correctly or not from Spark. Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Cedrick Lunven avatar image
Cedrick Lunven answered
Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Aleks Volochnev avatar image
Aleks Volochnev answered

Agree with the previous answers, looks like a network issue.

  • Make sure your spark node can reach the cassandra node
  • Check if the port is open
  • Check if Cassandra listens to that interface on that port
Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.