asanka_185837 asked ·

java.io.IOException when trying to load a table from Cassandra using Spark

Hi,

I'm trying to connect a 3-node Cassandra cluster to Spark. Right now I have one Spark master server and 3 workers installed on the Cassandra nodes.
Cassandra 3.11.6
Spark 2.4.5
datastax:spark-cassandra-connector:2.4.0-s_2.11


Code that I am trying to run in the pyspark shell:

from pyspark.sql import SQLContext
sql = SQLContext(spark)
def load_table(sql_context, keyspace, table):
    # Read a Cassandra table into a DataFrame through the Spark Cassandra Connector
    return sql_context.read.format("org.apache.spark.sql.cassandra").options(table=table, keyspace=keyspace).load()
group = load_table(sql, 'keyspace', 'table')


When I try to load a table, it fails with this error:

java.io.IOException: Failed to open native connection to Cassandra at {10.240.0.50, 10.240.0.51, 10.240.0.52}:9042
        at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:168)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
        at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
        at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
        at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
        at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:79)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
        at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forSystemLocalPartitioner(TokenFactory.scala:98)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:272)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:56)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)


Any ideas?

Erick Ramirez answered ·

@asanka_185837 The error you posted is very generic, so it's hard to know what caused the problem. Can you confirm that the native client port 9042 is bound to the node IP addresses that you listed? For example, if you check with netstat or lsof, does it show 10.240.0.50:9042?

Another thing to check is whether you can connect to the nodes via cqlsh. For example:

$ cqlsh 10.240.0.51

This will at least confirm whether you're connecting to the Cassandra nodes correctly or not from Spark. Cheers!
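
If cqlsh is not installed on the Spark hosts, a rough equivalent of the reachability part of that check can be sketched in Python. This is only an illustration: the helper names `can_connect` and `check_nodes` are made up for this example, and a successful TCP connection only shows that something accepts connections on the port, not that CQL authentication or queries will work.

```python
import socket


def can_connect(host, port=9042, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_nodes(hosts, port=9042, timeout=3.0):
    """Map each host to whether the given port accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Running `check_nodes(["10.240.0.50", "10.240.0.51", "10.240.0.52"])` on the Spark master and on each worker would show which of them can reach which node on 9042. A `False` for a node points at a firewall, routing, or bind-address problem rather than at the connector itself.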

Cedrick Lunven answered ·

Aleks Volochnev answered ·

I agree with the previous answers; this looks like a network issue.

  • Make sure your Spark nodes can reach the Cassandra nodes
  • Check that the port is open
  • Check that Cassandra is listening on that interface and port
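
For the last point in the list above, the bind settings live in cassandra.yaml on each node. The sketch below pulls out the relevant keys so they can be compared with the addresses Spark is using. It is a minimal line-based reader, not a full YAML parser; the function name `read_bind_settings` is hypothetical, and the location of cassandra.yaml depends on how Cassandra was installed.

```python
import re

# Settings that decide where Cassandra accepts connections
SETTINGS = ("listen_address", "rpc_address", "rpc_interface", "native_transport_port")


def read_bind_settings(yaml_text):
    """Extract top-level bind settings from the text of a cassandra.yaml file."""
    found = {}
    for line in yaml_text.splitlines():
        # Match "key: value", ignoring commented-out lines and trailing comments
        match = re.match(r"^(\w+):\s*(.*?)\s*(?:#.*)?$", line)
        if match and match.group(1) in SETTINGS:
            found[match.group(1)] = match.group(2)
    return found
```

Pass it the file contents, for example `read_bind_settings(open(path).read())` with `path` pointing at the node's cassandra.yaml. If `rpc_address` comes back as `localhost` or `127.0.0.1`, the native transport is bound to the loopback interface only, so connections to 10.240.0.50:9042 from other hosts would be refused.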