premuditha_185831 asked:

How can I use the spark-cassandra-connector to connect to multiple Cassandra datacenters?

I'm using GCP and I have one datacenter in "asia-south1" and another one in "us-west1". I have provided a list of the IP addresses of my nodes as follows:

When I try to run a Spark job from the PySpark shell, I get the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/", line 172, in load
    return self._df(self._jreader.load())
  File "/opt/spark/python/lib/", line 1257, in __call__
  File "/opt/spark/python/pyspark/sql/", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o44.load.
: Failed to open native connection to Cassandra at {,,}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:168)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:79)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forSystemLocalPartitioner(TokenFactory.scala:98)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:272)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:56)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at py4j.reflection.MethodInvoker.invoke(
at py4j.reflection.ReflectionEngine.invoke(
at py4j.Gateway.invoke(
at py4j.commands.AbstractCommand.invokeMethod(
at py4j.commands.CallCommand.execute(
Caused by: java.lang.IllegalArgumentException: requirement failed: Contact points contain multiple data centers: us-west1, asia-south1
at scala.Predef$.require(Predef.scala:224)
at com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy$.determineDataCenter(LocalNodeFirstLoadBalancingPolicy.scala:145)
at com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy$$anonfun$init$1.apply(LocalNodeFirstLoadBalancingPolicy.scala:39)
at com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy$$anonfun$init$1.apply(LocalNodeFirstLoadBalancingPolicy.scala:39)
at scala.Option.getOrElse(Option.scala:121)
at com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy.init(LocalNodeFirstLoadBalancingPolicy.scala:39)
at com.datastax.driver.core.Cluster$Manager.init(
at com.datastax.driver.core.Cluster.getMetadata(
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:161)
... 25 more

As a workaround, I removed the node in "asia-south1" from the contact points list and then it works without issue. I'm just wondering how to handle this properly.
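The `IllegalArgumentException` at the bottom of the trace comes from a requirement check in `LocalNodeFirstLoadBalancingPolicy.determineDataCenter`, which insists that all contact points belong to a single data centre. A rough, purely illustrative Python analogue of that check (not the connector's actual Scala code):

```python
def determine_data_center(contact_point_dcs):
    """Illustrative analogue of the single-DC requirement enforced by
    LocalNodeFirstLoadBalancingPolicy.determineDataCenter. This is a
    hypothetical re-creation for explanation, not the connector's real
    implementation."""
    dcs = set(contact_point_dcs)
    if len(dcs) != 1:
        # Mirrors the message seen in the stack trace:
        # "requirement failed: Contact points contain multiple data centers"
        raise ValueError(
            "Contact points contain multiple data centers: "
            + ", ".join(sorted(dcs))
        )
    return next(iter(dcs))
```

With contact points drawn from both "asia-south1" and "us-west1" the check fails, which is exactly the error above; with nodes from a single DC it passes, which is why removing the "asia-south1" node made the job run.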


1 Answer

Erick Ramirez answered:

@premuditha_185831 You cannot connect to multiple data centres by default. This is by design: the restriction isolates the analytics workload in one DC so that it does not affect the performance of real-time workloads in the other DCs.
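For illustration, the usual way to respect this is to point `spark.cassandra.connection.host` only at nodes in the one DC you want Spark to talk to when launching the PySpark shell. This is a config sketch; the addresses and connector package version are placeholders, not values from the original post:

```shell
# Contact points restricted to a single DC (placeholder addresses).
# The connector coordinates are illustrative; match the version to
# your Spark and Scala versions.
pyspark \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3 \
  --conf spark.cassandra.connection.host=10.160.0.2,10.160.0.3
```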

For more details, see the Connecting to Cassandra page of the spark-cassandra-connector documentation on GitHub. Cheers!


@Erick Ramirez thank you for the quick response. I actually read about this all night and figured it out :)

But there's one thing I still don't understand. If I can use only one data center with spark-cassandra-connector, does that mean the Spark workers on the nodes in the other data center are useless? Should I remove the Spark workers from those nodes?


@premuditha_185831 That's correct, since the connector cannot span DCs. Cheers!


@Erick Ramirez perfect! thanks for your time :)
