shaiwolf asked · jaroslaw.grabowski_50515 commented

Spark connector reads puts stress on 1 node


The attached image shows the range reads while reading a table from Cassandra. One Cassandra node receives many more read requests than the other nodes, and it is always the same node (VPC-CASSANDRA-005). Once its reads finish, that node serves 0 reads until the task ends.

Read requests to all the other nodes are balanced. Is this a connector bug, or something external, e.g. configuration?


  • Amazon Linux AMI 2018.03
  • java version "1.8.0_131"
  • spark-core_2.12-3.1.1.jar
  • spark-cassandra-connector_2.12-3.1.0.jar
  • Cassandra 3.1.1 [cqlsh 5.0.1 | DSE 5.1.22 | CQL spec 3.4.4 | DSE protocol v1]
  • 36 Cassandra servers
  • Spark runs on a remote server - 12 instances - 5 cores each




image.png (75.3 KiB)
1 comment


jaroslaw.grabowski_50515 ♦ commented ·

That's interesting. Could you enable TRACE logging for CassandraTableScanRDD for your driver process? It would be interesting to see this log line:
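For reference, TRACE logging for that class can typically be enabled in the Spark driver's log4j configuration. A minimal sketch (the exact file location depends on how you launch your driver):

```
# log4j.properties on the Spark driver (sketch)
log4j.logger.com.datastax.spark.connector.rdd.CassandraTableScanRDD=TRACE
```

With this in place, the connector's per-partition scan details appear in the driver log, which should show how token ranges are being assigned to nodes.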


1 Answer

Erick Ramirez answered

By default, the Spark connector uses a consistency level of LOCAL_ONE when reading from Cassandra. If your app doesn't partition/parallelize the scan correctly, that would explain why all the work is being sent to just one node.
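As a starting point, the read consistency level and scan parallelism can be tuned through the connector's input settings, e.g. in spark-defaults.conf (values below are illustrative, not recommendations):

```
# Spark Cassandra Connector read tuning (illustrative values)
spark.cassandra.input.consistency.level   LOCAL_ONE
spark.cassandra.input.split.sizeInMB      64
spark.cassandra.input.fetch.sizeInRows    1000
```

A smaller split size produces more Spark partitions, which tends to spread token-range reads across more replicas.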

We need a lot more background information, including minimal sample code that replicates the problem you're seeing, so please log a ticket with DataStax Support and one of our engineers can assist you. Cheers!
