If consistency is LOCAL_QUORUM and RF=3, does Spark connect to two nodes in the cluster?


When the consistency level is set to LOCAL_QUORUM with RF = 3, does Spark connect to two nodes in the Cassandra cluster? Or is every client request served only by the coordinator, so that the coordinator reads the data from one of the replicas and then forwards it to Spark?

1 Answer

The Spark connector uses the Java driver under the hood. When a read is requested by the connector, the driver connects to a node in the cluster which acts as the coordinator for the request.

When the consistency is set to LOCAL_QUORUM and the keyspace has a replication factor of 3 for the local DC, the coordinator contacts 2 replicas (quorum of 3) to request the data. Provided the responses are consistent between the 2 replicas, the result is returned to the Java driver and eventually the connector as the client.
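To make the "quorum of 3" arithmetic concrete, here is a minimal sketch of the usual majority formula, floor(RF / 2) + 1. The function name `quorum` is made up for illustration and is not part of the Java driver or the connector.

```python
def quorum(replication_factor: int) -> int:
    """Return the smallest majority of replicas: floor(RF / 2) + 1.

    Hypothetical helper for illustration only; not a driver or connector API.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Number of replica responses the coordinator waits for in the local DC.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: LOCAL_QUORUM needs {quorum(rf)} replica response(s)")
```

With RF=3 this gives 2, which matches the two replicas mentioned above. The client (here, the connector via the Java driver) still sends a single request to the coordinator; the fan-out to replicas happens inside the cluster.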

Note that I have provided a simplified version of events in my answer for brevity. For more information, see How read requests are accomplished in Cassandra. Cheers!

I am concerned with:

1. Does Spark connect only to the coordinator for reads, or does the coordinator route Spark to connect to a replica? I am asking because I am concerned about the number of requests served by the Cassandra cluster.

The connector itself doesn't connect to the cluster. As I stated previously, it uses the Java driver to connect to the cluster. Think of the connector as just another client just like cqlsh or any other application. Cheers!

P.S. cqlsh is a standalone client/app which uses another driver (Python) to connect to a Cassandra cluster.

