
mishra.anurag643_153409 asked:

Does the read coordinator respond to the client, or does it connect the client to all responding nodes?

I am reading data from Cassandra using Spark, and I am wondering what happens when Spark sends a read request to Cassandra. My question is:

When Spark issues a read request, does the coordinator alone serve the request by collecting the data from all the Cassandra nodes where it resides, or does the coordinator help Spark connect to those nodes so it can read the data directly?

spark-cassandra-connector

Erick Ramirez answered:

As the name suggests, the coordinator node coordinates the request between the client and the replicas. It is responsible for requesting the data from the replicas and returning the response to the client.

As Jaroslaw stated, if the Spark worker/executor is running on the same server as the Cassandra JVM instance, then the read request is sent to the local node. If the Spark workers/executors are running on separate servers from the Cassandra nodes, then there is no data locality and the coordinator could be any node in the cluster. Cheers!
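To make the data-locality point concrete, here is a minimal sketch of reading a table through the Spark Cassandra Connector. The keyspace and table names (`my_ks`, `my_table`) and the contact point are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical contact point and table names, for illustration only.
val spark = SparkSession.builder()
  .appName("scc-read-example")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

// Each Spark partition maps to a set of Cassandra token ranges; with
// colocated executors the connector prefers the local replica, so the
// coordinator for each sub-query is a node that owns the data.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
  .load()

df.show()
```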


mishra.anurag643_153409 commented:

This means the coordinator is responsible for forwarding the data. So if I am reading data that is larger than the coordinator's memory, what would happen in that case?

Erick Ramirez replied:

The result set is serialised on heap in chunks and is buffered in segments until the whole result is sent back to the client.

If there are lots of reads taking place and the heap fills up, then the JVM will throw an out-of-memory error and you will need to restart the node to recover. Cheers!
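If memory pressure from large reads is a concern, the page size for connector reads can be tuned down. A minimal sketch, assuming the connector's `spark.cassandra.input.fetch.sizeInRows` setting (the number of CQL rows fetched per page, default 1000); the contact point is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Smaller pages mean less of the result set is buffered at once;
// rows stream back to the client one page at a time.
val spark = SparkSession.builder()
  .appName("scc-paged-read")
  .config("spark.cassandra.connection.host", "127.0.0.1")  // hypothetical
  .config("spark.cassandra.input.fetch.sizeInRows", "500") // default is 1000
  .getOrCreate()
```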

jaroslaw.grabowski_50515 answered:

Hi, the Spark Cassandra Connector (SCC) reads with consistency level ONE. If the Spark executors are colocated with the Cassandra nodes, then each executor queries only its local Cassandra node.
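For reference, the read consistency level can be overridden per application through the connector's `spark.cassandra.input.consistency.level` property. A minimal sketch; the contact point and the chosen level are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("scc-read-consistency")
  .config("spark.cassandra.connection.host", "127.0.0.1") // hypothetical
  // Raise the read consistency level above the connector's default.
  .config("spark.cassandra.input.consistency.level", "LOCAL_QUORUM")
  .getOrCreate()
```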
