dawnhawkbg_139017 asked Cedrick Lunven answered

Cassandra LOCAL_QUORUM is waiting for remote datacenter responses

We have a cluster with 2 datacenters (one in the EU and one in the US), each with 4 nodes, deployed in AWS. In each datacenter the nodes are spread across 3 racks (availability zones). In the cluster we have a keyspace test with replication: NetworkTopologyStrategy, eu-west:3, us-east:3. In the keyspace we have a table called mytable that has only one column, 'id', of type text.

We were running some tests on the performance of the database. In CQLSH, with a consistency level of LOCAL_QUORUM, we did some inserts with TRACING ON and noticed that the requests were not behaving as we expected.

From the tracing data we found that the coordinator node was, as expected, contacting 2 other local nodes, and was also sending a request to one of the nodes in the remote datacenter. The problem was that the coordinator was waiting not only for the local nodes (which finished in no time) but for the remote node too.

Now since our 2 datacenters are geographically far away from each other, our requests were taking a very long time to complete.
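For reference, the expectation behind this question can be written down as a small sketch. It is only a model of the consistency-level arithmetic, not Cassandra code; the function names are hypothetical. Note that a write is still sent to the replicas in the remote datacenter regardless of consistency level, so seeing a message to a remote node in the trace is normal. What LOCAL_QUORUM should change is how many acknowledgements the coordinator blocks for.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def blocking_acks(consistency: str, replication: dict, local_dc: str) -> dict:
    """Per datacenter, how many acks the coordinator blocks for.

    `replication` maps datacenter name -> replication factor.
    Only the two per-datacenter levels are modelled here.
    """
    if consistency == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own datacenter count.
        return {dc: quorum(rf) if dc == local_dc else 0
                for dc, rf in replication.items()}
    if consistency == "EACH_QUORUM":
        # A quorum is required in every datacenter.
        return {dc: quorum(rf) for dc, rf in replication.items()}
    raise ValueError(f"consistency level not modelled: {consistency}")


# The keyspace from the question, with a coordinator in eu-west.
replication = {"eu-west": 3, "us-east": 3}
print(blocking_acks("LOCAL_QUORUM", replication, "eu-west"))
```

With the replication settings above, a LOCAL_QUORUM write coordinated in eu-west should block for 2 local acknowledgements and none from us-east.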

Note: this does not happen with DSE, but our understanding was that we don't need to pay crazy money for LOCAL_QUORUM to work as expected.

cassandra replication quorum
2 comments

dawnhawkbg_139017 commented

Our problem seemed to be related to authentication and authorization. For some reason, with them enabled we were seeing CRAZY slow requests, and with them disabled the tracing shows everything as expected.

ben.krug_85176 commented (in reply to dawnhawkbg_139017)

It sounds as if either your authorization keyspace (e.g., system_auth) has a misconfigured replication strategy, or you are using the 'cassandra' role for the queries. Using the 'cassandra' role uses QUORUM for auth, whereas other roles use LOCAL_ONE, iirc.

Russell Spitzer answered Russell Spitzer edited

My guess here would be that the datacenter specified in the replication settings for that keyspace is ill-defined. I would check the output of nodetool status to make sure that every node that is supposed to be in that datacenter is in it. Then I would make sure that the keyspace has the proper replication factor and uses NetworkTopologyStrategy.

Possible errors could be in the snitch configurations for each of the nodes or in the network topology config.

I'm basically hypothesizing that when the request goes out, the coordinator actually believes that the other nodes are in the datacenter you are trying to query.
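A minimal sketch of that check, assuming you have copied the datacenter name that nodetool status reports for each node into a dict by hand. The function name and the node addresses are hypothetical; this does not talk to a cluster.

```python
def unknown_datacenters(replication: dict, node_datacenters: dict) -> set:
    """Datacenter names used in the keyspace replication that no node reports.

    `replication` is the keyspace's replication map (the 'class' entry,
    if present, is ignored); `node_datacenters` maps node address ->
    datacenter name as reported for that node.
    """
    known = set(node_datacenters.values())
    return {dc for dc in replication if dc != "class" and dc not in known}


# Hypothetical addresses; datacenter names taken from the question.
nodes = {
    "10.0.0.1": "eu-west",
    "10.0.0.2": "eu-west",
    "10.1.0.1": "us-east",
}
replication = {"class": "NetworkTopologyStrategy", "eu-west": 3, "us-east": 3}
print(unknown_datacenters(replication, nodes))
```

An empty set means every datacenter named in the replication settings matches one the nodes actually report. Any name in the result points at a mismatch between the keyspace definition and the snitch configuration.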

ben.krug_85176 answered

If you use the 'cassandra' role with authentication and authorization enabled, the auth lookups are done at QUORUM. Other roles use LOCAL_ONE, iirc. It sounds as if either your authorization keyspace (e.g., system_auth) has a misconfigured replication strategy, or you are using the 'cassandra' role for the queries.
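The rule described above can be sketched as follows. This only models the behavior as stated in this answer (which is itself from memory); the function names are hypothetical and nothing here calls Cassandra.

```python
DEFAULT_SUPERUSER = "cassandra"


def auth_read_consistency(role: str) -> str:
    """Consistency level used for auth lookups, per the rule above."""
    return "QUORUM" if role == DEFAULT_SUPERUSER else "LOCAL_ONE"


def auth_may_wait_on_remote_dc(role: str) -> bool:
    """QUORUM is not datacenter-local, so it can involve remote replicas."""
    return auth_read_consistency(role) != "LOCAL_ONE"


# 'app_user' is a hypothetical non-default role.
for role in ("cassandra", "app_user"):
    print(role, auth_read_consistency(role), auth_may_wait_on_remote_dc(role))
```

If this is what is happening, creating a dedicated role for the application and testing with it, instead of the default 'cassandra' role, should keep the auth lookups local.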

Cedrick Lunven answered

Hey, I will add a reference to the documentation for the commands to run after the ALTER KEYSPACE: https://docs.datastax.com/en/security/6.7/security/secSystemKeyspace.html

About changing the snitch for multi-region EC2 (EU, US):
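As a rough illustration of the statement in question, here is a small helper that builds the CQL string. The helper name is hypothetical, and the replication factor of 3 per datacenter is an assumption that simply mirrors the test keyspace from the question; pick values that fit your cluster.

```python
def alter_replication_cql(keyspace: str, rf_by_dc: dict) -> str:
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in rf_by_dc.items()]
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{', '.join(options)}}};"


# Assumed values: datacenter names and RF copied from the question's keyspace.
print(alter_replication_cql("system_auth", {"eu-west": 3, "us-east": 3}))
```

After changing the replication of a keyspace, the existing data still has to be streamed to the new replicas, typically with nodetool repair system_auth on each node; the linked page covers the follow-up steps.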

https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archSnitchEC2MultiRegion.html

Seems you don't need "crazy money" after all. Also, shameless plug: you can now get production support for OSS: https://luna.datastax.com


M2c.
