set1984 asked:

Why does failover not get triggered when there is only one node alive in the local DC?

I set up two data centers (DC1 and DC2) with 3 nodes each. The local-datacenter is DC1. The microservice doesn't fail over to DC2 automatically when only one node is alive in the local data center. Instead, it keeps complaining: "Not enough replicas available for a query at consistency LOCAL_QUORUM (2 required but only 1 alive)". Any advice will be appreciated.


Maven dependencies:

org.springframework.boot:spring-boot:jar:2.3.7.RELEASE
org.springframework.boot:spring-boot-starter-data-cassandra:jar:2.3.7.RELEASE
org.springframework.data:spring-data-cassandra:jar:3.0.6.RELEASE
com.datastax.oss:java-driver-core:4.10.0

YAML:

spring:
  data:
    cassandra:
      consistency-level: local_quorum
      serial-consistency-level: LOCAL_SERIAL
      local-datacenter: DC1
      other configs: ...

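Code snippet:

import java.time.Duration;

import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import org.springframework.boot.autoconfigure.cassandra.CqlSessionBuilderCustomizer;
import org.springframework.boot.autoconfigure.cassandra.DriverConfigLoaderBuilderCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CassandraConfig {

    // Supplies the credentials used when the session is built.
    @Bean
    public CqlSessionBuilderCustomizer sessionBuilderConfigurer() {
        return cqlSessionBuilder ->
                cqlSessionBuilder
                        .withAuthCredentials("username", "pwd");
    }

    // Sets the request timeout and enables cross-DC failover for local consistency levels.
    @Bean
    public DriverConfigLoaderBuilderCustomizer driverConfigLoaderBuilderCustomizer() {
        return loaderBuilder -> loaderBuilder
                .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofMillis(10000))
                .withBoolean(DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_ALLOW_FOR_LOCAL_CONSISTENCY_LEVELS, true)
                .withInt(DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_MAX_NODES_PER_REMOTE_DC, 3);
    }
}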

1 Answer

Erick Ramirez answered:

This is the expected behaviour for the driver. As long as there are nodes available in the local DC, the driver will continue to communicate with that DC.

Only when no nodes are available in the local DC will the driver attempt to contact nodes in the remote DC.

When remote DCs are allowed by the load-balancing policy, the query plan (the list of nodes to contact) contains the nodes in the local DC first, with the nodes in the remote DC added to the end. This means that the nodes in the local DC will be contacted first. Remote nodes will only be contacted once the list of local nodes has been exhausted.
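To make the ordering concrete, here is a rough, standalone sketch of how such a query plan could be assembled. It is only a simplified model and not the driver's actual implementation: `QueryPlanSketch`, `Node`, and `buildPlan` are hypothetical names, and the sketch ignores things the real policy may also do, such as shuffling or token awareness. It assumes that down nodes are simply left out of the plan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryPlanSketch {

    /** Hypothetical stand-in for a cluster node; not a driver class. */
    public static final class Node {
        final String name;
        final String dc;
        final boolean up;

        public Node(String name, String dc, boolean up) {
            this.name = name;
            this.dc = dc;
            this.up = up;
        }
    }

    /** Live local nodes first, then at most maxNodesPerRemoteDc live nodes from each remote DC. */
    public static List<String> buildPlan(List<Node> nodes, String localDc, int maxNodesPerRemoteDc) {
        List<String> plan = new ArrayList<>();
        // Pass 1: every live node in the local DC goes to the front of the plan.
        for (Node n : nodes) {
            if (n.up && n.dc.equals(localDc)) {
                plan.add(n.name);
            }
        }
        // Pass 2: live remote nodes are appended, capped per remote DC.
        Map<String, Integer> usedPerDc = new HashMap<>();
        for (Node n : nodes) {
            if (!n.up || n.dc.equals(localDc)) {
                continue;
            }
            int used = usedPerDc.getOrDefault(n.dc, 0);
            if (used < maxNodesPerRemoteDc) {
                plan.add(n.name);
                usedPerDc.put(n.dc, used + 1);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("Local_Node_1", "DC1", true));
        nodes.add(new Node("Local_Node_2", "DC1", false));
        nodes.add(new Node("Local_Node_3", "DC1", false));
        nodes.add(new Node("Remote_Node_1", "DC2", true));
        nodes.add(new Node("Remote_Node_2", "DC2", true));
        nodes.add(new Node("Remote_Node_3", "DC2", true));
        // prints [Local_Node_1, Remote_Node_1, Remote_Node_2, Remote_Node_3]
        System.out.println(buildPlan(nodes, "DC1", 3));
    }
}
```

In this model, a single live local node is still placed ahead of every remote node, which matches the behaviour described above: the local DC is tried first as long as any of its nodes is available.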

For details, see the Java driver Load balancing doc. Cheers!

2 comments

set1984 commented:

Hi Erick, thank you for sharing your insights. Does that mean I need to create a custom RetryPolicy and override the onUnavailable method, so that the driver moves on to the remote nodes in the query plan once the only live local node is unable to handle the read/write request?

set1984 commented:

Will the local_quorum config prevent the driver from moving to remote DC nodes?

For example, the local and remote DCs have 3 nodes each, and two local nodes are down. I assume the query plan looks like the list below:

#1: Local_Node_1 (UP)
#2: Remote_Node_1 (UP)
#3: Remote_Node_2 (UP)
#4: Remote_Node_3 (UP)
#5: Local_Node_2 (DOWN/IGNORED)
#6: Local_Node_3 (DOWN/IGNORED)

Will the driver move on to #2 successfully, or will it get stuck at #1 because the consistency level is local_quorum and there are not enough local replicas?
