question

Davis@DBA asked · Erick Ramirez answered

How do I configure one IP to be used by my application to connect to the Cassandra cluster?

How do I configure one IP for my application to use to connect to any of the nodes in my Cassandra cluster? Do I need to use the same IP in broadcast_rpc_address on all the nodes? Please suggest.

For example, if I have 6 nodes in my cluster, how can I configure my application so that it can connect to any of the nodes at any point in time?

driver · load balancing

smadhavan answered · smadhavan edited

@Davis@DBA, I'm going to assume you're using the DataStax Java Driver here, but it'd be similar for other drivers too. As long as you provide even one host:port combination in the contact points, the driver will automatically fetch the entire topology (DCs, nodes, etc.) of the cluster. That alone is not sufficient, though: if that one host/node is not up and running when the connection is being established, there is no way for the client (using the driver) to establish a connection to the cluster. Hence we recommend providing 2 or 3 nodes (sitting in different racks/availability zones) for high availability. You could read more about address translation and connection information in the driver documentation. See here for additional info on the broadcast address.
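For illustration, here is a minimal sketch of what that looks like with the Java driver 4.x builder (the addresses, port and datacenter name are hypothetical placeholders); the application.conf equivalent is shown in the comments below:

import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

// Two contact points in different racks/AZs of the local DC; the driver
// discovers the rest of the cluster topology from whichever one it reaches.
try (CqlSession session = CqlSession.builder()
        .addContactPoint(new InetSocketAddress("10.0.1.10", 9042))  // rack/AZ 1
        .addContactPoint(new InetSocketAddress("10.0.2.10", 9042))  // rack/AZ 2
        .withLocalDatacenter("dc1")  // the DC closest to the application
        .build()) {
    session.execute("SELECT release_version FROM system.local");
}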

6 comments

Davis@DBA commented ·

If I have 6 nodes, do I need to specify all 6 host:port pairs in my application connection string?

If I'm using only one node's IP in my application connection string and that host is completely down, then how does my application make connections to the other nodes?

smadhavan ♦ Davis@DBA commented ·

You don't have to list all the nodes from your cluster. I would simply go with at least 2 nodes, sitting in different racks/availability zones, per datacenter in the contact points to achieve high availability.


If you're using only one node's IP address in the contact points of the client application's driver configuration and that node is down, there is no way the client can establish initial connectivity to the cluster. This is exactly why we recommend providing one or two additional nodes, specifically in another rack/AZ, for high availability.


Let's assume a different case here. If the client application using the driver has already established connectivity to the cluster and one of the nodes goes down, the driver will automatically detect that and use the remaining up-and-running nodes, because it already learned the complete cluster topology during the initial connection.

Davis@DBA commented ·

So the recommended way is that I need to specify two hosts from different datacenters, e.g. hostA:port,hostX:port, where A and X are nodes from different datacenters. Is my understanding correct?


smadhavan ♦ Davis@DBA commented ·
  • For a single datacenter cluster, <dc1-host1-rackaz1>:port,<dc1-host2-rackaz2>:port
  • For a two datacenter cluster, <dc1-host1-rackaz1>:port,<dc1-host2-rackaz2>:port,<dc2-host1-rackaz1>:port,<dc2-host2-rackaz2>:port


When leveraging a configuration file like application.conf, you would provide it as below for the single-datacenter scenario (see here for more info):

datastax-java-driver {
  basic {
    contact-points = [ "<dc1-host1-rackaz1>:9042", "<dc1-host2-rackaz2>:9042" ]
    load-balancing-policy.local-datacenter = dc1
  }
}

Here, dc1 is the datacenter closest to where your application is deployed.
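As a minimal usage sketch (assuming the application.conf above is on the application's classpath), the Java driver's default configuration loader picks that file up automatically, so the contact points don't need to be repeated in code:

import com.datastax.oss.driver.api.core.CqlSession;

// CqlSession.builder() loads application.conf from the classpath by default,
// so the contact points and local datacenter defined above are applied here.
try (CqlSession session = CqlSession.builder().build()) {
    session.execute("SELECT release_version FROM system.local");
}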

Davis@DBA commented ·

I have multiple datacenters and one rack in each DC.

<dc1-host1-rackaz1>:port,<dc2-host2-rackaz2>:port

Is this correct?

smadhavan ♦ Davis@DBA commented ·

No, this is wrong, because what happens if the only node/host that you've provided per DC goes down? You need to provide at least one host per physical rack. Since you have only one rack (not sure if that is physical or logical), you're already losing some high availability there. Good luck with it!


Please see the prior comments for the multi-DC scenario example. Also, please post follow-ups as comments rather than answers. Thanks!

Erick Ramirez answered

The Cassandra drivers use a built-in load-balancing policy and are aware of the cluster topology as well as the health of the nodes. For each read or write request, the load-balancing policy (LBP) computes what's called a query plan, which determines:

  • the nodes the driver will communicate with,
  • which coordinator to pick and which nodes to use as failover.

The list of nodes in the query plan is different for each query, so the load is balanced across the nodes in the cluster, and it only contains available nodes -- nodes which are down or unresponsive are not included in the query plan. For more info, see the Java driver documentation on Load balancing.
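As an illustrative sketch (the datacenter name "dc1" is a placeholder, and this is just one of several equivalent ways to configure it), the local datacenter that the default load-balancing policy builds its query plans around can also be set programmatically in the Java driver:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

// The default load-balancing policy only picks coordinators from this DC;
// contact points can still come from application.conf or addContactPoint().
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "dc1")
    .build();

CqlSession session = CqlSession.builder()
    .withConfigLoader(loader)
    .build();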

The driver knows about the nodes in the cluster because it connects to contact points (the list of node IP addresses you've configured in your app) to establish a control connection at startup time. The driver uses the control connection to perform tasks that include querying the system tables to learn about the cluster topology. Using the control connection, the driver also listens for changes to the cluster automatically so it is aware of things like node additions, node outages, new data centres and decommissions in real time.
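To illustrate the topology the driver learns over the control connection, here is a minimal sketch (Java driver 4.x, relying on contact points configured elsewhere) that prints what the driver currently knows about each node:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;

try (CqlSession session = CqlSession.builder().build()) {
    // The metadata is populated via the control connection and kept up to date
    // as the driver receives topology and status events from the cluster.
    for (Node node : session.getMetadata().getNodes().values()) {
        System.out.printf("%s dc=%s rack=%s state=%s%n",
            node.getEndPoint(), node.getDatacenter(), node.getRack(), node.getState());
    }
}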

For high availability, we recommend that you configure:

  • at least two nodes in the DC "local" to the app
  • one node from each Cassandra rack if the local DC has multiple
  • one node from each of the remote DCs (optional)

Generally, this is the same method for picking nodes to configure as seeds. Cheers!
