
a.gheshlaghy_177282 asked Erick Ramirez answered

What does "alternate between RF settings" mean when configuring allocate_tokens_for_replication_factor?

Hi,

To use the allocation algorithm in DSE, I have to set the allocate_tokens_for_replication_factor parameter, but how can I have two keyspaces with different RFs with this parameter?
In the docs I found the following:
"If the replication varies, alternate between the replication factor (RF) settings."
but I could not understand it.


virtual nodes

1 Answer

Erick Ramirez answered

Background

For the benefit of other community members, the quote came from the Virtual node configuration page. Here is an extract from the DSE 6.8 version of the page:

To use the allocation algorithm uncomment allocate_tokens_for_local_replication_factor and set it to the target replication factor for the keyspaces in the datacenter. If the replication varies, alternate between the replication factor (RF) settings.

Clarification

The instruction to "alternate between the replication factor (RF) settings" only applies when multiple nodes are being added to a data centre. Here is some additional guidance depending on the keyspace configuration and situation.

NOTE: For the purposes of this discussion, only the configuration of the application keyspaces is relevant, and we will ignore the configuration of system keyspaces.

ALL APP KEYSPACES SET TO RF=3

Set allocate_tokens_for_local_replication_factor: 3 in cassandra.yaml.

In most cases, it will be set to 3 since the recommended replication factor is 3 for each DC.

ADDING ONE NODE TO A DC

If there are keyspaces with varying replication factors such as the examples below, use the replication factor of the keyspace with the most data.

Keyspace        Replication factor  Data size
community_ks    3                   2.0 TB
playlist_ks     5                   1.0 TB
recommender_ks  1                   1.5 TB

In the example above, community_ks has the most data so use its replication factor and set allocate_tokens_for_local_replication_factor: 3.
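If it helps, the "most data wins" rule can be sketched in a few lines of Python. This is only an illustration, not a DSE tool: the keyspaces list and the rf_of_largest_keyspace helper are hypothetical names, and the sizes are simply the ones from the table.

```python
# Hypothetical inventory of application keyspaces in one DC:
# (name, replication factor, data size in TB), copied from the example table.
keyspaces = [
    ("community_ks", 3, 2.0),
    ("playlist_ks", 5, 1.0),
    ("recommender_ks", 1, 1.5),
]


def rf_of_largest_keyspace(keyspaces):
    """Return the replication factor of the keyspace holding the most data."""
    _name, rf, _size_tb = max(keyspaces, key=lambda ks: ks[2])
    return rf


# Print the line you would put in cassandra.yaml on the new node.
print(f"allocate_tokens_for_local_replication_factor: {rf_of_largest_keyspace(keyspaces)}")
```

With the sizes in the table this picks community_ks and therefore RF 3.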

KEYSPACES WITH EQUAL DATA SIZE

If there are keyspaces with varying replication factors such as the examples below AND with equal data size, use the highest replication factor.

Keyspace        Replication factor  Data size
community_ks    3                   2.0 TB
playlist_ks     5                   2.0 TB
recommender_ks  1                   2.0 TB

In the example above, playlist_ks has the highest RF so use its replication factor and set allocate_tokens_for_local_replication_factor: 5.
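As a rough sketch of this tie-break, assuming you keep a small list of keyspaces by hand: compare by data size first and fall back to the highest replication factor when sizes are equal. The equal_size_keyspaces list and the choose_rf helper are hypothetical names used only for illustration.

```python
# Hypothetical inventory where every application keyspace holds the same
# amount of data: (name, replication factor, data size in TB).
equal_size_keyspaces = [
    ("community_ks", 3, 2.0),
    ("playlist_ks", 5, 2.0),
    ("recommender_ks", 1, 2.0),
]


def choose_rf(keyspaces):
    """Pick the RF of the largest keyspace; on equal size, the highest RF wins."""
    # Tuples compare element by element: data size first, then RF as tie-break.
    _name, rf, _size_tb = max(keyspaces, key=lambda ks: (ks[2], ks[1]))
    return rf


print(f"allocate_tokens_for_local_replication_factor: {choose_rf(equal_size_keyspaces)}")
```

Because all three sizes are 2.0 TB, the comparison falls through to the replication factor and returns 5.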

ADDING MULTIPLE NODES TO A DC

Similar to above where:

  • keyspaces have varying replication factors, and
  • equal data size

but this time when adding multiple nodes, alternate between the different replication factors. For example:

  • on first new node, set allocate_tokens_for_local_replication_factor: 3
  • on second new node, set allocate_tokens_for_local_replication_factor: 5
  • on third new node, set allocate_tokens_for_local_replication_factor: 3
  • on fourth new node, set allocate_tokens_for_local_replication_factor: 5
  • and so on.
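The alternation in the list above can be sketched as a small planning helper. This is a minimal illustration under stated assumptions: the node names and the alternate_rf function are hypothetical, and the script only prints the value to set on each node; it does not edit cassandra.yaml or start anything.

```python
from itertools import cycle


def alternate_rf(new_nodes, rfs):
    """Pair each new node, in the order it will be added, with the next RF in rotation."""
    return list(zip(new_nodes, cycle(rfs)))


# Hypothetical node names; 3 and 5 are the RFs used in the example above.
plan = alternate_rf(["node1", "node2", "node3", "node4"], [3, 5])

for node, rf in plan:
    print(f"{node}: allocate_tokens_for_local_replication_factor: {rf}")
```

The order of the plan is the order in which the nodes should be added, one at a time.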

IMPORTANT: When adding multiple nodes to the cluster using the allocation algorithm, ensure that nodes are added one at a time. We recommend a 2-minute gap between nodes. If nodes are added concurrently, the algorithm assigns the same tokens to different nodes.

Thanks for bringing this to our attention. I can see how this can be confusing. I will request for our official documentation to be updated. Cheers!
