Cedrick Lunven asked Erick Ramirez edited

How do I reconfigure a node that has been added in the wrong rack and DC?

Hi folks,

I'm not really an expert in troubleshooting nodes, so I'm asking here for a wider audience and support.

Nigel: I've made a mistake in configuring node 3 while doing DS201. I failed to give it the right rack (hakuna-matata) and data center (west-side).
This is causing all kinds of problems in the later exercises, so I tried to fix it by changing the data center parameters. But when I tried to start the node:
- It said it couldn't start the node with a new data center name that didn't match the old one (Cassandra).
- It said I must decommission and re-bootstrap this node.
I've been trying for ages to do that, but I've just been getting myself further and further down the rabbit hole and have only succeeded in removing a node I wanted to keep. Whatever I try, the wrong node (node 3) is still there.
You can see the output from what I've been trying below.

What is the correct way to decommission, destroy and remove the node at 127.0.0.3?
ubuntu@ds201-node1:~/node3/bin$ nodetool status 
Datacenter: Cassandra 
===================== 
Status=Up/Down 
|/ State=Normal/Leaving/Joining/Moving 
-- Address Load Tokens Owns Host ID Rack 
DN 127.0.0.3 315.47 KiB 128 ? b84ac6a8-2895-4f2e-bfc8-8478df2bcebc rack1 
Datacenter: east-side 
===================== 
Status=Up/Down 
|/ State=Normal/Leaving/Joining/Moving 
-- Address Load Tokens Owns Host ID Rack 
UN 127.0.0.2 276.34 KiB 128 ? 99ec4273-7960-40fe-ae3a-574c9981138f hakuna-matata 
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless 
ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -f 
nodetool: Unsupported operation: local node is not a member of the token ring yet 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -h 127.0.0.3 
nodetool: Found unexpected parameters: [-h, 127.0.0.3] [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool decommission -f 
nodetool: Unsupported operation: local node is not a member of the token ring yet 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 decommission 
nodetool: Unsupported operation: local node is not a member of the token ring yet [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 
nodetool: Required parameters are missing: remove_operation 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode force 
RemovalStatus: No token removals in process. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 3 
nodetool: Invalid UUID string: 3 [...]

Looking at the documentation (https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/tools/nodetool/toolsDecommission.html), I would have thought the correct syntax is:

nodetool -h 127.0.0.3 decommission

Now, since the error is `local node is not a member of the token ring yet`, should I wait, or force the node to join? Should I try nodetool rebuild?

nodetool rebuild

Let's discuss here how to help.


I tried:

nodetool -h 127.0.0.3 rebuild east-side

and got:

nodetool -h 127.0.0.3 rebuild east-side
finished rebuild for (All keyspaces), (All tokens), 1 streaming connections, NORMAL,  included DCs: east-side after 0 seconds receiving 0 bytes.

But the output from 'nodetool status' is unchanged. I tried running rebuild on node 1:

nodetool -h 127.0.0.1 rebuild west-side
nodetool: DC 'west-side' is not a known DC in this cluster

I'm thinking I could save time by just destroying the whole VM and starting again, but that wouldn't teach me anything...


@nyjdams_136971 I've posted an answer that explains why you had issues running the commands. I've also posted a step-by-step recovery plan. Cheers!

P.S. I've converted your post to a comment since it's not an "answer". :)


Oh, ok, thanks Erick :)

The reason node 1 isn't part of the cluster is that when I ran:

node3/bin/nodetool decommission

It removed node1. Can you explain why it did that, please?


1 Answer

Erick Ramirez answered Erick Ramirez edited

Troubleshooting

There's something really wrong here. Assuming the following:

  • node1 has IP 127.0.0.1
  • node2 has IP 127.0.0.2
  • node3 has IP 127.0.0.3

The nodetool status output posted above indicates that only nodes 127.0.0.2 and 127.0.0.3 belong to the cluster. Node 127.0.0.1 is not a member of that cluster.

ISSUE 1 - This exception occurs because node1 is not part of the cluster, so it cannot decommission itself:

ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -f 
nodetool: Unsupported operation: local node is not a member of the token ring yet 
See 'nodetool help' or 'nodetool help <command>'. [...] 

ISSUE 2 - This exception was thrown because the command isn't in the right format. Global options such as -h have to come before the command, not after it:

ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -h 127.0.0.3 
nodetool: Found unexpected parameters: [-h, 127.0.0.3] [...] 

The correct format is:

$ nodetool [options] <command>

ISSUE 3 - This exception is similar to that of issue 1:

ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 decommission 
nodetool: Unsupported operation: local node is not a member of the token ring yet [...] 

node1 cannot decommission node 127.0.0.3 because node1 is not part of the cluster.

ISSUE 4 - These exceptions are similar to issue 3:

ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 
nodetool: Required parameters are missing: remove_operation 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode force 
RemovalStatus: No token removals in process. [...]

node1 cannot remove node 127.0.0.3 because node1 is not part of the cluster.

ISSUE 5 - This exception is due to an incorrect format. The removenode command expects the node's host ID (a UUID), not a node number:

ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 3 
nodetool: Invalid UUID string: 3 [...]

The correct format is:

$ nodetool removenode <host_id>

In any case, it still would not work because node1 is not part of the cluster.
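If you'd rather not copy the host ID by hand, you can pull it out of the `nodetool status` output. Below is a minimal bash sketch, assuming the output layout shown in the question. `host_id_for` is a hypothetical helper, not a nodetool command. It picks whichever field looks like a UUID instead of relying on a fixed column, because the Load value can span two fields ("315.47 KiB").

```shell
#!/usr/bin/env bash
# host_id_for ADDRESS
# Reads `nodetool status` output on stdin and prints the Host ID of ADDRESS.
# Hypothetical helper, not part of nodetool.
host_id_for() {
  local addr="$1"
  awk -v addr="$addr" '
    # Node rows start with a two-letter state (UN, DN, UJ, UL, ...) then the address.
    $1 ~ /^[UD][NLJM]$/ && $2 == addr {
      # Take the first field shaped like a UUID (36 chars, 5 hex groups)
      # instead of a fixed column, because Load spans one or two fields.
      for (i = 3; i <= NF; i++) {
        if (length($i) == 36 && $i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/) {
          print $i
          exit
        }
      }
    }
  '
}
```

Fed the status output from the question, `host_id_for 127.0.0.3` prints b84ac6a8-2895-4f2e-bfc8-8478df2bcebc. Check that the result is non-empty before passing it to `nodetool removenode`. Also take the status output from a node that is actually in the cluster.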

Recovery plan

STEP 1 - Remove node 127.0.0.3 from the cluster by running the following command:

$ nodetool -h 127.0.0.2 -p 7299 removenode b84ac6a8-2895-4f2e-bfc8-8478df2bcebc

STEP 2 - On node 127.0.0.3, delete the contents of the following directories:

  • data/
  • commitlog/
  • saved_caches/
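Clearing those directories can be scripted, and the same applies to node1 later on. Below is a minimal sketch, assuming the three directories sit directly under the node's own directory (e.g. `~/node3/data`). That layout is an assumption, so check `data_file_directories`, `commitlog_directory` and `saved_caches_directory` in that node's cassandra.yaml first. `wipe_node_state` is a hypothetical helper, and you should only run it while the node is stopped.

```shell
#!/usr/bin/env bash
# wipe_node_state NODE_DIR
# Deletes the contents (not the directories) of NODE_DIR/data,
# NODE_DIR/commitlog and NODE_DIR/saved_caches.
# Hypothetical helper; assumes those directories live directly under NODE_DIR.
# Run it only while the node is stopped.
wipe_node_state() {
  local node_dir="$1" d
  # Refuse an empty argument so we never end up targeting /data etc.
  if [ -z "$node_dir" ]; then
    echo "usage: wipe_node_state NODE_DIR" >&2
    return 1
  fi
  for d in data commitlog saved_caches; do
    if [ -d "$node_dir/$d" ]; then
      # -mindepth 1 keeps the directory itself; -delete removes files and
      # subdirectories depth-first, including hidden ones.
      find "$node_dir/$d" -mindepth 1 -delete
    else
      echo "warning: $node_dir/$d not found, skipping" >&2
    fi
  done
}
```

For example: `wipe_node_state ~/node3` (path assumed).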

STEP 3 - On node 127.0.0.3, set the seeds list to 127.0.0.2 in cassandra.yaml.
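In cassandra.yaml the seeds are set under `seed_provider` as a quoted, comma-separated string on a `- seeds:` line. Below is a minimal sketch of this step with GNU sed, assuming there is exactly one uncommented `- seeds:` line, as in the stock file. `set_seeds` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# set_seeds CASSANDRA_YAML SEEDS
# Rewrites the value of the `- seeds:` entry, keeping its indentation.
# Hypothetical helper; assumes GNU sed and exactly one uncommented seeds line.
set_seeds() {
  local yaml="$1" seeds="$2"
  # Group 1 captures the indentation, "- seeds:" and the following space(s);
  # the old value is replaced by the new quoted list (e.g. "127.0.0.2").
  sed -i -E "s/^([[:space:]]*- seeds:[[:space:]]*).*/\1\"${seeds}\"/" "$yaml"
}
```

For example: `set_seeds ~/node3/resources/cassandra/conf/cassandra.yaml 127.0.0.2`. That path is an assumption about where the tarball install keeps cassandra.yaml, so adjust it to your setup.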

STEP 4 - On node 127.0.0.3, configure the DC name and rack (as per exercise requirements).
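With `endpoint_snitch: GossipingPropertyFileSnitch`, a node takes its DC and rack from the `dc=` and `rack=` lines in `cassandra-rackdc.properties`. Below is a sketch, assuming that snitch is what the exercise uses. `set_prop`, `set_dc_rack` and `uses_gpfs` are hypothetical helpers, and the example values are the ones from the question (west-side / hakuna-matata). If `uses_gpfs` fails, editing the properties file alone may have no effect on the node's DC and rack.

```shell
#!/usr/bin/env bash
# set_prop FILE KEY VALUE: replace KEY=... in FILE, or append it if missing.
# VALUE must not contain "/" or "&" (they are special in the sed replacement).
set_prop() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    sed -i -E "s/^${key}=.*/${key}=${value}/" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# set_dc_rack RACKDC_PROPERTIES DC RACK (hypothetical helper)
set_dc_rack() {
  set_prop "$1" dc "$2"
  set_prop "$1" rack "$3"
}

# uses_gpfs CASSANDRA_YAML: succeeds if endpoint_snitch is GossipingPropertyFileSnitch,
# the snitch that reads dc/rack from cassandra-rackdc.properties.
uses_gpfs() {
  grep -Eq '^endpoint_snitch:[[:space:]]*(org\.apache\.cassandra\.locator\.)?GossipingPropertyFileSnitch[[:space:]]*$' "$1"
}
```

For example: `set_dc_rack <conf dir>/cassandra-rackdc.properties west-side hakuna-matata`, after checking `uses_gpfs <conf dir>/cassandra.yaml`.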

STEP 5 - On node 127.0.0.3, start DSE.

At this point, both node2 and node3 should be part of the same cluster.

STEP 6 - Rebuild node1. The first step is to stop DSE on the node.

STEP 7 - On node 127.0.0.1, delete the contents of the following directories:

  • data/
  • commitlog/
  • saved_caches/

STEP 8 - On node 127.0.0.1, set the seeds list to 127.0.0.2 in cassandra.yaml.

STEP 9 - On node 127.0.0.1, configure the DC name and rack (as per exercise requirements).

STEP 10 - On node 127.0.0.1, confirm that cluster_name in cassandra.yaml is the same as the other 2 nodes.
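The cluster_name comparison can be done across all three nodes in one go. Below is a sketch: `cluster_name_of` and `same_cluster_name` are hypothetical helpers. They only handle a simple single-line `cluster_name:` value, optionally wrapped in single or double quotes.

```shell
#!/usr/bin/env bash
# cluster_name_of CASSANDRA_YAML: print the cluster_name value without quotes.
cluster_name_of() {
  sed -n -E "s/^cluster_name:[[:space:]]*['\"]?([^'\"]*)['\"]?[[:space:]]*$/\1/p" "$1"
}

# same_cluster_name YAML...: succeeds only if every file has the same,
# non-empty cluster_name; reports the first mismatch on stderr.
same_cluster_name() {
  local first name f
  first="$(cluster_name_of "$1")"
  if [ -z "$first" ]; then
    echo "no cluster_name found in $1" >&2
    return 1
  fi
  for f in "$@"; do
    name="$(cluster_name_of "$f")"
    if [ "$name" != "$first" ]; then
      echo "mismatch: $f has '$name', expected '$first'" >&2
      return 1
    fi
  done
}
```

For example: `same_cluster_name <node1 yaml> <node2 yaml> <node3 yaml>`, using wherever each node's cassandra.yaml lives.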

STEP 11 - On node 127.0.0.1, start DSE.

At this point, all 3 nodes should be part of the same cluster in their respective DCs and racks.
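To confirm, look at `nodetool status` from any node. Every node should show `UN` under the expected Datacenter heading with the expected Rack. Below is a sketch that summarises that output. `nodes_by_dc` and `all_up_normal` are hypothetical helpers, and they assume the column layout shown in the question: state first, address second, rack last.

```shell
#!/usr/bin/env bash
# nodes_by_dc: read `nodetool status` output on stdin and print
# "address datacenter rack state" for every node row.
nodes_by_dc() {
  awk '
    # Remember the current "Datacenter: <name>" heading.
    /^Datacenter:/ { dc = $2; next }
    # Node rows start with a two-letter state such as UN or DN.
    $1 ~ /^[UD][NLJM]$/ { print $2, dc, $NF, $1 }
  '
}

# all_up_normal: succeeds only if the output lists at least one node
# and every node is UN (Up/Normal).
all_up_normal() {
  nodes_by_dc | awk '{ n++; if ($4 != "UN") bad++ } END { exit (n == 0 || bad > 0) }'
}
```

Against the output in the question, `nodes_by_dc` lists 127.0.0.3 as DN in DC Cassandra, rack1, and `all_up_normal` fails. After the recovery, it should succeed and show each node in its intended DC and rack.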


Well, thanks Erick, but when I ran step 1, this is what I got:

error: null
-- StackTrace --
java.lang.NullPointerException
    at org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removalCoordinator(VersionedValue.java:218)
    at org.apache.cassandra.gms.Gossiper.advertiseRemoving(Gossiper.java:651)
    at org.apache.cassandra.service.StorageService.removeNode(StorageService.java:4723)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at ...

plus much more, which exceeds the character limit for posts.


My bad. It is connecting to node 1/localhost. Try:

$ nodetool -h 127.0.0.2 removenode b84ac6a8-2895-4f2e-bfc8-8478df2bcebc

Thanks for your help :)
