Cedrick Lunven asked Erick Ramirez edited

How do I reconfigure a node that has been added in the wrong rack and DC?

Hi folks,

I'm not really an expert in troubleshooting nodes, so I'm asking here for a wider audience and support.

Nigel: I've made a mistake in configuring node 3 while doing DS201. I failed to give it the right rack (hakuna-matata) and data center (west-side).
So it's causing all kinds of problems in downstream exercises. So I tried to fix it by changing the data center parameters, but on trying to start it:
- It said it couldn't start the node with a new data center name that didn't match the old one (cassandra).
- It said I must decommission and re-bootstrap this node.
So I've been trying for ages to accomplish that, but have just been getting myself further and further down the rabbit-hole, and succeeded only in removing a node I wanted to keep - whatever I try, the wrong one, call it node 3 or whatever, is still there.
You can see below, the output from what I've been trying.

What is the correct way to decommission, destroy and remove the node at 127.0.0.3?
ubuntu@ds201-node1:~/node3/bin$ nodetool status 
Datacenter: Cassandra 
===================== 
Status=Up/Down 
|/ State=Normal/Leaving/Joining/Moving 
-- Address Load Tokens Owns Host ID Rack 
DN 127.0.0.3 315.47 KiB 128 ? b84ac6a8-2895-4f2e-bfc8-8478df2bcebc rack1 
Datacenter: east-side 
===================== 
Status=Up/Down 
|/ State=Normal/Leaving/Joining/Moving 
-- Address Load Tokens Owns Host ID Rack 
UN 127.0.0.2 276.34 KiB 128 ? 99ec4273-7960-40fe-ae3a-574c9981138f hakuna-matata 
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless 
ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -f 
nodetool: Unsupported operation: local node is not a member of the token ring yet 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -h 127.0.0.3 
nodetool: Found unexpected parameters: [-h, 127.0.0.3] [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool decommission -f 
nodetool: Unsupported operation: local node is not a member of the token ring yet 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 decommission 
nodetool: Unsupported operation: local node is not a member of the token ring yet [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 
nodetool: Required parameters are missing: remove_operation 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode force 
RemovalStatus: No token removals in process. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 3 
nodetool: Invalid UUID string: 3 [...]

Looking at the documentation (https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/tools/nodetool/toolsDecommission.html), I would have thought the correct syntax is indeed:

nodetool -h 127.0.0.3 decommission

Now, since the error is `local node is not a member of the token ring yet`, should we wait, or force the node to join? Or try nodetool rebuild?

nodetool rebuild

Let's discuss here how to help.

troubleshooting

nyjdams_136971 commented:

I tried:

nodetool -h 127.0.0.3 rebuild east-side

and got:

nodetool -h 127.0.0.3 rebuild east-side
finished rebuild for (All keyspaces), (All tokens), 1 streaming connections, NORMAL,  included DCs: east-side after 0 seconds receiving 0 bytes.

But the output from 'nodetool status' is unchanged. I tried running rebuild on node 1:

nodetool -h 127.0.0.1 rebuild west-side
nodetool: DC 'west-side' is not a known DC in this cluster

I'm thinking I could save time by just destroying the whole VM and starting again, but that would not teach me anything .....

Erick Ramirez replied to nyjdams_136971:

@nyjdams_136971 I've posted an answer that explains why you had issues running the commands. I've also posted a step-by-step recovery plan. Cheers!

P.S. I've converted your post to a comment since it's not an "answer". :)

nyjdams_136971 replied to Erick Ramirez:

Oh, ok, thanks Erick :)

The reason why node 1 isn't part of the cluster is because when I ran:

node3/bin/nodetool decommission

It removed node1. Can you explain why it did that please?


1 Answer

Erick Ramirez answered Erick Ramirez edited

Troubleshooting

There's something really wrong here. Assuming the following:

  • node1 has IP 127.0.0.1
  • node2 has IP 127.0.0.2
  • node3 has IP 127.0.0.3

The nodetool status output posted above indicates that only nodes 127.0.0.2 and 127.0.0.3 belong in the cluster. Node 127.0.0.1 is not a member of that cluster.

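ISSUE 1 - This exception occurs because node1 is not part of the cluster, so it cannot decommission itself. Without -h, nodetool connects to localhost (node1 here), regardless of which node's bin directory it is run from: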

ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -f 
nodetool: Unsupported operation: local node is not a member of the token ring yet 
See 'nodetool help' or 'nodetool help <command>'. [...] 

ISSUE 2 - This exception was thrown because the options are in the wrong position. The -h option must come before the command, not after it:

ubuntu@ds201-node1:~/node3/bin$ nodetool decommission -h 127.0.0.3 
nodetool: Found unexpected parameters: [-h, 127.0.0.3] [...] 

The correct format is:

$ nodetool [options] <command>

ISSUE 3 - This exception is similar to that of issue 1:

ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 decommission 
nodetool: Unsupported operation: local node is not a member of the token ring yet [...] 

node1 cannot decommission node 127.0.0.3 because node1 is not part of the cluster.

ISSUE 4 - These exceptions are similar to issue 3:

ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 
nodetool: Required parameters are missing: remove_operation 
See 'nodetool help' or 'nodetool help <command>'. [...] 
ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode force 
RemovalStatus: No token removals in process. [...]

node1 cannot remove node 127.0.0.3 because node1 is not part of the cluster.

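ISSUE 5 - This exception is due to an incorrect argument. removenode expects the Host ID of the node (the UUID shown in the nodetool status output), not a node number: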

ubuntu@ds201-node1:~/node1/bin$ nodetool -h 127.0.0.3 removenode 3 
nodetool: Invalid UUID string: 3 [...]

The correct format is:

$ nodetool removenode <host_id>

In any case, it still would not work because node1 is not part of the cluster.
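
The Host ID that removenode expects is the second-to-last column of the nodetool status output. As a minimal sketch, assuming the column layout shown in the output posted above, a small helper can pull it out for a given address. The name host_id_for is hypothetical and not part of nodetool:

```shell
# Print the Host ID that `nodetool status` reports for a given address.
# Reads the status output on stdin. host_id_for is a hypothetical helper name.
host_id_for() {
  local addr="$1"
  # Node rows start with a two-letter state (UN, DN, ...) followed by the address.
  # The Host ID is the second-to-last column; the rack is the last one.
  awk -v addr="$addr" '$1 ~ /^[UD][NLJM]$/ && $2 == addr { print $(NF-1) }'
}

# Example usage, run against a node that is a live member of the cluster:
#   id=$(nodetool -h 127.0.0.2 status | host_id_for 127.0.0.3)
#   nodetool -h 127.0.0.2 removenode "$id"
```

Copying the UUID by hand works just as well; the helper only avoids typos when the ID is long.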

Recovery plan

STEP 1 - Remove node 127.0.0.3 from the cluster by running the following command:

$ nodetool -h 127.0.0.2 -p 7299 removenode b84ac6a8-2895-4f2e-bfc8-8478df2bcebc

STEP 2 - On node 127.0.0.3, delete the contents of the following directories:

  • data/
  • commitlog/
  • saved_caches/
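
Emptying those three directories could be sketched as below. This assumes DSE on the node is already stopped and that all three directories sit under one parent directory, which may not match your layout. The function name clear_node_state and the path in the usage comment are hypothetical:

```shell
# Empty the data/, commitlog/ and saved_caches/ directories under a given parent,
# keeping the directories themselves. clear_node_state is a hypothetical helper.
clear_node_state() {
  local node_root="$1"
  local d
  for d in data commitlog saved_caches; do
    if [ -d "$node_root/$d" ]; then
      # -mindepth 1 leaves the directory in place and deletes only its contents.
      find "$node_root/$d" -mindepth 1 -delete
    fi
  done
}

# Example usage (hypothetical path; check where your node keeps its state first):
#   clear_node_state /path/to/node3
```

The deletion is irreversible, so it is worth listing the directories before running anything like this.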

STEP 3 - On node 127.0.0.3, set the seeds list to 127.0.0.2 in cassandra.yaml.

STEP 4 - On node 127.0.0.3, configure the DC name and rack (as per exercise requirements).
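
Because a node refuses to start once its stored DC name no longer matches the configured one, it may help to double-check the DC and rack before the first start. The sketch below assumes the node uses GossipingPropertyFileSnitch and therefore reads dc= and rack= from a cassandra-rackdc.properties file; the thread does not confirm the snitch, and check_rackdc is a hypothetical helper:

```shell
# Check that a cassandra-rackdc.properties file names the expected DC and rack.
# Assumes GossipingPropertyFileSnitch, which reads lines of the form:
#   dc=west-side
#   rack=hakuna-matata
# check_rackdc is a hypothetical helper; the caller supplies the file path.
check_rackdc() {
  local file="$1" want_dc="$2" want_rack="$3"
  local dc rack
  # Take the last uncommented dc= / rack= line and strip surrounding whitespace.
  dc=$(sed -n 's/^[[:space:]]*dc[[:space:]]*=[[:space:]]*//p' "$file" | tail -n 1 | sed 's/[[:space:]]*$//')
  rack=$(sed -n 's/^[[:space:]]*rack[[:space:]]*=[[:space:]]*//p' "$file" | tail -n 1 | sed 's/[[:space:]]*$//')
  # Succeed only when both values match what the exercise asks for.
  [ "$dc" = "$want_dc" ] && [ "$rack" = "$want_rack" ]
}

# Example usage for node3, with the values from the question (hypothetical path):
#   check_rackdc /path/to/node3/conf/cassandra-rackdc.properties west-side hakuna-matata \
#     && echo "DC and rack look right"
```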

STEP 5 - On node 127.0.0.3, start DSE.

At this point, both node2 and node3 should be part of the same cluster.

STEP 6 - Rebuild node1. First, stop DSE on the node.

STEP 7 - On node 127.0.0.1, delete the contents of the following directories:

  • data/
  • commitlog/
  • saved_caches/

STEP 8 - On node 127.0.0.1, set the seeds list to 127.0.0.2 in cassandra.yaml.

STEP 9 - On node 127.0.0.1, configure the DC name and rack (as per exercise requirements).

STEP 10 - On node 127.0.0.1, confirm that cluster_name in cassandra.yaml is the same as the other 2 nodes.

STEP 11 - On node 127.0.0.1, start DSE.

At this point, all 3 nodes should be part of the same cluster in their respective DCs and racks.
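
To confirm the result, the DC and rack of every node can be read back from nodetool status. A minimal sketch, assuming the same output layout as in the question; node_placement is a hypothetical helper name:

```shell
# Print "address datacenter rack state" for each node in `nodetool status` output.
# Reads the status output on stdin. node_placement is a hypothetical helper name.
node_placement() {
  awk '
    $1 == "Datacenter:" { dc = $2; next }           # remember the current DC header
    $1 ~ /^[UD][NLJM]$/ { print $2, dc, $NF, $1 }   # node row: address, DC, rack, state
  '
}

# Example usage:
#   nodetool -h 127.0.0.2 status | node_placement
```

Each node should then appear once, in state UN, with the DC and rack the exercise requires.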


nyjdams_136971 commented:

Well, thanks Erick - but when I ran step 1, this is what I got:

error: null
-- StackTrace --
java.lang.NullPointerException
    at org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removalCoordinator(VersionedValue.java:218)
    at org.apache.cassandra.gms.Gossiper.advertiseRemoving(Gossiper.java:651)
    at org.apache.cassandra.service.StorageService.removeNode(StorageService.java:4723)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at
.
.
.

plus much more which exceeds the limit on characters I can post

Erick Ramirez replied to nyjdams_136971:

My bad. It is connecting to node 1/localhost. Try:

$ nodetool -h 127.0.0.2 removenode b84ac6a8-2895-4f2e-bfc8-8478df2bcebc
nyjdams_136971 replied to Erick Ramirez:

Thanks for your help :)
