azim_91_184236 asked:

How do I check the progress/status of nodetool rebuild?

We are in the process of migrating data from on-prem to the cloud, using this approach.

I am trying to find a reliable way to check the progress and status of replication/streaming (initiated via the nodetool rebuild command) with Apache Cassandra. I have seen a few suggestions, such as:

1. Monitor the remaining streams via 'nodetool netstats | grep -v 100%'.

2. Run the nodetool status command and check the 'Load' column. What should I expect to see?

3. Check the Cassandra logs.

Any suggestions on:

1. What are the options (and the recommended way) to monitor the progress of replication/streaming?

2. How can I reliably determine that the replication/streaming initiated by 'nodetool rebuild' on a specific node is complete?

Appreciate any pointers!

1 Answer

Erick Ramirez answered:


The data size (Load) in the nodetool status output gives a rough indication of how much is left to stream. For example, if the other nodes in the cluster hold about 200GB each, that gives you an idea of what the rebuilding node should end up with. Keep in mind that this is only a rough estimate.
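To turn that rough comparison into a number, a minimal Python sketch could parse the nodetool status text and compare the rebuilding node's Load against the average Load of the nodes in the source DC. It assumes the usual column layout (status, address, load, ...) and the unit spellings nodetool prints (bytes, KiB/MiB/GiB/TiB, or KB/MB/GB/TB in older versions); parse_status and rebuild_estimate are hypothetical helpers, not part of any Cassandra tool. Compaction and different replication settings change on-disk size, so treat the result as an estimate only.

```python
import re

# Multipliers for the size units nodetool prints (older versions: KB/MB/GB/TB, newer: KiB/MiB/GiB/TiB).
_UNITS = {
    "bytes": 1,
    "KB": 1024, "KiB": 1024,
    "MB": 1024**2, "MiB": 1024**2,
    "GB": 1024**3, "GiB": 1024**3,
    "TB": 1024**4, "TiB": 1024**4,
}

# Assumed node-line layout: "UN  <address>  <load> <unit>  <tokens> ..."
_NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s+([\d.]+)\s*(bytes|[KMGT]i?B)\b")


def parse_status(text):
    """Hypothetical helper: return {address: (datacenter, load_in_bytes)}."""
    nodes = {}
    dc = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            continue
        m = _NODE_LINE.match(line)
        if m:
            _state, address, size, unit = m.groups()
            nodes[address] = (dc, float(size) * _UNITS[unit])
    return nodes


def rebuild_estimate(text, address, source_dc):
    """Hypothetical helper: this node's load as a percentage of the source DC's average load."""
    nodes = parse_status(text)
    source = [load for dc, load in nodes.values() if dc == source_dc]
    if not source or address not in nodes:
        raise ValueError("address or source DC not found in nodetool status output")
    avg = sum(source) / len(source)
    if avg == 0:
        raise ValueError("source DC reports zero load")
    return 100.0 * nodes[address][1] / avg
```

A figure that keeps creeping up towards 100 suggests the rebuild is progressing; it will rarely land exactly on 100.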

Network statistics

The nodetool netstats output is a more reliable way of monitoring the streams since it clearly shows how much data has been transferred in each stream, along with a running percentage.
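The per-peer summary lines in netstats can be rolled up into one overall figure. This Python sketch assumes the Cassandra 3.x wording "Receiving N files, X bytes total. Already received M files, Y bytes total" with raw byte counts (not human-readable sizes); other versions word the totals differently, so the regex may need adjusting. receive_progress is a hypothetical helper.

```python
import re

# Assumed 3.x-style per-peer summary line from `nodetool netstats` (raw byte counts).
_SUMMARY = re.compile(
    r"Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files, (\d+) bytes total"
)


def receive_progress(netstats_text):
    """Hypothetical helper: sum every per-peer summary line into overall progress."""
    total_files = total_bytes = done_files = done_bytes = 0
    for m in _SUMMARY.finditer(netstats_text):
        files, size, got_files, got_size = map(int, m.groups())
        total_files += files
        total_bytes += size
        done_files += got_files
        done_bytes += got_size
    # No summary lines means no active incoming streams were found.
    percent = 100.0 * done_bytes / total_bytes if total_bytes else None
    return {
        "files": (done_files, total_files),
        "bytes": (done_bytes, total_bytes),
        "percent": percent,
    }
```

On the rebuilding node it could be fed with subprocess.run(["nodetool", "netstats"], capture_output=True, text=True).stdout and polled periodically.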


In the logs, an entry like this indicates the start of the rebuild operation:

INFO  [RMI TCP Connection(2)-] 2020-07-07 18:43:02,444 - rebuild from dc: (DC2), (All keyspaces), (All tokens)

An entry like this shows the start of the stream with the corresponding session ID:

INFO  [RMI TCP Connection(2)-] 2020-07-07 18:43:02,493 - [Stream #de564b00-c02d-11ea-9fec-557bda1bdef2] Executing streaming plan for Rebuild

Monitor the progress of the stream by keeping an eye on the log entries with this session ID. Note that depending on the number of keyspaces/tables in the cluster, there will be multiple streams and session IDs.
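Since every streaming entry carries the ID in the form "[Stream #<uuid>]", a small Python sketch can group log lines by that ID so each stream can be followed separately. lines_by_stream is a hypothetical helper; it relies only on the "[Stream #...]" prefix seen in the entries above.

```python
import re
from collections import defaultdict

# Streaming log entries embed the stream ID as "[Stream #<36-char uuid>]".
_STREAM_ID = re.compile(r"\[Stream #([0-9a-f-]{36})\]")


def lines_by_stream(log_lines):
    """Hypothetical helper: group log lines by stream ID, in order of first appearance."""
    groups = defaultdict(list)
    for line in log_lines:
        m = _STREAM_ID.search(line)
        if m:
            groups[m.group(1)].append(line.rstrip("\n"))
    return dict(groups)
```

For example, it could be given the lines of the node's system.log (often /var/log/cassandra/system.log for package installs, though the location varies).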

As each stream session completes, it will be marked with an entry like this in the logs:

INFO  [StreamReceiveTask:1] 2020-07-07 18:43:02,695 - [Stream #de564b00-c02d-11ea-9fec-557bda1bdef2] Session with / is complete

When all the sessions for the rebuild have completed, it will be marked with an entry like this:

INFO  [StreamReceiveTask:1] 2020-07-07 18:43:02,710 - [Stream #de564b00-c02d-11ea-9fec-557bda1bdef2] All sessions completed

By the time this entry gets logged, you should also see the rebuild command complete. Cheers!
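The completion check described above can be sketched in Python: record every stream ID that logged "Executing streaming plan for Rebuild" and report the rebuild as done only once each of those IDs has also logged "All sessions completed". rebuild_status and rebuild_complete are hypothetical helpers; the sketch matches only the message text shown above, so it does not detect failed streams, and it is still worth confirming that the nodetool rebuild command itself has returned.

```python
import re

# Streaming log entries embed the stream ID as "[Stream #<36-char uuid>]".
_STREAM_ID = re.compile(r"\[Stream #([0-9a-f-]{36})\]")


def rebuild_status(log_lines):
    """Hypothetical helper: return (started, finished) sets of rebuild stream IDs."""
    started, finished = set(), set()
    for line in log_lines:
        m = _STREAM_ID.search(line)
        if not m:
            continue
        if "Executing streaming plan for Rebuild" in line:
            started.add(m.group(1))
        elif "All sessions completed" in line:
            finished.add(m.group(1))
    return started, finished


def rebuild_complete(log_lines):
    """True once at least one rebuild stream started and every started one has finished."""
    started, finished = rebuild_status(log_lines)
    return bool(started) and started <= finished
```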

azim_91_184236 commented:

Thank you @Erick Ramirez! All the above commands need to be run, and the logs reviewed, on each new node where we run nodetool rebuild, correct? Is there anything we can or should monitor on the source datacenter node(s)?

Erick Ramirez ♦♦ replied to azim_91_184236:

Correct, on the nodes where you're running the rebuild. Nothing specific to monitor for the source DC. Cheers!
