rzilkha_129571 asked ·

What can cause a long start time for Cassandra?

Hi,

If a cluster has a large number of keyspaces, I assume this results in many SSTables, which may be why Cassandra takes quite a while to start.

Correct me if I'm wrong, but it appears that Cassandra goes over every SSTable and reads its metadata, so the cause may be the large number of files.

Is there a way to mark old keyspaces as archived so that they are only loaded lazily?

cassandra

1 Answer

Erick Ramirez answered ·

The number of keyspaces is related to the long startup but ultimately the number of tables is the biggest contributor to the duration of Cassandra's startup sequence.
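For a rough idea of how many tables a node has to deal with, one option is to count the table directories on disk. The sketch below assumes the usual on-disk layout of one directory per keyspace with one sub-directory per table beneath it; the function name and the data directory path passed to it are illustrative, not something from this thread. Directories left behind by dropped tables would also be counted, so treat the result as an approximation.

```python
from pathlib import Path


def count_tables_per_keyspace(data_dir):
    """Count table directories under each keyspace directory.

    Assumes a layout of <data_dir>/<keyspace>/<table>-<id>/ (an assumption
    about the node's configuration, adjust the path to match yours).
    """
    counts = {}
    for keyspace_dir in Path(data_dir).iterdir():
        if keyspace_dir.is_dir():
            # Every sub-directory of a keyspace directory is treated as one table.
            counts[keyspace_dir.name] = sum(
                1 for entry in keyspace_dir.iterdir() if entry.is_dir()
            )
    return counts
```

Summing the values of the returned dictionary gives the total, which is the figure that matters here rather than the number of keyspaces.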

If you inspect the system.log closely, you will likely see that most of the time is spent between these 2 log entries:

INFO  [main] 2020-07-31 00:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding
INFO  [main] 2020-07-31 00:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop

These entries were taken from a node running Cassandra 3.11.3.
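To put a number on the gap, here is a minimal sketch that parses the timestamps from the two entries above and subtracts them. The helper names are my own, and it assumes the log line layout shown in those entries (level, thread in square brackets, date, time with comma-separated milliseconds).

```python
from datetime import datetime

# Timestamp layout used in the sample entries, e.g. "2020-07-31 00:51:15,555".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(line):
    """Extract the timestamp from a line shaped like 'LEVEL [thread] DATE TIME Source.java:N - msg'."""
    # Everything after the closing bracket of the thread name starts with the date and time.
    tokens = line.split("]", 1)[1].split()
    return datetime.strptime(tokens[0] + " " + tokens[1], LOG_TIME_FORMAT)


def seconds_between(first_line, second_line):
    """Return the elapsed seconds between two log lines."""
    return (parse_log_time(second_line) - parse_log_time(first_line)).total_seconds()


gossip = "INFO  [main] 2020-07-31 00:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding"
native = "INFO  [main] 2020-07-31 00:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop"
print(seconds_between(gossip, native))  # about 171 seconds, just under 3 minutes
```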

Between waiting for gossip to settle (message from Gossiper.java class above) and the native transport service initialising (message from NativeTransportService.java), the classes CompactionStrategyManager and DiskBoundaryManager are doing most of the work during the initialisation process.

You can see this for yourself if you go through the debug.log. In fact, you will see repeated message entries indicating that C* is updating the disk boundaries. You can conclude from this that the length of time it takes for C* to initialise is proportional to the total number of tables in the cluster (regardless of how many keyspaces there are).
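One way to check which classes dominate the log is to tally lines by the source file named in each entry. This is only a sketch: it assumes debug.log uses the same "Source.java:line" token as the system.log entries above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "Source.java:123" token that follows the timestamp in each log line.
SOURCE_PATTERN = re.compile(r"\s(\w+)\.java:\d+\s")


def count_lines_by_class(lines):
    """Tally log lines by the class (source file) that emitted them."""
    counts = Counter()
    for line in lines:
        match = SOURCE_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file handle for debug.log and calling `.most_common(5)` on the result should show whether CompactionStrategyManager and DiskBoundaryManager account for most of the entries on your node.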

The CassandraDaemon.setup() method deals with initialising Cassandra. This section of the code (in C* 3.11.3) contains the steps I refer to above, if you're interested in the details of the startup sequence.

To answer your other question, there isn't a way of selecting which keyspaces/tables get loaded on startup. Cassandra needs to pre-load the data on disk so it knows what it owns. Cheers!
