biswalkiran_184127 asked Erick Ramirez commented

How do I tackle a high CPU spike?

Hello Experts, I have a high CPU spike and have isolated it to one Spark job that writes to a table that is anomalous compared to other tables: it has significantly more SSTables, data and keys.

High CPU spike due to a table that has 500-plus SSTables, 100 terabytes of data, and 300 million keys

I would highly appreciate any suggestions on how to tackle this. Is this a compaction problem that is causing the number of SSTables to grow so high?

Table details based on "nodetool cfstats" output

=======================================

SSTable count: 547

Space used (live): 91768723569

Space used (total): 91768723569

Space used by snapshots (total): 0

Off heap memory used (total): 422576163

SSTable Compression Ratio: 0.3509667199568188

Number of keys (estimate): 299551255

Memtable cell count: 0

Memtable data size: 0

Memtable off heap memory used: 0

Memtable switch count: 40540

Local read count: 359

Local read latency: NaN ms

Local write count: 1127675487

Local write latency: NaN ms

Pending flushes: 0

Bloom filter false positives: 0

Bloom filter false ratio: 0.00000

Bloom filter space used: 150335880

Bloom filter off heap memory used: 150331504

Index summary off heap memory used: 251556251

Compression metadata off heap memory used: 20688408

Compacted partition minimum bytes: 180

Compacted partition maximum bytes: 5839588

Compacted partition mean bytes: 621

Average live cells per slice (last five minutes): NaN

Maximum live cells per slice (last five minutes): 0

Average tombstones per slice (last five minutes): NaN

Maximum tombstones per slice (last five minutes): 0
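
To put the raw counters above in perspective, here is a minimal Python sketch that converts a few of them into human-readable figures. The numbers are copied from the output above; the helper `to_gib` and the variable names are illustrative and are not part of nodetool. Keep in mind that nodetool cfstats reports values for the node it is run on, so these figures describe a single node and not the whole cluster.

```python
# Values copied from the "nodetool cfstats" output above (all in bytes,
# except the SSTable count).
space_used_live = 91_768_723_569
sstable_count = 547
off_heap_total = 422_576_163
memtable_off_heap = 0
bloom_filter_off_heap = 150_331_504
index_summary_off_heap = 251_556_251
compression_metadata_off_heap = 20_688_408


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


live_gib = to_gib(space_used_live)

# Mean on-disk size of one SSTable, in mebibytes (2**20 bytes).
avg_sstable_mib = space_used_live / sstable_count / 2**20

# The reported off-heap total should equal the sum of its components.
off_heap_sum = (
    memtable_off_heap
    + bloom_filter_off_heap
    + index_summary_off_heap
    + compression_metadata_off_heap
)

print(f"Live data for this table on this node: {live_gib:.1f} GiB")
print(f"Average SSTable size: {avg_sstable_mib:.0f} MiB")
print(f"Off-heap components match reported total: {off_heap_sum == off_heap_total}")
```

On these figures the node holds roughly 85 GiB (about 92 GB) of live data for this table, which is worth reconciling with the 100 terabytes mentioned in the question.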

Other hardware/configuration details

The jobs write output to a 20-node Cassandra cluster.

Each Cassandra node is an r5.2xlarge:

Memory: 64 GB

Storage: 2000 GB

IOPS: 6000

replication factor 3

AND bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 2678400

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';
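
The time-based settings in the table options above are given in seconds, which makes them hard to read at a glance. As a small sketch, the conversion below uses the two values from the schema; the variable names simply mirror the option names. How these two settings interact during compaction is not covered here.

```python
from datetime import timedelta

# Values copied from the table options above, both in seconds.
default_time_to_live = 2_678_400
gc_grace_seconds = 864_000

ttl = timedelta(seconds=default_time_to_live)
gc_grace = timedelta(seconds=gc_grace_seconds)

print(f"default_time_to_live = {ttl.days} days")   # 31 days
print(f"gc_grace_seconds     = {gc_grace.days} days")  # 10 days
```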

performance

1 Answer

Erick Ramirez answered Erick Ramirez commented

The two most common causes of high CPU utilisation on nodes are:

  • high GC, and/or
  • high IO waits.

In both cases, they are symptoms of overloaded nodes.

The general recommendation to maximise throughput on nodes is to use NVMe SSDs, which deliver very high IOPS. Otherwise, we recommend using volumes that provide at least 10K IOPS and mounting the data and commitlog directories on separate volumes so they do not compete for the same IO bandwidth.

If you're seeing high IO waits when nodes are getting hit with lots of writes, it's an indication that the commitlog disk cannot keep up.
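
One way to check whether CPU time is really going to IO wait on a Linux node is to compare two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below is an illustration, not a DataStax tool: the function names are made up for this example, and the two sample lines are invented numbers used only to show the calculation.

```python
import time


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two 'cpu' lines of /proc/stat."""

    def counters(line: str) -> list[int]:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(value) for value in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already included in user/nice, so only the first
    # eight fields are summed to avoid double counting.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_live(interval_seconds: float = 5.0, path: str = "/proc/stat") -> float:
    """Take two samples from a live Linux host and return the iowait percentage."""
    with open(path) as stat_file:
        before = stat_file.readline()
    time.sleep(interval_seconds)
    with open(path) as stat_file:
        after = stat_file.readline()
    return iowait_percent(before, after)


# Invented samples, for illustration only.
before_sample = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after_sample = "cpu  1200 0 600 8300 900 0 0 0 0 0"
print(f"{iowait_percent(before_sample, after_sample):.1f}% iowait")  # prints 40.0% iowait
```

On a real node, calling `sample_live()` while the Spark job is writing would show how much of the interval the CPUs spent waiting on IO.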

In situations where nodes are overloaded, we recommend adding nodes to increase the capacity of the cluster and spread the load more widely. Cheers!
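
As a rough illustration of why extra nodes help, the sketch below estimates the fraction of all writes that each node has to store. It assumes data is spread evenly across nodes, which real clusters only approximate; the function name `replica_write_share` is hypothetical. The node count of 20 and replication factor of 3 come from the question, while 30 is an arbitrary example of a larger cluster.

```python
def replica_write_share(nodes: int, replication_factor: int) -> float:
    """Average fraction of all writes each node stores, assuming an even spread."""
    if nodes < replication_factor:
        raise ValueError("need at least as many nodes as the replication factor")
    return replication_factor / nodes


current = replica_write_share(nodes=20, replication_factor=3)
expanded = replica_write_share(nodes=30, replication_factor=3)

print(f"20 nodes: each node stores about {current:.0%} of all writes")
print(f"30 nodes: each node stores about {expanded:.0%} of all writes")
```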

2 comments

Hello @Erick Ramirez


Thanks a lot for your response. So is having 500 SSTables normal? After compaction, does the total count of SSTables decrease, increase or stay the same? Is the count of SSTables related to how we choose partition keys?


No, it doesn't have anything to do with the SSTable count. Cheers!
