Hello experts, I am seeing a high CPU spike and have isolated it to one Spark job that writes to a table that is anomalous compared to the other tables: significantly more SSTables, data, and keys.
The high CPU coincides with this table, which has 500+ SSTables, roughly 92 GB of live data on the node I sampled (per nodetool; cfstats output below), and ~300 million keys.
I would highly appreciate any suggestions on how to tackle this. Is this a compaction problem, and is that why the SSTable count has grown so high?
Table details based on "nodetool cfstats" output
=======================================
SSTable count: 547
Space used (live): 91768723569
Space used (total): 91768723569
Space used by snapshots (total): 0
Off heap memory used (total): 422576163
SSTable Compression Ratio: 0.3509667199568188
Number of keys (estimate): 299551255
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 40540
Local read count: 359
Local read latency: NaN ms
Local write count: 1127675487
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 150335880
Bloom filter off heap memory used: 150331504
Index summary off heap memory used: 251556251
Compression metadata off heap memory used: 20688408
Compacted partition minimum bytes: 180
Compacted partition maximum bytes: 5839588
Compacted partition mean bytes: 621
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
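As a sanity check on the figures above (pure arithmetic on the cfstats numbers, nothing measured beyond what is pasted here), this is how I arrived at the per-node size estimates:

```python
# Sanity-check the per-node cfstats figures pasted above.
live_bytes = 91_768_723_569          # Space used (live)
compression_ratio = 0.3509667199568188
sstables = 547                       # SSTable count
keys = 299_551_255                   # Number of keys (estimate)
mean_partition_bytes = 621           # Compacted partition mean bytes (uncompressed)

# On-disk size is ~92 GB per node, i.e. GB not TB.
uncompressed_gb = live_bytes / compression_ratio / 1e9
avg_sstable_mb = live_bytes / sstables / 1e6
est_data_gb = keys * mean_partition_bytes / 1e9  # keys x mean partition size

print(f"live on disk:           {live_bytes / 1e9:.1f} GB")   # ~91.8 GB
print(f"uncompressed estimate:  {uncompressed_gb:.0f} GB")    # ~261 GB
print(f"avg SSTable size:       {avg_sstable_mb:.0f} MB")     # ~168 MB
print(f"keys x mean partition:  {est_data_gb:.0f} GB")        # ~186 GB
```

The ~168 MB average SSTable size across 547 SSTables is what made me suspect STCS compaction is falling behind on this table rather than the raw data volume being the issue.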
Other hardware/configuration details:
The jobs write their output to a 20-node Cassandra cluster.
Each Cassandra node is an r5.2xlarge:
Memory: 64 GB, Storage: 2000 GB, IOPS: 6000
Replication factor: 3
Relevant table properties:
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 2678400
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
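In case it helps frame answers: given that this is a write-heavy table with default_time_to_live = 2678400 (31 days), I have been wondering whether a time-windowed compaction strategy would fit better than STCS. A sketch of what I would consider trying (keyspace/table names and the 1-day window are placeholders, not something I have validated):

```sql
-- Hypothetical: switch to TimeWindowCompactionStrategy, which is
-- commonly recommended for TTL'd, append-mostly workloads.
ALTER TABLE my_keyspace.my_table
  WITH compaction = {
    'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
  };
```

Is that a reasonable direction here, or should I first rule out other causes of the SSTable buildup (flush pressure, compaction throughput throttling, etc.)?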