yariv.amar_82168 asked ·

Are the Summary.db component files required to load data with sstableloader?

Hi,
I'm using sstableloader to load data into C*. The loader fails during the upload with this error:

java.lang.RuntimeException: Failed to list files in /mnt/migration/...mySourceFolder
:
Caused by: java.lang.AssertionError
        at org.apache.cassandra.io.sstable.IndexSummary.<init>(IndexSummary.java:86)
        at org.apache.cassandra.io.sstable.IndexSummary$IndexSummarySerializer.deserialize(IndexSummary.java:350)
        at org.apache.cassandra.io.sstable.format.SSTableReader.loadSummary(SSTableReader.java:905)


After some reading, I decided to remove mc-2-big-Summary.db from the source folder, and the loader then completed successfully.

Questions:

1. What is the role of Summary.db when running sstableloader?

2. Is it safe to remove Summary.db from the source folder?


Thanks!


Erick Ramirez answered ·

@yariv.amar_82168 Yuki already explained that the Summary.db component of the SSTable set is not mandatory to be able to load SSTables with sstableloader. However, it's not a good idea to remove component files.

Our recommendation is to always keep all the SSTable component files together with the Data.db component as a set. There really is no good reason to exclude components from the set. Cheers!

2 comments

I agree with the recommendation, and I've done this export/import many times before. I don't know why the `Summary.db` file caused sstableloader to fail this time; that failure is the only reason I excluded it.

I assume that sstableloader streams the data from the source folder into the C* cluster rather than just copying files.


I'd be happy to learn if there is any information I can look at to understand the root cause.


Thank you for the help.


Right. I see what you mean now. Cheers!

yukim answered ·

Hi,

> 1. what is the role of summary.db during sstable loader?

Summary.db contains a sampling of the SSTable's index file (Index.db). Cassandra uses this summary to speed up key lookups in the index file.

https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbInternals/dbIntHowDataWritten.html#dbIntHowDataWritten__sstsummary
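
To make the "sampling" idea concrete, here is a toy sketch in Python. It is not Cassandra's actual on-disk format or code: the function names, the plain string keys, and the sampling interval of 4 are all assumptions chosen for illustration. It only shows why a small sampled structure can narrow a lookup to a short slice of the full index, and why it can be rebuilt from the index at any time.

```python
from bisect import bisect_right


def build_summary(index_keys, sampling_interval=4):
    """Keep every Nth key of the sorted index together with its position.

    Toy model: index_keys stands in for the sorted entries of Index.db.
    """
    return [
        (key, pos)
        for pos, key in enumerate(index_keys)
        if pos % sampling_interval == 0
    ]


def lookup(index_keys, summary, key, sampling_interval=4):
    """Find the position of key, scanning only one small slice of the index."""
    sampled_keys = [k for k, _ in summary]
    # Last sampled key that is <= the key we are looking for.
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first index entry
    start = summary[slot][1]
    end = min(start + sampling_interval, len(index_keys))
    for pos in range(start, end):
        if index_keys[pos] == key:
            return pos
    return None


# Hypothetical sorted index entries.
index_keys = ["a", "c", "e", "g", "i", "k", "m", "o", "q"]
summary = build_summary(index_keys)
```

Because `build_summary` needs nothing but the index itself, losing the summary loses no data; it only costs the time to sample the index again.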

> 2. is it safe to remove summary.db from the source folder?

The Summary.db file is not essential for reading SSTable data. Again, it is there to speed up reading data. Cassandra can recreate the file from Index.db.

You can safely delete it if it is causing trouble.
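
If you would rather not delete anything outright, one option is to move the Summary.db files aside before running sstableloader, so they can be restored later. The following is a minimal sketch, assuming the component files follow the `*-Summary.db` naming seen in the question; the function name and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def set_aside_summaries(source_dir, backup_dir):
    """Move every *-Summary.db file out of source_dir into backup_dir.

    Returns the names of the files that were moved. All other SSTable
    components (Data.db, Index.db, ...) are left untouched.
    """
    source = Path(source_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    moved = []
    for summary in sorted(source.glob("*-Summary.db")):
        shutil.move(str(summary), str(backup / summary.name))
        moved.append(summary.name)
    return moved


# Example call with placeholder paths (adjust to your own layout):
# set_aside_summaries("/path/to/keyspace/table", "/path/to/summary-backup")
```

Keeping the backup directory outside the folder you pass to sstableloader means the loader never sees the set-aside files.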
