
question

rb2017houser_184626 asked ·

Where does Apache Cassandra store bloom filters?

I am looking for information that describes the structure of the Apache Cassandra bloom filter and where that structure resides in an Apache Cassandra configuration.


pacorulo answered ·

Hi @Erick Ramirez. I know this may not be the right place, but instead of opening a new discussion I would like to point out something about bloom filters. In the learning path, in the 'Read path' step of DSE201, the quiz asks "Which of the following structures reside on disk? Check all that apply", with the choices being key cache, SSTable, bloom filter, partition summary and partition index, and the correct answer requires selecting bloom filter as well. Bloom filters do reside on disk, but they are loaded into memory (thanks for telling us it is off-heap), and reads use this in-memory bloom filter, not the copy that resides on disk. The on-disk copy exists only so that the filters do not have to be recalculated on every node restart (of the C* instance, I mean). So I think the question is ambiguous. It could perhaps be reworded as "Which of the following structures reside on disk for read purposes?", although I am not sure that is a good idea, as it could be confusing. But I think it is also confusing to say that bloom filters reside in memory when they are also stored on disk, just not used for any client read. Thanks and good day.


Erick Ramirez answered ·

@rb2017houser_184626 Each SSTable has an associated bloom filter component file with the suffix *-Filter.db. When an SSTable is opened at startup, its bloom filter is loaded from that file into off-heap memory, and the in-memory copy is consulted as part of the Cassandra read path.
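If it helps to see those component files on a node, here is a minimal Python sketch that walks a data directory and lists every *-Filter.db file with its size on disk. The function name `find_bloom_filter_files` is made up for this example, and the data directory location is an assumption: it depends on how the node is configured.

```python
from pathlib import Path


def find_bloom_filter_files(data_dir):
    """Return a sorted list of (path, size_in_bytes) for each *-Filter.db file.

    data_dir is assumed to be a Cassandra data directory; the actual
    location depends on the node's configuration.
    """
    root = Path(data_dir)
    return sorted(
        (path, path.stat().st_size)
        for path in root.rglob("*-Filter.db")
        if path.is_file()
    )
```

Calling `find_bloom_filter_files("/var/lib/cassandra/data")` (a commonly used location, but check your own configuration) should return one entry per SSTable that has a bloom filter component.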

Each table's bloom filter size (in memory) is governed by the bloom_filter_fp_chance property in the table's schema: the lower the false-positive chance you configure, the larger the filter and the more memory it uses.
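To get a rough feel for that trade-off, the sketch below applies the textbook bloom filter sizing formulas. It is only an approximation under that assumption: Cassandra's own implementation may size its filters differently, and the function names are made up for this example.

```python
import math


def bloom_filter_size_bits(num_keys, fp_chance):
    """Textbook estimate of the bits needed: m = -n * ln(p) / (ln 2)^2."""
    if num_keys <= 0:
        raise ValueError("num_keys must be positive")
    if not 0 < fp_chance < 1:
        raise ValueError("fp_chance must be between 0 and 1 (exclusive)")
    return math.ceil(-num_keys * math.log(fp_chance) / (math.log(2) ** 2))


def optimal_hash_count(num_keys, size_bits):
    """Textbook estimate of the number of hash functions: k = (m / n) * ln 2."""
    return max(1, round(size_bits / num_keys * math.log(2)))
```

With these formulas, lowering the false-positive chance from 0.1 to 0.01 for the same number of partition keys roughly doubles the estimated filter size, which is why a lower bloom_filter_fp_chance costs more memory.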

For more info, see the following documents:

Cheers!
