brent.hale_101199 asked ·

Help me understand Solr caches and DSE's Search modifications

We're trying to optimize our Search queries. We're heading down the path of using "Filter Queries" (fq). In playing with it, we see some great improvements. But too often it seems like the cache is being cleared/regenerated.

So we would like to understand DSE's Search modifications relating to caches. In the docs (https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/search/searchIndexConfig.html?hl=filtercachehighwatermark), it seems that you manage all of the caches as one group instead of "per segment". The Solr docs talk about 3 main caches (is that what you mean by 'segment'?) and how to optimize each for use with the 'fq' parser.

Do the suggestions for Solr caches hold true inside of DSE?

It seems that I can only view statistics on the one dseFilterCache. I was expecting to be able to view each of the 3 Solr caches individually.

searchcache
1 comment

@brent.hale_101199 just acknowledging your question and getting a definitive answer for you. Cheers!


1 Answer

maedhroz answered ·

The three caches you are most likely to interact with in Solr are the field cache, the document cache, and the filter cache. The field cache has been made more or less obsolete by docValues, and the document cache is explicitly disabled in DSE Search, since field values (with a couple of esoteric exceptions) come from a backing Cassandra table and not Lucene. What about the filter cache?


OSS Solr's filter cache is attached to the active index searcher. When a searcher is replaced (for instance, on soft commit), the cache is cleared along with it. DSE Search uses its own filter cache implementation (called SolrFilterCache) that manages cached filters at the Lucene segment level. When a new searcher is opened on soft commit, those segment-specific filters are preserved, and new filters only need to be built for newly flushed segments.
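
To make the difference concrete, here is a toy model in Python. It is only a sketch of the idea, not DSE's or Solr's actual implementation: the class and method names are made up for illustration, and it simply counts how many per-segment filter computations each approach performs across one soft commit.

```python
class SearcherScopedCache:
    """Toy model: the cache is discarded whenever the searcher is replaced."""

    def __init__(self):
        self.entries = set()   # filter queries cached for the current searcher
        self.builds = 0        # per-segment filter computations performed

    def lookup(self, fq, segments):
        if fq not in self.entries:
            self.builds += len(segments)  # rebuild across every segment
            self.entries.add(fq)

    def on_new_searcher(self):
        self.entries.clear()   # everything is lost on soft commit


class SegmentScopedCache:
    """Toy model: entries are keyed by (segment, filter query)."""

    def __init__(self):
        self.entries = set()
        self.builds = 0

    def lookup(self, fq, segments):
        for segment in segments:
            key = (segment, fq)
            if key not in self.entries:
                self.builds += 1          # build only for segments not seen yet
                self.entries.add(key)

    def on_new_searcher(self):
        pass                   # entries for unchanged segments are kept


def simulate():
    """Run the same filter query before and after one soft commit."""
    segments = ["seg1", "seg2", "seg3"]   # hypothetical segment names
    caches = (SearcherScopedCache(), SegmentScopedCache())
    for cache in caches:
        cache.lookup("category:shoes", segments)
    segments.append("seg4")               # the soft commit flushes one new segment
    for cache in caches:
        cache.on_new_searcher()
        cache.lookup("category:shoes", segments)
    return tuple(cache.builds for cache in caches)
```

In this model the searcher-scoped cache recomputes the filter for all four segments after the commit, while the segment-scoped cache only computes it for the new one.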


Having said that, it's still possible to churn even the segment filter cache more than you would like. The first reason this might happen is the queries themselves. If you're going to use fq, you should be using it for things that are heavily reused and expensive to calculate. A common example here, in a retail context, would be something like a product category. Assuming reasonable query selection, it's also possible that the filter cache is simply too small to fit the desired set of filters (See https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/search/searchIndexConfig.html).
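
As a rough illustration of the product category case, the sketch below builds a JSON solr_query string that keeps the reusable category restriction in fq and the per-request search term in q. The table name, field names, and helper function are hypothetical and only there for the example.

```python
import json


def build_solr_query(q, fq):
    """Return a JSON solr_query string with the main query and one filter query."""
    return json.dumps({"q": q, "fq": fq})


# The category filter is the same for many requests, so it is a good fq candidate;
# the free-text part changes per request and stays in q.
solr_query = build_solr_query("name:boot", "category:footwear")

# Hypothetical CQL statement; the JSON string would be bound as the parameter.
cql = "SELECT * FROM store.products WHERE solr_query = %s"
```

Putting a high-cardinality or rarely repeated clause in fq instead would fill the cache with filters that are never reused.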


Finally, DSE Search uses the filter cache in distributed queries to restrict the sub-query sent to each shard/node to particular Cassandra token ranges, since its Solr cores are not physically separated along those boundaries. Over the product's lifetime, this has led to filter cache pollution when combined with vnodes, especially on larger clusters. (vnodes translate to more token ranges, which means more possible token range filters.) To address this, we reworked the algorithm that selects shards during distributed queries in DSE 5.1.15, 6.0.8, and 6.7.4, introducing the STATIC set cover finder. It improves query balance across nodes and reduces the total number of token range filters that might pollute the cache, relying on the client-level load balancing that is virtually guaranteed to already be in place (see https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/search/searchLoadBalancing.html). Essentially, if you use one of those versions or later, you should be using the STATIC set cover finder.
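
For a sense of scale on the vnodes point, this small sketch just counts primary token ranges in a ring (one per token). It does not model how DSE selects shards or which filters actually end up cached; the function name and cluster sizes are illustrative assumptions.

```python
def primary_token_ranges(nodes: int, vnodes_per_node: int) -> int:
    """Each token owned by a node defines one primary token range in the ring."""
    return nodes * vnodes_per_node


# Hypothetical 12-node cluster at different vnode settings.
for vnodes in (1, 4, 64):
    print(vnodes, "vnodes per node:", primary_token_ranges(12, vnodes), "token ranges")
```

The more ranges there are, the more distinct range restrictions a distributed query can potentially produce, which is the pressure on the cache described above.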


For general reference material on how to monitor the filter cache, see https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/mgmtServices/searchPerformance/filterCacheStatistics.html.


I hope that helps shed some light on the situation.

17 comments

I'm struggling to understand much of this response. Is the "segment filter cache" part of the "dseFilterCache"? If not, how do we get visibility on that cache? Also, concerning STATIC set cover finder, we are using 4 vnodes, so that forces us to continue using DYNAMIC, correct?


"dseFilterCache" is a Solr core-level view of the aggregate per-segment filter cache. If you are using 4 vnodes and are on DSE 5.1.15, 6.0.8, 6.7.4, or later, we recommend the STATIC set cover finder.


Thanks. https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/search/searchLoadBalancing.html implies that STATIC requires 8+ vnodes. Does it work with only 4?


Thank you for the response.

You mentioned that caches are regenerated based upon soft commits. We have new documents being ingested continually throughout the day. Do we get a lot of churn because of it? Is there a way to tune that?

Is there a way to monitor how often new searchers are being created?

Does 5.1.15 default to using the STATIC set cover finder? We only have 4 vnodes.


DSE Search does not regenerate segment filter caches as OSS Solr does.


A new searcher should be created as often as you soft commit, which is configurable in your solrconfig.xml and via https://docs.datastax.com/en/dse/6.0/cql/cql/cql_reference/cql_commands/cqlAlterSearchIndexConfig.html


DSE 5.1.15 still defaults to DYNAMIC. STATIC must be enabled explicitly.
