Everyone always talks about how important data modeling is for Cassandra, and rightfully so. However, do any data modeling principles change when we have DSE search integrated, due to the Solr indices and so on?
@Ryan Quey, in short, data modeling becomes more flexible with DSE Search indices in the mix. For instance, without search indices you might have to design a table per query to satisfy your access patterns. With DSE Search, you can keep a single base table and use the search index to query it on non-primary-key columns. Further resources for reading are provided below for your reference.
Hope that helps to begin exploring with DSE Search and data modeling!
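To make the table-per-query versus search index contrast in the answer above concrete, here is a minimal sketch. The keyspace, table, and column names (`shop.users`, `users_by_city`, `city`, and so on) and the helper `solr_query_cql` are hypothetical, made up for illustration. The statements are kept as plain strings so the snippet runs without a cluster; `CREATE SEARCH INDEX` and the `solr_query` column are the DSE Search CQL features being illustrated, but check the syntax against the docs for your DSE version.

```python
# Hypothetical keyspace, table, and column names, for illustration only.
BASE_TABLE = """
CREATE TABLE IF NOT EXISTS shop.users (
    user_id uuid PRIMARY KEY,
    email text,
    city text
)
"""

# Without DSE Search: one extra denormalized table per access pattern.
USERS_BY_CITY = """
CREATE TABLE IF NOT EXISTS shop.users_by_city (
    city text,
    user_id uuid,
    email text,
    PRIMARY KEY ((city), user_id)
)
"""

# With DSE Search: index the base table once instead of adding tables.
SEARCH_INDEX = "CREATE SEARCH INDEX IF NOT EXISTS ON shop.users"


def solr_query_cql(table: str, field: str, value: str) -> str:
    """Build a CQL SELECT that filters on a non-primary-key column via solr_query."""
    # Quote the value as a Solr phrase, escaping backslashes and double quotes.
    phrase = value.replace("\\", "\\\\").replace('"', '\\"')
    solr = f'{field}:"{phrase}"'
    # Double any single quotes so the Solr expression is a valid CQL string literal.
    cql_literal = solr.replace("'", "''")
    return f"SELECT * FROM {table} WHERE solr_query = '{cql_literal}'"


if __name__ == "__main__":
    # Query the base table by city, which is not part of its primary key.
    print(solr_query_cql("shop.users", "city", "Austin"))
```

In a real application you would send these statements through your driver's session rather than printing them. The point is only that the second table disappears once the base table is indexed.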
So would you say we can totally forget about trying to limit our searches to only hit one partition, or is that still a valid rule?
Also, does it mean we don't have to be as careful in picking our primary key?
A read in DSE Search is a two-pass operation. In the first pass, the shard router determines the minimum number of nodes it can query while still covering the entire token range with at least one node. Once it has figured out which nodes to query, it runs the query against the search indexes (not the C* tables) on those nodes and gets back a result set from each, containing the ID values and the rank (relevancy score) of each record. In the second pass, it determines which of the records it wants to return as actual results to the client. It then makes CQL queries by ID value for those records, gets the results, and packages them up for the client. We want the shard router to have to talk to as few nodes as possible.
Including the partition key in the query will significantly improve performance. The primary key should still uniquely identify a row in the table, and that doesn't change with a DSE Search index in the mix.
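The two-pass read described above can be mimicked with a toy in-memory model. This is only a sketch under stated assumptions, not how DSE implements it: the node names, token ranges, scores, and rows are invented, and the greedy cover in `choose_nodes` is a stand-in for whatever the shard router actually does to pick nodes.

```python
from typing import Dict, List, Set, Tuple


def choose_nodes(coverage: Dict[str, Set[int]], wanted: Set[int]) -> List[str]:
    """Greedy cover: keep picking the node that covers the most uncovered ranges.

    Assumption: a simple approximation standing in for the shard router.
    """
    uncovered = set(wanted)
    chosen: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetically first node wins).
        best = max(sorted(coverage), key=lambda n: len(coverage[n] & uncovered))
        if not coverage[best] & uncovered:
            raise ValueError("token ranges cannot be fully covered")
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


def two_pass_read(
    coverage: Dict[str, Set[int]],
    index: Dict[str, List[Tuple[int, float]]],
    rows: Dict[int, dict],
    wanted: Set[int],
    limit: int,
) -> Tuple[List[str], List[dict]]:
    nodes = choose_nodes(coverage, wanted)

    # Pass 1: ask each chosen node's search index for (id, score) pairs only.
    scores: Dict[int, float] = {}
    for node in nodes:
        for row_id, score in index[node]:
            # Replicas can return the same id, so keep one entry per id.
            scores[row_id] = max(score, scores.get(row_id, score))

    # Decide which records to return: highest score first, id as tie-breaker.
    top_ids = sorted(scores, key=lambda i: (-scores[i], i))[:limit]

    # Pass 2: fetch the full rows by id (stands in for the CQL lookups).
    return nodes, [rows[i] for i in top_ids]


# Three nodes, three token ranges, each range stored on two nodes.
coverage = {"n1": {0, 1}, "n2": {1, 2}, "n3": {2, 0}}
index = {
    "n1": [(1, 0.9), (2, 0.4)],
    "n2": [(2, 0.4), (3, 0.7)],
    "n3": [(3, 0.7), (1, 0.9)],
}
rows = {1: {"id": 1, "city": "Austin"}, 2: {"id": 2, "city": "Boston"}, 3: {"id": 3, "city": "Austin"}}

if __name__ == "__main__":
    # Whole token range: two of the three nodes are enough.
    print(two_pass_read(coverage, index, rows, {0, 1, 2}, limit=2))
    # Query narrowed to one range (as a partition key restriction would do): one node.
    print(choose_nodes(coverage, {1}))
```

In this model, narrowing the query to a single token range means only one node has to be contacted, which is the intuition behind keeping the partition key in the query where you can.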