Asked by Ryan Quey, edited by Erick Ramirez

How is data modeling affected when Cassandra is integrated with DataStax Search?

Everyone always talks about how important data modeling is for Cassandra, and rightfully so. However, do any data modeling principles change when DSE Search is integrated, given the Solr indices and so on?

Tags: data modeling, search

1 Answer

smadhavan answered

@Ryan Quey, in short, data modeling becomes more flexible with DSE Search indices in the mix. For instance, without search indices, you might have to design a table per query to satisfy your access patterns. With DSE Search, you can keep a single base table and use the search index to query it by non-primary-key columns. Further resources for reading are provided below for your reference.
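To make the contrast concrete, here is a toy Python sketch (not DSE code). It assumes a hypothetical `users` table and models the search index as a plain in-memory inverted index; the names `base_table`, `users_by_city`, `users_by_plan`, and `search` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a "users" base table, keyed by primary key (user_id).
base_table = {
    1: {"name": "Ann", "city": "Austin", "plan": "pro"},
    2: {"name": "Bob", "city": "Boston", "plan": "free"},
    3: {"name": "Cy", "city": "Austin", "plan": "free"},
}

# Table-per-query: one extra denormalized lookup for each access pattern.
users_by_city = defaultdict(list)
users_by_plan = defaultdict(list)
for user_id, row in base_table.items():
    users_by_city[row["city"]].append(user_id)
    users_by_plan[row["plan"]].append(user_id)

# Index-style: a single inverted index over (column, value) pairs,
# standing in for what a search index maintains alongside the base table.
search_index = defaultdict(set)
for user_id, row in base_table.items():
    for column, value in row.items():
        search_index[(column, value)].add(user_id)


def search(**filters):
    """Return the sorted ids of rows matching every column=value filter."""
    matches = [search_index[(c, v)] for c, v in filters.items()]
    return sorted(set.intersection(*matches)) if matches else []


print(search(city="Austin"))               # filter on one non-key column
print(search(city="Austin", plan="free"))  # combine two non-key columns
```

With the table-per-query approach, every new access pattern means another structure to create and keep in sync. With the index, new filter combinations work against the same base table.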

Hope that helps to begin exploring with DSE Search and data modeling!


Ryan Quey commented:

So would you say we can totally forget about trying to limit our searches to only hit one partition, or is that still a valid rule?

Also, does it mean we don't have to be as careful in picking our primary key?

smadhavan replied to Ryan Quey:

@Ryan Quey,

A read in DSE Search is a two-pass operation.

In the first pass, the shard router determines the minimum number of nodes it can query while still covering the entire token range with at least one node. Once it has chosen those nodes, it runs the query against the search indexes (not the C* tables) on each of them and gets back a result set from each, containing the ID values and the rank (relevancy score) of each record.

In the second pass, it determines which of those records to return to the client as actual results. It then issues CQL queries by ID value for those records, gets the results, and packages them up for the client.

We want the shard router to have to talk to as few nodes as possible.
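The two passes can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not DSE's implementation: the three-node cluster, the token range names, the per-node index hits, and the functions `pick_nodes` and `two_pass_read` are all hypothetical, and the brute-force node selection simply stands in for whatever the real shard router does.

```python
from itertools import combinations

# Hypothetical cluster: each node holds replicas of some token ranges and a
# local index mapping row id -> relevancy score for rows matching a query.
NODE_RANGES = {
    "n1": {"r1", "r2"},
    "n2": {"r2", "r3"},
    "n3": {"r3", "r1"},
}
NODE_INDEX_HITS = {
    "n1": {"a": 0.9, "b": 0.4},
    "n2": {"b": 0.4, "c": 0.7},
    "n3": {"c": 0.7, "a": 0.9},
}
BASE_TABLE = {"a": "row A", "b": "row B", "c": "row C"}


def pick_nodes(node_ranges):
    """Smallest set of nodes whose ranges together cover every token range."""
    all_ranges = set().union(*node_ranges.values())
    nodes = sorted(node_ranges)
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            if set().union(*(node_ranges[n] for n in combo)) == all_ranges:
                return list(combo)
    return nodes


def two_pass_read(limit):
    # Pass 1: query only the indexes on the chosen nodes for (id, score).
    scored = {}
    for node in pick_nodes(NODE_RANGES):
        scored.update(NODE_INDEX_HITS[node])
    # Pass 2: keep the top-ranked ids, then fetch those rows by id.
    top_ids = sorted(scored, key=lambda i: (-scored[i], i))[:limit]
    return [BASE_TABLE[i] for i in top_ids]


print(pick_nodes(NODE_RANGES))
print(two_pass_read(limit=2))
```

In this toy cluster, two of the three nodes are enough to cover every token range, so the third is never contacted. The fewer nodes the first pass has to fan out to, the cheaper the query.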

Including the partition key in the query significantly improves performance. The primary key should still uniquely identify a row in the table; that doesn't change with a DSE Search index in the mix.
