question

jean.andre_185656 avatar image
jean.andre_185656 asked jean.andre_185656 commented

How do we make DSE Search work with jUnit?

Related to the post 8412, the secondary index is clearly not working for our case, as the external_id field, its value may be the same from a company to another company. Then it takes to have the company_id as discriminant. Unfortunately, the secondary approach doesn't work for us. It allows only to have 1 field and we need two fields company_id + external_id.

1) As other approach we can concatenate the two fields (@hash( company_id + external_id) ) and store the value to a single column. This will make good the secondary index.

2) Under JUNIT test we add the statement to initialize the keyspace, then after creating the table, we add the statement below. It does not work and CQL is not available to understand the syntax.

So, how to make DSE Search working under JUNIT ?
-> CREATE SEARCH INDEX IF NOT EXISTS ON test;

-> error is: no viable alternative at input 'SEARCH' ([CREATE] SEARCH...)

This above statement works under DSE Studio but when it is time to make a select, it does not. We see under the tab Schema, the table and the Search Indexes. An index is created on each column of the table. Despite of that, we are not able to perform that query:

-> select * myspace.test where external_id = 'The-External-Id';

-> error: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

I read with page on DSE Search, seems very simple to use it.

And finally, I want to create a CUSTOM index, but SASI is not supported in PROD, so do you have other suggestion ?

Thank you for your assistance. Regards,

[UPDATE] for the point 1), I'm asking to my self if the problem is related the librairies version or if it needs to add a missing library. I tried to upgrade to version 4.9 but we cannot upgrade because the library 4.9 do not have the same Session object. Not backward compatible. It takes a CQLSession and we use the Session from com.datastax.driver.core.Session and I cannot upgrade all dependencies, then on this aspect I'm locked for this moment. But not sure if an upgrade will solve the problem. The project is using the following versions:

  • com.datastax.cassandra:cassandra-driver-core:3.10.2
  • com.datastax.cassandra:cassandra-driver-extras:3.10.2
  • org.springframework.boot:spring-boot-starter-data-cassandra:1.5.12
  • org.cassandraunit:cassandra-unit-shaded:3.11.2.0
  • org.cassandraunit:cassandra-unit-spring:3.11.2.0

To initialize the JUNIT, it is as following

@Slf4j
@RunWith(SpringRunner.class)
@SpringBootTest(classes = TestConfigurationPersistence.class)
public class Test {


    private final static String KEYSPACE_NAME = "my_test";

    // list of statements to execute for initializing the JUnit tests
    private final static List<String> CQL_INIT_TEST= Stream.concat(Stream.of("USE " + KEYSPACE_NAME + ";" ), CreateTestTable.getCQLStatements().stream()).collect(Collectors.toList());

    @Autowired
    private CassandraOperations cassandraTemplate;

    @Rule
    public CassandraCQLUnit cassandraCQLUnitTest = new CassandraCQLUnit(new MigrationTestCQLDataSet(CQL_INIT_TEST, false, false, KEYSPACE_NAME));

And the rule execute the same script put in post 8412, with now, an index creation like this

"CREATE SEARCH INDEX IF NOT EXISTS ON test";

So, when the @Rule is going to execute, the syntax is not recognized. If I remove it, JUnit test are working for those where the index is not necessary.

For the point 2),. It works now with the good syntax. Very good and thank you. Then, it will be nice to understand why under JUnit the syntax is not recognized. But going to try again in case of... and does solr_quey allows to make a search on 2 columns ?

===> YA : SELECT * FROM test WHERE solr_query='external_id: 1-external_id && company_id: 1-company_id';

So, the question is then, now, how do we implements the query with standard API of Cassandra ? to produce a such query ?

and finally for the point 3) the question is more we can use custom SAI index for Production as we read that SASI index is not supported. We are using 6.8.3 but my tests are on local docker version dse-server:5.1.17. For JUnit, this is the in-memory version of Cassandra. So what I m understanding is that SAI index are not SASI index. SAI is a new way to create custom Index. Please confirm.

dsesearch
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

bettina.swynnerton avatar image
bettina.swynnerton answered Erick Ramirez edited

Hi @jean.andre_185656,

there are quite a few questions in this post, and I am not sure if you still need any clarification re point 1.


Re point 2: I am not so familiar with JUNIT, perhaps you can post a code snippet how you initialise the keyspace with the test, and I will see whether we can use the same method to create the search index.

About using the search index: You will need to modify your cql query to use a Solr query. DSE Search is based on Solr.

In your case, the cql query would need to look like this:

SELECT * FROM myspace.test WHERE solr_query='external_id: The-External-Id';

Here is the reference to the search syntax, for DSE 6.8:

https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/search_index/siQuerySyntax.html

DSE Search will make certain searches a lot easier. But please note that it also introduces operational complexity, especially when using at scale, as it is very IO intensive.


Point 3:

Which DSE version are you evaluating? With DSE 6.8.3 you have the option to create a custom SAI index. Note that you cannot use both (Search index and SAI index) on the same table.

See here for the overview documents:

https://docs.datastax.com/en/storage-attached-index/6.8/sai/saiTOC.html


I hope this helps to address your questions.

2 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

jean.andre_185656 avatar image jean.andre_185656 commented ·

Thank you very for your answer. So here are some additional info:

[Update posted in original question]

Thank you very much.

[Post converted to comment since it's not an "answer"]

0 Likes 0 ·
smadhavan avatar image smadhavan ♦ jean.andre_185656 commented ·
0 Likes 0 ·
Cedrick Lunven avatar image
Cedrick Lunven answered jean.andre_185656 commented

Hi,

You are using quite a lot of libraries on top of the Cassandra drivers, this does not always simplify things.

  1. As of now you seem to use Spring-Data 2.x. Starting with 3.x the Session object will not exist anymore. Consider to migrate. For example spring-boot won't be compatible with your stack for long.
  2. If you are using DSE you should stop using the OSS driver 3.10.2 but use the Datastax enterprise drivers 1.10.2 instead which is in the support license.
  3. In the example above you are using Cassandra UNIT, it is working with Cassandra but not the DSE specifics workloads like search, graph and spark. The solution is not to use Cassandra unit but TestContainers or any Docker-based solution with a DSE docker image.
  4. So what I m understanding is that SAI index are not SASI index. SAI is a new way to create custom Index. Please confirm.
    • Indeed. I confirm the SAI indexing will be available in the latest version of DSE as release this month.

Best regards.

2 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

jean.andre_185656 avatar image jean.andre_185656 commented ·

Thank you again. I understand but I'm a little bit confuse with many kind of library version. Currently the project is using the artifact from com.datastax.cassandra, and I saw there is also a new version from com.datastax.oss (4.4.0), and now, if I need DSE I should use the library 1.10.2 . Search on Internet shows that from: https://frontbackend.com/maven/artifact/org.testcontainers/cassandra/1.10.2 and it is using the 3.5.1 as driver version from com.datastax.cassandra. Then so far so good... ;)

<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>cassandra</artifactId>
    <version>1.10.2</version>
</dependency> 
0 Likes 0 ·
jean.andre_185656 avatar image jean.andre_185656 commented ·

So, for the moment, I'm going to put aside the test for DSE. Upgrading the libraries, is not in our roadmap for now but you are definitively right and agree but it is already planned for soon. And before using secondary index and DSE or any other approaches, I'm going to verify various point related to these kind of requests.

0 Likes 0 ·
Erick Ramirez avatar image
Erick Ramirez answered jean.andre_185656 commented

To add to Bettina's and Cédric's answers:

1. It isn't necessary to concatenate company_id and external_id since you're already using DSE Search. Bettina's provided you with the Solr query syntax already.

2. Is it possible that you're running the CREATE SEARCH INDEX command on a node that is not running in DSE Search mode?

3. You won't be able to use SAI on DSE 5.1.18 since it's a new feature in DSE 6.8.3.

Cheers!

1 comment Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

jean.andre_185656 avatar image jean.andre_185656 commented ·

I was not able to make solr Search working under JUnit despite of my efforts to understand the case. Probably also a problem of config ? So, for now, I made some test with Secondary index, then it takes to add an additional field and to compute the hash (MD5) and search on this dedicated column. It works, seems to be faster, however, I also need a result ordered by the creation date... As we impose a limit on returned records, I cannot figured out if the behaviour of the request will be ok with hundred of millions of records. We want to get the most recent record....but we do not provide the date... If Cassandra get records from each node, records should be sorted according to the clustering column ? Is it guarantee even if we do not use the partition key ? So I need to make more test to check the behaviour by sorting the clustering columns in various ways (ASC vs DESC). So what does it take to run a node in DSE mode ?

1 Like 1 ·