Bringing together the Apache Cassandra experts from the community and DataStax.

rjain asked ·

Will a query on a non-clustering column return accurate results?

Hi, I am running Spark on Kubernetes and executing the following query against Cassandra:

CassandraJavaUtil.javaFunctions(jc).cassandraTable("keyspace", "table").select("key").where("column1 = ? AND value = ?", "colname", "colvalue");

Internally, this generates the following query:

SELECT "key" FROM keyspace.table WHERE token("key") > ? AND token("key") <= ? AND column1 = ? AND value = ? ALLOW FILTERING

Schema is:

key text,
column1 text,
value text,
PRIMARY KEY (key, column1)

My question is: as the query shows, it filters on two columns. One is column1, which is the clustering key, and the other is value, which is a regular column. Can I filter on the non-clustering column "value" on a highly loaded database? Will the query return accurate results?
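For context on the token("key") bounds in the generated query: the connector reads the table as a series of token-range slices, one query per slice. The following is a minimal sketch of that idea in plain Java, assuming the Murmur3 partitioner's signed 64-bit token space and an even split. The class name TokenRangeSketch and the even split are illustrative assumptions; the real connector derives its ranges from the cluster's token ring rather than splitting evenly.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSketch {
    // Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Splits the token space into `splits` contiguous (start, end] ranges. */
    static List<long[]> split(int splits) {
        BigInteger width = MAX.subtract(MIN);
        BigInteger n = BigInteger.valueOf(splits);
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 1; i <= splits; i++) {
            // The last range always ends at MAX so the slices cover the whole space.
            BigInteger end = (i == splits)
                    ? MAX
                    : MIN.add(width.multiply(BigInteger.valueOf(i)).divide(n));
            ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
            start = end;
        }
        return ranges;
    }

    /** Renders one slice in the shape of the query from the question, with bounds inlined. */
    static String cql(long[] range) {
        return String.format(
                "SELECT \"key\" FROM keyspace.table WHERE token(\"key\") > %d AND token(\"key\") <= %d"
                        + " AND column1 = ? AND value = ? ALLOW FILTERING",
                range[0], range[1]);
    }

    public static void main(String[] args) {
        // Four slices means four separate scans, each covering a quarter of the token space.
        for (long[] range : split(4)) {
            System.out.println(cql(range));
        }
    }
}
```

Taken together, the slices cover every partition in the table, so the filter on column1 and value is evaluated against the whole data set, not against a single partition.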

spark-cassandra-connector
1 comment

Hi, I am updating my comment below.


1 Answer

jaroslaw.grabowski_50515 answered ·

It's not advisable to issue queries like this, especially on a highly loaded database. This is essentially a full table scan with Cassandra-side filtering applied. Most likely the query will time out.
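To illustrate the cost difference the answer describes, here is a minimal sketch that models the table from the question as an in-memory map. It is not Cassandra or connector code; the class FilteringSketch, its methods, and the partitionsVisited counter are hypothetical names introduced only for this illustration. It contrasts a read by full primary key with a filter on the regular column value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FilteringSketch {
    // Model of the schema: partition key "key" -> (clustering column "column1" -> "value").
    final Map<String, TreeMap<String, String>> partitions = new HashMap<>();
    long partitionsVisited = 0;

    void insert(String key, String column1, String value) {
        partitions.computeIfAbsent(key, k -> new TreeMap<>()).put(column1, value);
    }

    /** Models WHERE key = ? AND column1 = ?: goes straight to one partition. */
    String readByPrimaryKey(String key, String column1) {
        partitionsVisited++;
        TreeMap<String, String> partition = partitions.get(key);
        return partition == null ? null : partition.get(column1);
    }

    /** Models WHERE column1 = ? AND value = ? ALLOW FILTERING: visits every partition. */
    List<String> scanWithFilter(String column1, String value) {
        List<String> matchingKeys = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> entry : partitions.entrySet()) {
            partitionsVisited++;
            if (value.equals(entry.getValue().get(column1))) {
                matchingKeys.add(entry.getKey());
            }
        }
        return matchingKeys;
    }

    public static void main(String[] args) {
        FilteringSketch table = new FilteringSketch();
        for (int i = 0; i < 1000; i++) {
            table.insert("k" + i, "colname", i == 42 ? "colvalue" : "other");
        }

        table.readByPrimaryKey("k42", "colname");
        System.out.println("partitions visited by primary-key read: " + table.partitionsVisited);

        table.partitionsVisited = 0;
        List<String> keys = table.scanWithFilter("colname", "colvalue");
        System.out.println("matching keys: " + keys);
        System.out.println("partitions visited by filtered scan: " + table.partitionsVisited);
    }
}
```

In this model the filtered scan has to touch all 1000 partitions to return a single key, while the primary-key read touches one. The sketch only shows the amount of work involved; it does not model consistency levels, timeouts, or load on a real cluster.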

1 comment

Hi Jaroslaw,

The problem is that I have two tables that I want to join on a common column, col2. The rows to be joined on col2 sit in different partitions, as shown below:

table1( col1 primary key, col2 text)

table2(col2 primary key, col3 text)

Input provided: col1

I am not using repartitionByCassandraReplica to align the Spark partitions with Cassandra before joining, because Spark is running on Kubernetes.

On a fully loaded database, I join the two tables as follows:

CassandraJavaUtil.javaFunctions(rdd).joinWithCassandraTable(ks, tbl2, CassandraJavaUtil.allColumns, CassandraJavaUtil.someColumns("col2"))

This produces the following Spark Cassandra Connector error:

WARN ChannelPool: Cassandra-podname.namespace.svc.cluster.local/ip:port. Error while opening new Channel (DriverTimeoutException: Protocol initialization request step 1(STARTUP {CQL_VERSION=3.0.0. DRIVER_NAME=Datastax java driver for Apache Cassandra, DRIVER_VERSION=4.7.2, Client_ID=, APPLICATION_NAME=Spark-Cassandra-Connector-spark-application} timed out after 5 sec

Any suggestions?
