rjain asked rjain commented

Will a query on a non-clustering column return accurate results?

Hi, I am running Spark on k8s and executing the following query against Cassandra:

CassandraJavaUtil.javaFunctions(jc).cassandraTable("keyspace", "table").select("key").where("column1 = ? AND value = ?", "colname", "colvalue");

Internally, this generates the following query:

SELECT "key" FROM keyspace.table WHERE token("key") > ? AND token("key") <= ? AND column1 = ? AND value = ? ALLOW FILTERING

The schema is:

key text,
column1 text,
value text,
PRIMARY KEY (key, column1)

My question is: as the query shows, it filters on two columns. column1 is a clustering key and value is a regular column. Can I filter on the non-clustering column "value" on a highly loaded database, and will it return accurate results?


Hi, I am updating my comment below.


1 Answer

jaroslaw.grabowski_50515 answered rjain commented

Issuing queries like this is not advisable, especially on a highly loaded database. It is essentially a full table scan with Cassandra-side filtering applied, so the query will most likely time out.
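One common way to avoid the scan is to keep a second table whose partition key matches the lookup. The following is only a sketch under assumptions: the names `my_keyspace` and `table_by_column1_value` are hypothetical placeholders, and it assumes the application can write every row to both tables so they stay in sync.

```cql
-- Hypothetical lookup table (names are placeholders, not from the original schema).
-- The partition key is the pair of columns used in the WHERE clause.
CREATE TABLE my_keyspace.table_by_column1_value (
    column1 text,
    value   text,
    key     text,
    PRIMARY KEY ((column1, value), key)
);

-- Reads a single partition, so ALLOW FILTERING is not required.
SELECT key FROM my_keyspace.table_by_column1_value
WHERE column1 = ? AND value = ?;
```

Because the full partition key is supplied, Cassandra can route the read to the replicas that own that partition instead of scanning every token range. The trade-off is extra storage and an additional write per row.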


Hi Jaroslaw,

The problem is that I have two tables that I want to join on a common column, col2. The rows to be joined on col2 are placed in different partitions, as shown below.

table1( col1 primary key, col2 text)

table2(col2 primary key, col3 text)

Input provided: col1

I am not using repartitionByCassandraReplica to align Spark partitions with Cassandra replicas before joining, because Spark is running on K8s.

Joining the two tables on a fully loaded database produces the following Spark Cassandra Connector error:

WARN ChannelPool: Cassandra-podname.namespace.svc.cluster.local/ip:port. Error while opening new Channel (DriverTimeoutException: Protocol initialization request step 1(STARTUP {CQL_VERSION=3.0.0. DRIVER_NAME=Datastax java driver for Apache Cassandra, DRIVER_VERSION=4.7.2, Client_ID=, APPLICATION_NAME=Spark-Cassandra-Connector-spark-application} timed out after 5 sec

Any suggestions?
