kevinmat asked:

Is filtering on 1 column of a composite partition key not allowed?

Hi, I am using Scala version 2.11 and Spark Cassandra connector 2.3.0.

I have a table like :

CREATE TABLE dcmapp (
    t text,
    p text,
    v double,
    d blob,
    f text,
    i tinyint,
    PRIMARY KEY ((t, p), v)
)

The table above has a two-column composite partition key. When I try to fetch rows based on column "t", which is part of the partition key, like this:

sc.cassandraTable("kevin", "dcmapp").select("t", "p").where("t = ?", "A1").collect().foreach(println)

I get the following error:

java.lang.UnsupportedOperationException: Partition key predicate must include all partition key columns or partition key columns need to be indexed. Missing columns: p

If I query by i, the query is successful.

Is it not possible to query via the connector using only part of the partition key?

If I run the same query via cqlsh, specifying only column "t" in the WHERE clause along with ALLOW FILTERING, I get results.

spark-cassandra-connector


1 Answer

Erick Ramirez answered:

As the exception suggests, you need to specify all the columns of a composite partition key in the filter. This isn't unique to the connector -- without ALLOW FILTERING, it isn't a valid CQL query in cqlsh either.

When you don't specify the full partition key, Cassandra is not able to retrieve the records because the partitioner can only locate the partitions based on the hash value of the full partition key. I've explained this in question #5944 if you're interested in understanding it in more detail.
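You can see this in cqlsh: the token function takes the full (t, p) pair, and there is no token for "t" on its own (a sketch; the keyspace/table names are from the question above):

SELECT token(t, p), t, p FROM kevin.dcmapp;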

When you use the ALLOW FILTERING clause, Cassandra no longer tries to retrieve a single partition. Instead it performs a full table scan, which means the query doesn't scale when (a) the table is large, (b) the cluster has many nodes, or (c) both. Cheers!
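To sketch the two workable options with the connector (the value "P1" is hypothetical): either push down a predicate on the full partition key, or filter on the Spark side, which reads the whole table:

// Full partition key: the connector pushes the predicate down to Cassandra
sc.cassandraTable("kevin", "dcmapp")
  .select("t", "p")
  .where("t = ? AND p = ?", "A1", "P1")
  .collect()
  .foreach(println)

// Partial key: filter in Spark instead -- this is an explicit full scan
sc.cassandraTable("kevin", "dcmapp")
  .select("t", "p")
  .filter(row => row.getString("t") == "A1")
  .collect()
  .foreach(println)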

2 comments


Thanks Erick for the explanation. Is it possible to provide only part of the partition key via the Spark Cassandra connector and have the connector append ALLOW FILTERING and execute the query?

When I tried it with the connector, it gave the error I mentioned.


It isn't, no, since ALLOW FILTERING isn't recommended. Cheers!
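If you really do want the cqlsh behaviour, one option is to run the CQL statement yourself through the connector's session. A sketch (the same caveat applies -- ALLOW FILTERING still means a full table scan):

import com.datastax.spark.connector.cql.CassandraConnector
import scala.collection.JavaConverters._

CassandraConnector(sc.getConf).withSessionDo { session =>
  val rs = session.execute(
    "SELECT t, p FROM kevin.dcmapp WHERE t = 'A1' ALLOW FILTERING")
  rs.asScala.foreach(row =>
    println(row.getString("t") + " " + row.getString("p")))
}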
