kevinmat asked Erick Ramirez commented

Is filtering on 1 column of a composite partition key not allowed?

Hi, I am using Scala 2.11 and Spark Cassandra Connector 2.3.0.

I have a table like this:

    CREATE TABLE kevin.dcmapp (
        t text,
        p text,
        v double,
        d blob,
        f text,
        i tinyint,
        PRIMARY KEY ((t, p), v)
    );

The table has a two-column composite partition key, (t, p). When I try to fetch rows by filtering only on column t, which is part of the partition key, like this:

    sc.cassandraTable("kevin", "dcmapp").select("t", "p").where("t = ?", "A1").collect().foreach(println)

I get the following error:

    java.lang.UnsupportedOperationException: Partition key predicate must include all partition key columns or partition key columns need to be indexed. Missing columns: p

If I query by column i instead, the query succeeds.

Is it not possible to query with only part of the partition key using the connector?

If I run the same query in cqlsh, specifying only column t in the WHERE clause and adding ALLOW FILTERING, I get results.


1 Answer

Erick Ramirez answered Erick Ramirez commented

As the exception suggests, you need to specify all the columns of a composite partition key in the filter. This isn't unique to the connector: you won't be able to do it in cqlsh either, because without ALLOW FILTERING it isn't a valid CQL query.

When you don't specify the full partition key, Cassandra can't retrieve the records directly, because the partitioner locates a partition by hashing the full partition key into a token. I've explained this in question #5944 if you're interested in understanding it in more detail.
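To make the hashing point concrete, here is a toy Scala sketch (an illustration, not Cassandra code). It assumes a hypothetical `token` helper built on Scala's `MurmurHash3.orderedHash` as a stand-in for Cassandra's Murmur3Partitioner, which serializes the key differently and produces 64-bit tokens. All it shows is that the token is a function of both `t` and `p`, so a value of `t` on its own does not identify a token.

```scala
import scala.util.hashing.MurmurHash3

// Toy illustration only. ASSUMPTION: MurmurHash3.orderedHash stands in for
// Cassandra's Murmur3Partitioner, which hashes a serialized composite key
// into a 64-bit token; the numbers printed here are NOT real tokens.
object TokenSketch {
  // Hypothetical helper: the token depends on the FULL partition key (t, p).
  def token(t: String, p: String): Int =
    MurmurHash3.orderedHash(Seq(t, p))

  def main(args: Array[String]): Unit = {
    val tx = token("A1", "x")
    val ty = token("A1", "y")
    // Same t, different p: the tokens differ, so t = 'A1' alone cannot be
    // turned into one token (or one set of replicas) to read from.
    println(s"token(A1, x) = $tx")
    println(s"token(A1, y) = $ty")
    println(s"same token: ${tx == ty}")
  }
}
```

In a real cluster each token maps to a range owned by specific replicas, so a missing `p` means the coordinator has no single place to look.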

When you use the ALLOW FILTERING clause, Cassandra is no longer retrieving a single partition. Instead, it performs a full table scan, which means the query doesn't scale when (a) the table is large, (b) the cluster has lots of nodes, or (c) both. Cheers!
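A similarly hedged toy model of the cost difference: partitions are kept in a `Map` keyed by a stand-in token of the full `(t, p)` key. The names `ScanSketch`, `store`, `readPartition` and `scanByT` are hypothetical and do not correspond to any Cassandra or connector API. With the full key the read is a single lookup; with only `t`, every stored partition has to be visited and filtered, which is roughly what ALLOW FILTERING lets Cassandra do across the whole cluster.

```scala
import scala.util.hashing.MurmurHash3

// Toy model, not Cassandra's storage engine. ASSUMPTION: a Map keyed by a
// stand-in token plays the role of the cluster's partitions.
object ScanSketch {
  final case class Row(t: String, p: String, v: Double)

  // Hypothetical token function over the full partition key (t, p).
  def token(t: String, p: String): Int = MurmurHash3.orderedHash(Seq(t, p))

  // "Write path": group rows into partitions under their token.
  def store(rows: Seq[Row]): Map[Int, Seq[Row]] =
    rows.groupBy(r => token(r.t, r.p))

  // Full partition key known: hash once and go straight to one partition.
  // The extra filter guards against two keys sharing a token in this toy.
  def readPartition(data: Map[Int, Seq[Row]], t: String, p: String): Seq[Row] =
    data.getOrElse(token(t, p), Seq.empty).filter(r => r.t == t && r.p == p)

  // Only t known: no token can be computed, so every partition is visited
  // and filtered. Returns the matches and how many partitions were read.
  def scanByT(data: Map[Int, Seq[Row]], t: String): (Seq[Row], Int) = {
    val hits = data.values.flatten.filter(_.t == t).toSeq
    (hits, data.size)
  }

  def main(args: Array[String]): Unit = {
    val data = store(Seq(
      Row("A1", "x", 1.0),
      Row("A1", "y", 2.0),
      Row("B2", "x", 3.0),
      Row("C3", "z", 4.0)
    ))
    println(s"lookup (A1, x): ${readPartition(data, "A1", "x")}")
    val (hits, visited) = scanByT(data, "A1")
    println(s"scan t = A1: $hits after visiting all $visited partitions")
  }
}
```

Both reads return correct rows; the difference is cost. The lookup touches one partition no matter how large `data` grows, while the scan's work grows with the number of partitions, which in a real cluster are also spread across every node.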

2 comments

kevinmat commented:

Thanks, Erick, for the explanation. Is it possible to provide only part of the partition key via the Spark Cassandra Connector and have the connector append ALLOW FILTERING and execute the query?

When I tried it with the connector, it gave the error I mentioned above.

Erick Ramirez ♦♦ commented (replying to kevinmat):

No, it isn't, since ALLOW FILTERING isn't recommended. Cheers!
