Hi,
I was reading https://www.datastax.com/blog/2015/06/deep-look-cql-where-clause and understood that the role of clustering columns is to cluster data within a partition,
and that in order to retrieve data efficiently without a secondary index, you need to know all the clustering key columns for your selection. But further on in the article I read that
single column slice restrictions (<=, >=, <, >) are allowed only on the last clustering column being restricted, which means that a query that does not restrict all the clustering key columns is still valid. Does such a query give an efficient result?
The example in the article uses this table:
CREATE TABLE numberOfRequests (
    cluster text,
    date text,
    datacenter text,
    hour int,
    minute int,
    numberOfRequests int,
    PRIMARY KEY ((cluster, date), datacenter, hour, minute)
);
SELECT * FROM numberOfRequests WHERE cluster = 'cluster1' AND date = '2015-06-05' AND datacenter = 'US_WEST_COAST' AND hour = 14 AND minute = 00;
Cassandra will find the data efficiently but if you execute:
SELECT * FROM numberOfRequests WHERE cluster = 'cluster1' AND date = '2015-06-05' AND hour = 14 AND minute = 0;
Cassandra will reject the query as it has to scan the entire partition to find the requested data, which is inefficient.
Further on, I saw this:
>, >=, <= and < restrictions
Single column slice restrictions are allowed only on the last clustering column being restricted. Therefore, the following queries are valid:
SELECT * FROM numberOfRequests WHERE cluster = 'cluster1' AND date = '2015-06-05' AND datacenter = 'US_WEST_COAST' AND hour = 12 AND minute >= 0 AND minute <= 30;
SELECT * FROM numberOfRequests WHERE cluster = 'cluster1' AND date = '2015-06-05' AND datacenter = 'US_WEST_COAST' AND hour >= 12;
SELECT * FROM numberOfRequests WHERE cluster = 'cluster1' AND date = '2015-06-05' AND datacenter > 'US';
But how can these be valid given the explanation at the top? And if they are valid, won't they be inefficient?
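
To make the question concrete, this is how I currently picture it, as a minimal Python sketch and not anything Cassandra actually runs. It assumes that rows inside one partition are kept sorted by the clustering columns (datacenter, hour, minute). The names partition, prefix_slice and matching_indices, and the sample datacenter values other than US_WEST_COAST, are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical model of ONE partition (cluster='cluster1', date='2015-06-05').
# Assumption: rows are kept sorted by the clustering columns
# (datacenter, hour, minute).
partition = sorted(
    (dc, hour, minute)
    for dc in ("EU_WEST", "US_EAST_COAST", "US_WEST_COAST")
    for hour in (12, 13, 14)
    for minute in (0, 15, 30, 45)
)

INF = float("inf")


def prefix_slice(rows, prefix, low, high):
    """Rows whose leading clustering columns equal `prefix` and whose
    next clustering column lies in [low, high].

    Because `rows` is sorted, the matches form one contiguous run that
    two binary searches can locate without looking at other rows.
    """
    start = bisect_left(rows, prefix + (low,))
    # Padding the upper key with INF makes it sort after every row
    # that shares `prefix` and has the next column equal to `high`.
    end = bisect_right(rows, prefix + (high, INF))
    return rows[start:end]


def matching_indices(rows, hour, minute):
    """Positions of rows matching hour/minute when datacenter is skipped.

    Every row has to be inspected, because the matches are not adjacent.
    """
    return [i for i, r in enumerate(rows) if r[1] == hour and r[2] == minute]


# datacenter = 'US_WEST_COAST' AND hour = 12 AND minute >= 0 AND minute <= 30
minutes = prefix_slice(partition, ("US_WEST_COAST", 12), 0, 30)

# datacenter = 'US_WEST_COAST' AND hour >= 12
hours = prefix_slice(partition, ("US_WEST_COAST",), 12, INF)

# hour = 14 AND minute = 0, with datacenter left unrestricted
scattered = matching_indices(partition, 14, 0)
```

If that picture is right, each slice query reads a single contiguous run of rows (minutes and hours above), while the query that skips datacenter matches rows scattered across the whole partition (scattered above). Is that the reason the slice queries are accepted and still efficient?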