question

manzoor128_186273 avatar image
manzoor128_186273 asked Erick Ramirez edited

How do I retrieve the first record of every minute from a timeseries table with PK (deviceId, datetime) ?

I am new to cassandra and have query reg data model.

Can someone tell me how to write select query to return first record of each minute ? First record here refers to first record of each minute (based on seconds). Please see example below. My System is going to regulary receive data for every 5 seconds from device. My Table structure is going to be like this.

CREATE TABLE device (
    deviceId text,
    datetime timestamp,
    temp float,
    volt float,
    PRIMARY KEY (deviceId, datetime)
) WITH CLUSTERING ORDER BY (datetime DESC);

Is it possible to extract the datetime in select query , process it and find the first record of each minute?

insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:10:00+0000',0.12,6.7);
insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:10:05+0000',3.12,61.7);
insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:10:10+0000',2.12,16.7);
insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:10:15+0000',1.12,26.7);

insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:11:20+0000',4.12,11.7);
insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:11:25+0000',5.12,12.7);
insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:11:30+0000',6.12,23.7);
insert into device (deviceId,datetime,temp,volt)
values ('device123','2018-10-10 5:11:35+0000',7.12,126.7);

My Query should give me the result like this:

  • For 10th minute, second=00 temp=0.12, volt = 6.7
  • For 11th minute,second=20 temp=4.12, volt = 11.7

Can you provide suggestions as to how I can achive this in cassandra?

data modeling
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez edited

@manzoor128_186273 If you think about how you're storing the data in your table, for each device you are storing:

  • 12 rows/min
  • 12 x 60 = 720 rows/hr
  • 720 x 24 = 17,280 rows/day

That's over 500K rows in just 1 month. That's a lot of rows to scan across to retrieve the data you need. What it means is that your table design doesn't match your query. What you need to do is design your table based on the app query you want to run.

This partitioning scheme isn't what you want:

    PRIMARY KEY (deviceId, datetime)

You need to add more "buckets" to slice up your partitions into smaller "chunks". Let me give you an example -- and to be clear, this is just an example.

If you partition your data based on the device and day but then have buckets for "hours" and "minutes" like this:

    PRIMARY KEY ( (deviceid, day), hr, min )

This query will return all the readings for the day:

SELECT temp, volt FROM device \
  WHERE deviceid = ? AND day = ?

You can limit the rows returned if you specify just the hour, say 7pm:

SELECT temp, volt FROM device \
  WHERE deviceid = ? AND day = ? \
  AND hr = 19

If you sort the minutes in ascending order, the first row will be the first record of that minute so you just need to return the first row. Say you want to get the first reading for 7:25pm:

SELECT temp, volt FROM device \
  WHERE deviceid = ? AND day = ? \
  AND hr = 19 AND min = 25 LIMIT 1

Hopefully that gives you an idea of how you should design your table. You will need to add more "buckets" depending on your application queries. Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.