Hello,
I want to collect runtime metrics generated by IoT devices, and I want the database table to be unaware of the data schema.
Every device has between 40 and 100 metrics, and the sending period could be in the range of seconds. I want to be able to query the database to retrieve the value of a specific metric from a device in a given time slot.
The purpose of the solution is to have a general-purpose time-series database where I can collect metrics and plot graphs for monitoring and debugging.
Every set of metrics is related to a device_id, so partitioning could be done with a composite partition key of device_id and date.
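To make the partitioning idea concrete, here is a minimal Python sketch of how I would derive the partition key on the client side. The function name `partition_key`, the device id `device-42`, and the choice of a UTC day as the date bucket are my own assumptions for illustration, not something Cassandra prescribes.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def partition_key(device_id: str, ts: datetime) -> Tuple[str, str]:
    """Return the (id, date) pair used as the composite partition key.

    Assumption: ts is timezone-aware and the date bucket is the UTC day,
    so all devices roll over to a new partition at the same instant.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return device_id, day


# A sample taken just before midnight UTC stays in that day's partition.
utc_sample = datetime(2020, 9, 20, 23, 59, 59, tzinfo=timezone.utc)
print(partition_key("device-42", utc_sample))

# A sample stamped in UTC+2 is bucketed by its UTC day, not its local day.
local_sample = datetime(2020, 9, 21, 1, 0, tzinfo=timezone(timedelta(hours=2)))
print(partition_key("device-42", local_sample))
```

Both samples land in the same ("device-42", "2020-09-20") partition, which is the behaviour I would expect from a per-device, per-day bucket.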
Desired query-like behaviour
SELECT '$metric_id', ts FROM table WHERE id='$device_id' and date='2020-09-20'
I have tried different approaches:
SOLUTION 1: use map with primitive type
PRO: one write per device
CONS: I don't understand whether it is possible to select a single key-value pair (KVP), or whether the whole collection is always returned.
CREATE TABLE test_keyspace.metrics_by_device_id_and_date (
    id text,
    date text,
    ts timestamp,
    metrics_bool map<text, boolean>,
    metrics_decimal map<text, decimal>,
    metrics_text map<text, text>,
    PRIMARY KEY ((id, date), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
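With this layout, the client has to split each device message into the three typed maps before writing. A rough Python sketch of that step, under my own assumptions: the helper name `split_by_type` and the metric names are hypothetical, and every numeric value is treated as a decimal.

```python
from decimal import Decimal


def split_by_type(metrics):
    """Split one device message into (bools, decimals, texts) dictionaries,
    matching the metrics_bool, metrics_decimal and metrics_text columns."""
    bools, decimals, texts = {}, {}, {}
    for name, value in metrics.items():
        # bool must be tested first because bool is a subclass of int.
        if isinstance(value, bool):
            bools[name] = value
        elif isinstance(value, (int, float, Decimal)):
            # Go through str() so 21.5 becomes Decimal('21.5') exactly.
            decimals[name] = Decimal(str(value))
        else:
            texts[name] = str(value)
    return bools, decimals, texts


# Hypothetical message from one device.
message = {"door_open": True, "temp_c": 21.5, "fw_version": "1.0.3"}
bools, decimals, texts = split_by_type(message)
print(bools)
print(decimals)
print(texts)
```

Each message then maps to a single row, which is where the "one write per device" advantage comes from.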
Example query. Is it OK? It looks like the returned column is a string rather than an object.
SELECT metrics_decimal['$metric_id'], ts FROM table WHERE id='$device_id' and date='2020-09-20'
SOLUTION 2: use table as a KVP store
PRO: it works... but
CONS: if a device pushes 100 metrics/s, that means 100 writes/s to store the data for a single device. This is probably OK from a partitioning point of view (86.4k rows per metric per day), but it means about 8.64 billion writes to the database per day with 1000 devices (100 writes/s x 86,400 s x 1000 devices).
CREATE TABLE test_keyspace.metrics_by_device_id_and_date (
    id text,
    date text,
    ts timestamp,
    metric_id text,
    bool_value boolean,
    decimal_value decimal,
    text_value text,
    PRIMARY KEY ((id, date, metric_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
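The write-volume concern for this layout can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic under my assumptions (1000 devices, 100 metrics per message, one message per second); the function name `writes_per_day` is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def writes_per_day(devices, rows_per_message, messages_per_second=1):
    """Number of row writes per day for a fleet of devices."""
    return devices * rows_per_message * messages_per_second * SECONDS_PER_DAY


# SOLUTION 2: one row per metric, so 100 rows per message.
kvp_writes = writes_per_day(devices=1000, rows_per_message=100)
# SOLUTION 1 / 3: one row per message, all metrics inside a map.
map_writes = writes_per_day(devices=1000, rows_per_message=1)

print(kvp_writes)       # 8640000000
print(map_writes)       # 86400000
print(SECONDS_PER_DAY)  # rows per (id, date, metric_id) partition per day at 1 Hz
```

So the KVP layout costs 100 times as many row writes as the map-based layouts for the same data, while each partition stays at 86,400 rows per day.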
SOLUTION 3: same as SOLUTION 1, but with UDT
PRO: cleaner compared to SOLUTION 1
CONS: same cons as SOLUTION 1, plus it is difficult to manage with drivers (not ORM-friendly)
CREATE TYPE metric (
    id text,
    text_value text,
    decimal_value decimal,
    bool_value boolean
);
CREATE TABLE test_keyspace.metrics_by_device_id_and_date (
    id text,
    date text,
    ts timestamp,
    metrics frozen<map<text, frozen<metric>>>,
    PRIMARY KEY ((id, date), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
Do you have any advice on how I could address this problem? Is Cassandra still a good solution for it?
Thank you very much