question

MayuriD asked:

Row not getting deleted after setting TTL

We have set spark.cassandra.output.ignoreNulls = true to avoid actual values being replaced with nulls in update/modify scenarios.

Now the problem we are facing is:
1. We added the ttl_col with value 300 for some rows.
2. Some of those rows have null columns.
3. TTL(col_name) does not return a time for the non-primary-key columns that were null in step 2.
4. As a result the row is not getting deleted; after 300 sec only some of the columns become null.

Example:

1. Create the table

CREATE TABLE test.emp (
    id varchar,
    policy varchar,
    emp varchar,
    ttl_sec bigint,
    PRIMARY KEY (id)
) WITH comment='Emp records';

2. Insert Records

INSERT INTO test.emp (id, policy, emp, ttl_sec)
VALUES ('100', 'RATTO', 'Rissella', 0);
INSERT INTO test.emp (id, policy, emp, ttl_sec)
VALUES ('101', 'RATTO1', 'Rissella1', 0);

select * from test.emp;
  id | emp       | policy | ttl_sec
-----+-----------+--------+---------
 100 | Rissella  | RATTO  | null
 101 | Rissella1 | RATTO1 | null

3. Update the record with ttl_sec = 300 and emp = null, with spark.cassandra.output.ignoreNulls = true

INSERT INTO test.emp (id, policy,emp, ttl_sec)
VALUES ('100', 'RATTO', null,300) ;
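
Since ignoreNulls is enabled, we assume the connector skips the null emp column instead of writing a null, so the write above should be roughly equivalent to the following CQL (our guess, not captured from the driver; ttl_sec also selects back as null, so it appears to be consumed by the ttl option rather than stored):

-- emp is omitted instead of being written as null, so its existing
-- value (which has no TTL) is left untouched; only the columns that
-- are actually written get the fresh 300-second TTL.
INSERT INTO test.emp (id, policy) VALUES ('100', 'RATTO') USING TTL 300;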

This is the output:

select * from test.emp;
  id | emp       | policy | ttl_sec
-----+-----------+--------+---------
 100 | Rissella  | RATTO  | null
 101 | Rissella1 | RATTO1 | null
select TTL(emp) from test.emp;
TTL(emp)
-------
null
null

No TTL is returned for emp here.

select TTL(policy) from test.emp;
TTL(policy)
-------
250
null

We observed that after 300 sec the policy column becomes null.

Can someone help here? How can we apply the TTL to the complete row, so that the whole row is deleted after the TTL expires? Is there a property setting we are missing?

spark-cassandra-connector
2 comments

jaroslaw.grabowski_50515 ♦ commented:

Hi!

What's the connector version?

Could you paste your connector code?

Thanks!

MayuriD commented:

Hi @jaroslaw.grabowski_50515,

We are using spark-cassandra-connector_2.12:3.2.0.

Code:

spark.sparkContext().conf().set("spark.cassandra.connection.host",host);
spark.sparkContext().conf().set("spark.cassandra.connection.port", port);
spark.sparkContext().conf().set("spark.cassandra.output.ignoreNulls", "true");

DF.write()
.format("org.apache.spark.sql.cassandra")
.option("keyspace", "test")
.option("table", tableName)
.option("ttl", "ttl_sec") //Use the values in ttl_sec col as the TTL
.mode(SaveMode.Append)
.save();

1 Answer

jaroslaw.grabowski_50515 ♦ answered:

It's hard to judge from the small part you pasted, but I'm guessing you're updating the TTL for the row on every insert:

- The initial insert sets the TTL for all the columns.

- A second insert sets a fresh TTL (the countdown restarts) for all the columns except the one ignored due to the ignoreNulls setting.

- The TTL elapses for the ignored column and its value disappears; the rest of the columns remain since their timers were restarted.

- If you wait long enough, the other columns should disappear too.
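
To make this concrete, here is a plain-CQL sketch of that sequence (timings are illustrative; it mirrors what the connector effectively does when ignoreNulls skips a null column):

-- initial insert: every column gets a 300s TTL
INSERT INTO test.emp (id, policy, emp) VALUES ('100', 'RATTO', 'Rissella') USING TTL 300;

-- 50s later, a second insert that omits emp (as ignoreNulls does for null columns):
-- policy gets a fresh 300s countdown, emp keeps its original one
INSERT INTO test.emp (id, policy) VALUES ('100', 'RATTO') USING TTL 300;

-- the TTLs have now diverged:
SELECT TTL(policy), TTL(emp) FROM test.emp WHERE id = '100';
--  ttl(policy) | ttl(emp)
-- -------------+----------
--          300 |      250
-- emp expires 50s before policy; the row itself only disappears
-- once all of its columns have expired.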
