sungop_187062 asked Erick Ramirez answered

Why is WriteTimeoutException happening?

We have an existing table:

    CREATE TABLE alert_logs (
        org_id decimal,
        type text,
        subtype text,
        valid_from timestamp,
        effective_time timestamp,
        valid_to timestamp,
        id text,
        PRIMARY KEY ((org_id, type, subtype), valid_from, effective_time, valid_to, id)
    );

We updated the table to apply user-level ownership to each row and created an index to query by user:

    ALTER TABLE alert_logs ADD users set<decimal>;
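The index itself isn't shown above; with a regular secondary index on the set column it would look something like this (the index name is illustrative):

    CREATE INDEX alert_logs_users_idx ON alert_logs (users);

Queries would then filter on the set with `CONTAINS`, e.g. `... WHERE org_id = 1 AND type = 'alert' AND subtype = 'system' AND users CONTAINS 123;` (key values here are placeholders).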

We have a write-heavy application: hundreds of concurrent writes from multiple threads into the same partition. About 10% of insert queries fail with the following error:

    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)

Each row's users set may contain up to 200 values, and the same combinations repeat across rows.

Why are the write timeouts happening?

Is the secondary index a good choice here?



starlord answered sungop_187062 commented

What version of Cassandra or DSE are you currently running?
Adding a secondary index can definitely hurt performance, and it isn't uncommon to have to tune the write path and increase your write timeout to avoid failures. If you are on a version that supports SAI, that would probably be a better option for you.
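For reference, a Storage-Attached Index on the users column would be created along these lines (the index name is illustrative; SAI on collections requires a sufficiently recent DSE 6.8 patch release):

    CREATE CUSTOM INDEX alert_logs_users_sai ON alert_logs (users)
    USING 'StorageAttachedIndex';

SAI avoids much of the write amplification of legacy secondary indexes, which is why it tends to behave better under heavy write load.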

If that isn't an option, I'd want to see a debug.log where the problem is happening as well as your cassandra.yaml file.


sungop_187062 commented ·
Thanks for the response. We are on DSE 6.8.18.

I did try a Storage-Attached Index and it is working fine. The read query is a bit slower, but that is acceptable.

sungop_187062 commented ·

After adding the SAI, a read on a partition of 30,000 records takes between 2 and 8 seconds. Is there any optimization that can be done to improve the performance?

starlord ♦ commented ·

Reducing the partition size is definitely advisable if it's growing large, although that requires a table definition change. If the pauses are caused by filling the heap, you might try increasing the heap size if you are below 31 GB. If you are already at or above 31 GB: what compaction strategy do you use, and what are your driver settings? Could you perform the query in cqlsh with tracing enabled to get a better idea of what's happening? There may be some knobs to turn, but it could also be a scenario that requires a partition key change to solve; we'll need some additional info to make that call.
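Tracing from cqlsh looks like this (the SELECT is just an illustrative query against the alert_logs table; substitute your real key values):

    TRACING ON;
    SELECT * FROM alert_logs
    WHERE org_id = 1 AND type = 'alert' AND subtype = 'system'
      AND users CONTAINS 123;
    TRACING OFF;

The trace output shows per-step latencies on the coordinator and replicas, which helps pinpoint whether the time goes to the index lookup, to reading a large partition, or elsewhere.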

Erick Ramirez answered

The WriteTimeoutException is a result of nodes not responding to the coordinator because they are overloaded with writes.

You need to review the utilisation of the commitlog/ disk with Linux tools like iostat to see if you are saturating the IO bandwidth of the disks. You may need to review the size of your cluster and consider increasing the capacity by adding more nodes.
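A minimal sketch of that check (iostat usually ships in the sysstat package; device names vary by system):

```shell
# Extended per-device IO statistics: one report every 5 seconds, 3 reports.
# High %util (near 100) or high await on the device backing commitlog/
# suggests the disks are IO-saturated.
iostat -x 5 3
```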

We recommend using NVMe SSDs for optimal read and write performance. If it's not an option then we suggest you mount the data/ and commitlog/ directories on separate volumes/disks so reads are not competing for the same IO bandwidth as writes.

If you haven't already seen them, please have a look at the capacity planning and recommended production settings documents for guidance. If you have any follow up questions, your best course of action is to log a ticket with DataStax Support so one of our engineers can assist you. Cheers!
