question

nirjharray avatar image
nirjharray asked Cedrick Lunven commented

How do I handle a 2-phase commit since Cassandra does not support ACID?

Since Cassandra does not have transactions and does not support ACID how to architect a solution supporting 2-phase commit to maintain consistency across 2 data sources where one data source is a RDBMS and the other is Cassandra. For example if my use case is to capture relational data in web form and capture the comments against those relational records in Cassandra. I want to ensure the comments are captured only when the relational record gets committed else rollback

cassandralightweight transactions
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image
Erick Ramirez answered

In my personal opinion, ACID operations are an illusion. Once you understand what is really under the hood between ACID in RDBMS vs Cassandra's eventual consistency, you'll realise it too. :)

You haven't provided a lot of information. The basic schema for the relational records table plus comments table and the insert queries you had in mind is important to be able to provide a meaningful response.

In any case, it sounds like what you're trying to do is a compare-and-set (CAS) operation -- check IF the relational record EXISTS before inserting the comment. In Cassandra, this can be achieved with lightweight transactions (LWT).

When performing an INSERT or UPDATE, C* supports the use of the IF clause. This allows you to perform conditional writes.

For example, only create a new record if it doesn't already exist:

INSERT INTO table_name ( ... ) VALUES ( ... ) IF NOT EXISTS

Or perform a conditional update:

UPDATE table_name SET col = ? WHERE pk = ? IF col = ?

For details, here are some references:

Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Cedrick Lunven avatar image
Cedrick Lunven answered Cedrick Lunven commented

Hi,

First part of the question is you not easy as you gave the name of an implementation - 2 phase-commits (also called XA) and NOT the requirements.


Then you explain what you need

My use case is to capture relational data in web form and capture the comments against those relational records in Cassandra. I want to ensure the comments are captured only when the relational record gets committed else rollback


As far as i understand:

  1. You insert into a RDMS
  2. Then, if success, you insert into Cassandra.


For this use case I see 2 approach:

  • Do one insertion after the other and only if the first insertion is OK. The insertion in Cassandra is NOT very likely to fail as the driver will retry for you multiple times and on multiples nodes.
  • Use a CDC like debezium https://debezium.io/documentation/reference/index.html
3 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

nirjharray avatar image nirjharray commented ·

isn't this approach only considering the happy path scenario. What if the insert to Cassandra fails? Shall we delete the records inserted in RDBMS in that case? This approach will fork lot of logic development depending on the possible permutation & combinations across the 2 data sources

0 Likes 0 ·
Erick Ramirez avatar image Erick Ramirez ♦♦ nirjharray commented ·

It sounds like you're expecting a C* solution for what should be handled at the app layer. You wouldn't expect your relation DB to be able to do that, would you?

You need to handle the exceptions application-side and apply whatever business rules there. Cheers!

0 Likes 0 ·
Cedrick Lunven avatar image Cedrick Lunven ♦ nirjharray commented ·

The statement "What if the insert to Cassandra fails? " is really a biais / RBDMS way of thinking.

I would challenge with a what could make a Cassandra insert fail ?

  • There is no real before-write
  • There is no integrity constraint
  • There is no query plan
  • The grammar of the query is checked at application startup with PreparedStatements
  • If a node is not available the driver will reconnect, load-balance and retry your queries.


But yeah if the table got erase or you loose all the replica holding the (then you have bigger problems), the you need to rollback at application level, reason why I propose to go with RDBMS first it slower and will never override Cassandra and you can apply the commit of RDBMS after Cassandra insert:

Insert into RDBMS

Insert into Cassandra

Commit into RBDMS.

0 Likes 0 ·