
y.khmelevskyi_193158 asked:

How do I model messages for recipients so only the latest message from each sender is returned?

My example is the following:

I have messages from recipients. I need to select the 10 latest recipients with their latest messages.

For example I have the following inbox:
Travis - 01/04/2020
Zack - 01/03/2020
John - 01/02/2020

When John sends a new message, this list should be:

John - 01/05/2020
Travis - 01/04/2020
Zack - 01/03/2020

I don't want to use a delete operation. Can you please help me model the data for this?

data modeling

1 Answer

Erick Ramirez answered:

If you use a map collection to store the messages, you can use the sender's name as the key and set the value to the date of their latest message. Writing to an existing key overwrites its value, so each sender appears only once:

CREATE TABLE messages_by_recipient (
    recipient text,
    senders map<text, date>,
    PRIMARY KEY (recipient)
);

The challenge with this data model is that the senders will be sorted based on name and not the date the message was sent.
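One way around the ordering is to read the whole map and sort it in the application. The following is only a minimal Python sketch of that idea, assuming the `senders` map has already been fetched into a plain dict; the function name `latest_senders` and the sample data are hypothetical and not part of the original answer.

```python
from datetime import date


def latest_senders(senders, limit=10):
    """Return (sender, date) pairs, newest first, at most `limit` entries."""
    # Sort on the map value (the date) in descending order.
    ordered = sorted(senders.items(), key=lambda item: item[1], reverse=True)
    return ordered[:limit]


# Hypothetical contents of the `senders` map for one recipient.
inbox = {
    "Travis": date(2020, 1, 4),
    "Zack": date(2020, 1, 3),
    "John": date(2020, 1, 2),
}

# A new message from John overwrites his existing map entry.
inbox["John"] = date(2020, 1, 5)

for sender, sent in latest_senders(inbox):
    print(sender, sent.isoformat())
```

This keeps the write path free of deletes, but the entire map is read on every query, so it is only reasonable while the number of senders per recipient stays small.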

If you need to retrieve results in reverse chronological order, you'll need to make the date a clustering column:

CREATE TABLE messages_by_recipient (
    recipient text,
    sent date,
    sender text,
    PRIMARY KEY ((recipient), sent)
) WITH CLUSTERING ORDER BY (sent DESC);

This doesn't fully meet your requirements because a sender who sends more than one message will appear in the list more than once. Cassandra can't enforce uniqueness of the sender here: you would have to scan through the recipient's entries and remove the sender's older row before writing the new one. That is an expensive choice since it requires a read-before-write, and it needs the delete you want to avoid.
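If duplicate senders are tolerated in the table, they could instead be filtered out while reading. The sketch below is in Python and assumes the rows for one recipient arrive newest first as `(sent, sender)` tuples; the function name `first_unique_senders` and the sample rows are hypothetical, and this is an illustration rather than the approach recommended in the answer.

```python
from datetime import date


def first_unique_senders(rows, limit=10):
    """Keep the first (newest) row per sender until `limit` senders are found."""
    seen = set()
    result = []
    for sent, sender in rows:
        if sender in seen:
            # An older message from a sender we already listed.
            continue
        seen.add(sender)
        result.append((sender, sent))
        if len(result) == limit:
            break
    return result


# Hypothetical rows in reverse chronological order.
rows = [
    (date(2020, 1, 5), "John"),
    (date(2020, 1, 4), "Travis"),
    (date(2020, 1, 3), "Zack"),
    (date(2020, 1, 2), "John"),
]

for sender, sent in first_unique_senders(rows):
    print(sender, sent.isoformat())
```

The trade-off is that the number of rows read is unbounded: if one sender has written many recent messages, far more than 10 rows may need to be paged through before 10 distinct senders are found.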

Either way, you'll really need to rethink your use case as you might need to use other ways of retrieving the data such as using Solr or Spark queries to push complex predicates. Cheers!
