Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

jtdelato avatar image
jtdelato asked ·

How do I model a table of albums where there could be multiple artists?

I am currently taking the Developer Path courses and am planning to use Astra to create a practice cluster.

My fictional app using my cluster will be an app allowing users to catalog and track their personal vinyl record collections. My current data model for a basic albums_by_artist table is below. It is intended to support the query: Show all albums by a particular artist.

user text K

artist text K

year int C ↓

title text C ↑

genre text

label text

country text

My current question is how to best deal with an album that has multiple artists, such as a compilation or collaboration album?

If I make the artist column a set collection, I could store all the artists for an album in a single column. However, the way I understand it, a collection may only be in the primary key if it is frozen, which I believe means that I would need to query by the entire collection column's values.

If so, then querying to retrieve all albums an artist appears on would seem to require the user to know in advance all the different combinations of artist names that may be listed on an album. For example, a search for

 WHERE artist = {'Gwen McCrae'}

would not return the album Together by George and Gwen McCrae; and would require a search for

WHERE artist = {'George McCrae', 'Gwen McCrae'}

My current solution to this problem is to leave the artist column as a text type and simply create a duplicate row for each album in each artist's respective partition.

For example, I would have a row for the album Together by George and Gwen McCrae in both Gwen McCrae's partition and George McCrae's partition, so that the album will be returned when searching for either artist.

Creating duplicate rows could be tedious, time-consuming and error-prone, but seems to be the best solution for showing a user all the albums that a particular artist has appeared on while also avoiding a multi-partition query.

Is there a better solution anyone can think of?

In real-world production, would this be a good use case for something like DataStax Enterprise Search?

Thanks so much for any advice you can give!

data modeling
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered ·

I think you are over-thinking it and making it seem more complicated than it is.

If you want to query all the albums of an artist, you need to use the artist as the partition key:

CREATE TABLE albums_by_artist (
  artist text,
  title text,
  ...
  PRIMARY KEY (artist, title)
)

In your example, you can list all the albums by George with:

SELECT title FROM albums_by_artist WHERE artist = 'George McRae';

It doesn't matter that you have duplicate entries for the album "Together" because that what your app requires. Duplication is irrelevant -- you model based on the requirements. Cheers!

1 comment Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Thank you Erick!

You are definitely right about me overthinking :)

Glad to know that my alternative solution with collections was just as convoluted as it sounded haha.

Really appreciate your help!

0 Likes 0 ·