Ryan Quey asked:

Using UDTs with java driver mapper: How to implement DAO and codecs and best practices

Summary

I have two Java classes, each of which has a corresponding table and UDT in Cassandra. For these two classes, I want to be able to access data from the other class in a single query, so I have two corresponding tables and each of these tables has a column referencing the other class's UDT. How do I implement this using the Java driver?


DSE v.6.8

Java driver v.4.6


This seems to me like it should be a fairly common use case, but I'm having a hard time finding a solution online, at least for Java driver v.4.6. Or is it just that I'm thinking about this totally wrong? If so, please let me know the best practice for a solution.


Either way, I'm providing more details below in case it provides clarity regarding what my question is.


My Objective

I have two Java classes (Podcast and Episode). For these two classes, I want to be able to access data from the other class in a single query. A podcast has many episodes, and each episode has a single podcast.


For example, it looks something like:

USE podcast_analysis_tool;

CREATE TABLE podcasts_by_language (
    language text,
    primary_genre text,
    feed_url text,
    author text,
    episodes list<frozen<episode>>,
    PRIMARY KEY (language, primary_genre, feed_url)
);

CREATE TYPE podcast (
    language text,
    primary_genre text,
    feed_url text,
    author text,
    episodes list<frozen<episode>>
);


CREATE TABLE episodes_by_order_in_podcast (
    podcast_api text,
    podcast_api_id text,
    order_num int,
    keywords set<text>,
    podcast frozen<podcast>,
    PRIMARY KEY ((podcast_api, podcast_api_id), order_num)
);

CREATE TYPE episode (
    podcast_api text,
    podcast_api_id text,
    order_num int,
    keywords set<text>,
    podcast frozen<podcast>
); 


I am using the Java driver mapper and I think I have it mostly set up, except for these UDTs. When I try to instantiate the DAO, I get the following error:


java.lang.IllegalArgumentException: The CQL ks.table: podcast_analysis_tool.podcasts_by_language defined in the entity class: dataClasses.Podcast declares type mappings that
are not supported by the codec registry:
Field: episodes, Entity Type: dataClasses.Episode, CQL type: UDT(podcast_analysis_tool.episode)
        at com.datastax.oss.driver.internal.mapper.entity.EntityHelperBase.throwMissingTypesIfNotEmpty(EntityHelperBase.java:196)
        at com.datastax.oss.driver.internal.mapper.entity.EntityHelperBase.throwMissingTableTypesIfNotEmpty(EntityHelperBase.java:185)
        at dataClasses.PodcastHelper__MapperGenerated.validateEntityFields(PodcastHelper__MapperGenerated.java:517)
        at cassandraHelpers.PodcastDaoImpl__MapperGenerated.initAsync(PodcastDaoImpl__MapperGenerated.java:92)
        at cassandraHelpers.PodcastDaoImpl__MapperGenerated.init(PodcastDaoImpl__MapperGenerated.java:132)
        at cassandraHelpers.InventoryMapperImpl__MapperGenerated.lambda$1(InventoryMapperImpl__MapperGenerated.java:39)
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
        at cassandraHelpers.InventoryMapperImpl__MapperGenerated.podcastDao(InventoryMapperImpl__MapperGenerated.java:39)
        at dataClasses.Podcast.getDao(Podcast.java:125)
        at Main.processOnePodcast(Main.java:204)
        at Main.main(Main.java:22)


Potential Solutions

1) Set up each class as both a table and a UDT

Currently, my Podcast and Episode classes are both mapped as tables only. For example, for Episode:

 @Entity
 @CqlName("episodes_by_order_in_podcast")
 public class Episode {
...


The same fields exist on the table that exist on the UDT. This being the case, can I add another annotation to the same class, so it is recognized as both a table and a type? It seems like this is a potential solution, when just reading the docs on dao schema validation. Maybe something like:


@SchemaHint(targetElement = UDT)
@SchemaHint(targetElement = TABLE)
public class Episode {
   
...


Seems a little messy, but is this possible?


2) Separate java classes for the UDT and the table

If each class can only be either a table or a UDT, should I just have one base class and then maybe two child classes, one for the table and one for the UDT?


3) Codec

Or, could I just solve this whole thing by leaving my current class alone, but adding a custom codec to map my Episode/Podcast classes to a UDT?


If so, how do I do this? Do I need to write out a full custom codec for each one? I found this solution that seems quite simple and intuitive, but it refers to a class MappingManager that does not seem to exist in 4.6 as far as I can tell.


Java driver v.2.1 seems like it had a very straightforward way of inferring the UDT from the class automatically, but I'm having a hard time finding out how to do this in v.4.6.

Tags: java driver, codec, dao

1 Answer

alexandre.dutra answered:

Sorry but I'm not following. I think that your question has to do with circular references in UDTs, but:

1. Your schema is invalid:

cqlsh:test> CREATE TABLE podcasts_by_language (
        ...     language text,
        ...     primary_genre text,
        ...     feed_url text,
        ...     author text,
        ...     episodes list<frozen<episode>>,
        ...     PRIMARY KEY (language, primary_genre, feed_url)
        ... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown type test.episode"
cqlsh:test>
cqlsh:test> CREATE TYPE podcast (
        ...     language text,
        ...     primary_genre text,
        ...     feed_url text,
        ...     author text,
        ...     episodes list<frozen<episode>>
        ... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown type test.episode"
cqlsh:test>
cqlsh:test>
cqlsh:test> CREATE TABLE episodes_by_order_in_podcast (
        ...     podcast_api text,
        ...     podcast_api_id text,
        ...     order_num int,
        ...     keywords set<text>,
        ...     podcast frozen<podcast>,
        ...     PRIMARY KEY ((podcast_api, podcast_api_id), order_num)
        ... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown type test.podcast"
cqlsh:test>
cqlsh:test> CREATE TYPE episode (
        ...     podcast_api text,
        ...     podcast_api_id text,
        ...     order_num int,
        ...     keywords set<text>,
        ...     podcast frozen<podcast>
        ... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown type test.podcast"

2. Why are you creating tables and UDTs with similar structure? What is your intent?

3. You didn't show your Java classes, so I can't know if they are correctly specified.
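On point 1, the errors above show that a type must already exist when another definition references it. Because podcast references episode and episode references podcast, no creation order can satisfy both. The following is a minimal sketch of that reasoning in plain Java; `UdtCreationOrder` and `creationOrder` are hypothetical names used for illustration and are not part of the driver.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not a driver API): finds an order in which UDTs can be
// created so that every type exists before another type references it.
class UdtCreationOrder {

    // deps maps each type name to the set of type names it references.
    static List<String> creationOrder(Map<String, Set<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> remaining = new TreeSet<>(deps.keySet());
        boolean progressed = true;
        while (!remaining.isEmpty() && progressed) {
            progressed = false;
            for (Iterator<String> it = remaining.iterator(); it.hasNext();) {
                String type = it.next();
                // A type can be created once everything it references exists.
                if (order.containsAll(deps.get(type))) {
                    order.add(type);
                    it.remove();
                    progressed = true;
                }
            }
        }
        if (!remaining.isEmpty()) {
            // Whatever is left depends on a type that can never be created first.
            throw new IllegalStateException("No valid creation order for: " + remaining);
        }
        return order;
    }

    public static void main(String[] args) {
        // The schema from the question: each type references the other.
        Map<String, Set<String>> circular = new HashMap<>();
        circular.put("podcast", Collections.singleton("episode"));
        circular.put("episode", Collections.singleton("podcast"));
        try {
            creationOrder(circular);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Dropping the back-reference from episode makes the schema creatable.
        Map<String, Set<String>> acyclic = new HashMap<>();
        acyclic.put("episode", Collections.<String>emptySet());
        acyclic.put("podcast", Collections.singleton("episode"));
        System.out.println(creationOrder(acyclic));
    }
}
```

In the second case, episode has to be created first and podcast after it. The circular case has no such ordering, which is why each statement fails with "Unknown type".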

Would you mind reformulating your question and sharing with us your exact CQL schema and your exact Java classes? Thanks!

Just in case, the mapper docs for driver 4.6 are here.

3 comments

Here's all my code if you want to look (please excuse the mess; this is my first go at Java, much less Cassandra). For my current schema, I ran `describe schema` in cqlsh and printed it into a file here.

But yes, I think you're getting at what I'm wondering already, which is that I'm not thinking about this in a Cassandra way.

So, my use case would be that sometimes, I want to be able to grab podcasts and get all the episodes on that podcast. I can't do any joins of course, so I want to make episodes a UDT on the podcast table (podcast -> episodes).


Other times, I want to be able to get an episode directly, and then get information about its podcast from the episode (episode -> podcast). Beyond that (I didn't mention this part above, in an unsuccessful attempt to keep the question from getting too involved), there will be times when I want to run a podcast search that hits an external API and then store the results. I will want to be able to work backwards and recall all the podcasts, and all their episodes, retrieved by that one search query (search_query -> podcasts -> episodes). This would be a third table that I haven't added yet, but that I am trying to prepare for by adding those UDTs.

This is why I was thinking of nesting the UDTs. How can I retrieve information like this without creating tables and UDTs with a similar structure? Should I just split things off into multiple tables and classes? (Trying to strike a balance between DRYing up code and not piling too much into a single class).


In this case I think you need to take a step back and rethink your data model.

For example, using nested collections to model a one-to-many relationship, as you are doing, is usually not recommended as it does not scale out well. Instead, you need to come up with a few denormalized tables that reflect your query patterns (how your data is going to be queried).

You seem to have started thinking about query patterns already, now you need to design one table for each query pattern. For example:

  • "I want to be able to grab podcasts and get all the episodes on that podcast" -> define a table where the podcast id is the partition key and the episode id is the clustering column. Then you could store additional podcast information in static columns (denormalization).
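As a rough sketch of the table described in that bullet, the DDL could look like the strings below, held in a plain Java class so they can later be handed to the driver. Everything here is an assumption made for illustration: the table name `episodes_by_podcast` is hypothetical, `(podcast_api, podcast_api_id)` stands in for the podcast id, `order_num` stands in for the episode id, and the static columns are borrowed from the podcast fields in the question.

```java
// Hypothetical sketch: the table name and column choices are illustrative
// assumptions, not a schema confirmed anywhere in this thread.
class EpisodesByPodcastSchema {

    // One partition per podcast, with episodes clustered by order_num.
    // STATIC columns are stored once per partition, so podcast-level data
    // is not duplicated for every episode row.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS podcast_analysis_tool.episodes_by_podcast (\n"
      + "    podcast_api text,\n"
      + "    podcast_api_id text,\n"
      + "    order_num int,\n"
      + "    author text STATIC,\n"
      + "    language text STATIC,\n"
      + "    primary_genre text STATIC,\n"
      + "    feed_url text STATIC,\n"
      + "    keywords set<text>,\n"
      + "    PRIMARY KEY ((podcast_api, podcast_api_id), order_num)\n"
      + ");";

    // A single-partition query: returns every episode of one podcast, and
    // each returned row also carries the podcast's static column values.
    static final String SELECT_EPISODES =
        "SELECT * FROM podcast_analysis_tool.episodes_by_podcast "
      + "WHERE podcast_api = ? AND podcast_api_id = ?";

    public static void main(String[] args) {
        System.out.println(CREATE_TABLE);
        System.out.println(SELECT_EPISODES);
    }
}
```

With a layout like this, neither a podcast UDT nor an episode UDT is needed for this query pattern, since the podcast data and its episodes live in the same partition.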

There are a few good learning resources that you could try:

Good luck!
