
MayuriD asked:

Cassandra Join where column name is in camel case returns "column not found"

A Cassandra join with streaming data does not work when the column name is stored in Cassandra in camelCase.
Steps to reproduce:

1. Create a Cassandra table with a column name in camelCase:

CREATE TABLE pl_event (
      "tntId" varchar,
      "pId" varchar,
      type varchar,
      PRIMARY KEY ("tntId", "pId")
) WITH comment='event mapping records';

2. Read streaming data from Kafka, e.g. entityDF

3. Read data from the Cassandra table, e.g.:

 Dataset<Row> cassandraDF = spark.read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "pl_event")
    .option("keyspace", "test")
    .option("directJoinSetting", "on")
    .load();

4. Try to join the streaming DF with the Cassandra DF:

entityDF = entityDF.join(cassandraDF,
    cassandraDF.col("tntId").equalTo(entityDF.col("tntId"))
        .and(cassandraDF.col("pId").equalTo(entityDF.col("pId"))));

Error:

This fails with an error saying the "tntid" column is not found.

Even when we try to escape "tntId", it then says it is unable to find column ""tntId"" in the table.

Any suggestions or solutions for handling a direct join when the column name is stored in camelCase?
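In case it helps others hitting this, one possible interim workaround (an assumption on my part, not verified against this specific bug) is to turn the direct-join optimization off, so the join goes through the regular Spark join path instead of the code path that mangles the quoted camelCase names:

```java
// Hypothetical workaround sketch: reading with directJoinSetting "off"
// forces a plain Spark join instead of the connector's direct join,
// which may avoid the case-folding of quoted column names. Untested.
Dataset<Row> cassandraDF = spark.read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "pl_event")
    .option("keyspace", "test")
    .option("directJoinSetting", "off")  // assumption: bypasses the broken path
    .load();
```

Note this trades away the direct-join pushdown, so it can be much slower on large tables.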

spark-cassandra-connector
5 comments

smadhavan ♦ commented:

@MayuriD, could you post the output of running cassandraDF.show()?

MayuriD replied:

@smadhavan Here is the output of cassandraDF.show

+-----+----+-------+
|tntId| pId|timeout|
+-----+----+-------+
|   T1|1001| 10 min|
|   T2|1002| 20 min|
+-----+----+-------+
jaroslaw.grabowski_50515 ♦ replied:
Hi! Which version of the connector do you use? Could you paste the exact error you're getting (full stack trace included)?

1 Answer

jaroslaw.grabowski_50515 ♦ answered:

No, thank you, the provided information is enough to reproduce the bug.

We should be using .asCql(true) here.
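To illustrate why quoting matters here (a simplified sketch of my own, not the connector's or the Java driver's actual implementation): Cassandra folds unquoted CQL identifiers to lower case, so a camelCase identifier must be rendered inside double quotes to keep its case, which is what the driver's asCql(true) form produces. The helper below only handles the case-folding aspect; the real driver logic also quotes reserved keywords and special characters.

```java
// Simplified illustration of CQL identifier quoting. Unquoted identifiers
// are lower-cased by Cassandra, so any identifier containing upper-case
// characters must be double-quoted to preserve its case.
public class CqlQuoting {
    // Hypothetical helper, not the driver's API: quote only when needed.
    static String quoteIfNeeded(String identifier) {
        boolean needsQuoting = !identifier.equals(identifier.toLowerCase());
        return needsQuoting ? "\"" + identifier + "\"" : identifier;
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("tntId")); // prints "tntId" (with quotes)
        System.out.println(quoteIfNeeded("type"));  // prints type (no quotes needed)
    }
}
```

This is why the error message mentions "tntid": the direct-join path emits the identifier unquoted, and Cassandra lower-cases it before the lookup.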

Would you be willing to contribute this change for branches b3.0 and newer?


4 comments

MayuriD commented:

@jaroslaw.grabowski_50515 Thank you for the answer!
Yes, I am willing to contribute. Are there any guidelines for contributing?

Erick Ramirez ♦♦ replied:

Awesome! Have a look at the Contributing section here. Cheers!

jaroslaw.grabowski_50515 ♦ replied:

msmygit (https://datastax-oss.atlassian.net/browse/SPARKC-682) is already taking care of this one. Thank you!

MayuriD replied:

Okay! Waiting for the PR to get merged. Cheers!
