Recently, while using Spark SQL to manipulate Cassandra, I ran into some confusion.
Here is what I did:
First, I created two Cassandra tables through Spark SQL:
- create table cassandra.testks.testtab1(colA int, colB text) using cassandra partitioned by(colA)
- create table cassandra.testks.testtab2(colA int, colB text) using cassandra partitioned by(colA)
Then I tried to insert a row into testtab1 with "insert into testks.testtab1(colA, colB) values(1, 'a')", which throws the exception "missing primary key columns: [colA]". However, the following SQL works fine:
- insert into testks.testtab1(colA, colB) select colA, colB from testks.testtab2
I found that in CassandraWriteBuilder (https://github.com/datastax/spark-cassandra-connector/blob/master/connector/src/main/scala/com/datastax/spark/connector/datasource/CassandraWriteBuilder.scala), there is a comparison between the primary-key columns and the input columns.
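For illustration only, here is a minimal Python sketch (not the connector's actual code) of such a case-sensitive primary-key check. My unverified assumption is that the VALUES path hands the writer the input column names in a different case (e.g. lower-cased) than the Cassandra schema's "colA", so a case-sensitive comparison reports the key as missing, while the insert-select path preserves the exact names:

```python
def missing_primary_key_columns(primary_key, input_columns):
    """Return primary-key columns absent from the input, compared case-sensitively."""
    present = set(input_columns)
    return [c for c in primary_key if c not in present]

primary_key = ["colA"]

# Hypothetical: names as they might arrive from the VALUES insert (lower-cased)
print(missing_primary_key_columns(primary_key, ["cola", "colb"]))  # ['colA']

# Names preserved exactly, as in the insert-select case
print(missing_primary_key_columns(primary_key, ["colA", "colB"]))  # []
```

If this guess is right, the check itself is correct and the mismatch happens earlier, in how Spark resolves the column list of the VALUES insert.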
I'm not sure whether this is a bug or just incorrect usage on my part.
Environment: Spark 3.1.2, spark-cassandra-connector_2.12-3.1.0.
Thanks in advance!