jynx asked:

How do I define clustering columns when creating a table in Spark SQL?

Hi! When I run

spark.sql("""CREATE TABLE catalog.keyspace.table1 (userid String,something1 String, something2 String) USING cassandra PARTITIONED BY (userid, something1)""")

I get a table whose composite partition key is made up of userid and something1. Is there a way to specify that I don't want something1 to be part of the partition key, but a clustering column instead?

In CQL, instead of:

CREATE TABLE keyspace.table1 (
    userid text,
    something1 text,
    something2 text,
    PRIMARY KEY (userid, something1)
)

I am getting:

CREATE TABLE keyspace.table1 (
    userid text,
    something1 text,
    something2 text,
    PRIMARY KEY ((userid, something1))
)

The CLUSTERED BY clause requires an INTO num BUCKETS part, and I don't know whether that even relates to Cassandra.

Thank you!

spark-cassandra-connector


1 Answer

jaroslaw.grabowski_50515 answered:

Hi!

This is how you could define a clustering key:

spark.sql(s"CREATE TABLE myKeyspace.myTable (key Int, value STRING) USING cassandra PARTITIONED BY (key) TBLPROPERTIES (clustering_key='value.asc')")

Cheers

1 comment

Thank you!
