
jynx asked:

How do I define clustering columns when creating a table in Spark SQL?

Hi! When I run

spark.sql("""CREATE TABLE catalog.keyspace.table1 (userid String,something1 String, something2 String) USING cassandra PARTITIONED BY (userid, something1)""")

I get a table where userid and something1 together form a composite partition key. Is there a way to specify that something1 should not be part of the partition key, but a clustering column instead?

In CQL, instead of:

CREATE TABLE keyspace.table1 (
    userid text,
    something1 text,
    something2 text,
    PRIMARY KEY (userid, something1)
)

I am getting:

CREATE TABLE keyspace.table1 (
    userid text,
    something1 text,
    something2 text,
    PRIMARY KEY ( (userid, something1) )
)

Spark's CLUSTERED BY clause requires an INTO num BUCKETS part, which I'm not sure even relates to Cassandra.

Thank you!

spark-cassandra-connector

1 Answer

jaroslaw.grabowski_50515 answered:

Hi!

This is how you could define a clustering key:

spark.sql(s"CREATE TABLE myKeyspace.myTable (key Int, value STRING) USING cassandra PARTITIONED BY (key) TBLPROPERTIES (clustering_key='value.asc')")

Cheers

1 comment

Thank you!
