initialv asked

How can I create a new table from a Dataframe?

In the Python docs I'm seeing ways to save a DataFrame to an existing table, but I'd like to insert the DataFrame values into a new one. Below is the snippet from the docs for saving to an existing table.

(df.write
   .format("org.apache.spark.sql.cassandra")
   .mode("append")
   # Is there an option for a new table?
   .options(table="kv", keyspace="test")
   .save())

How might I do something similar to the above snippet but to create a new table from the Dataframe `df`?

Tags: spark-cassandra-connector, pyspark

steve.lacerda answered

There's no way to dynamically create a table based on a DataFrame, so you would have to create the table separately and then save the data to it. Here's a similar question that's been asked on Stack Overflow:

https://stackoverflow.com/questions/48396460/create-cassandra-table-from-pyspark-dataframe

Erick Ramirez answered

The DataFrame class does not have a method for performing data definition language (DDL) operations, so it isn't possible to create a new table from the DataFrame object.

However, Spark 3.0 has an API for Catalogs which provides a mechanism for DDL operations against the underlying data source (see Catalogs in Datasets for details).

For example, you can set up a catalog reference in Spark SQL to your Cassandra cluster with:

spark.conf.set("spark.sql.catalog.mycatalog", "com.datastax.spark.connector.datasource.CassandraCatalog")

This example creates a table myks.mytable in Cassandra:

spark.sql("CREATE TABLE mycatalog.myks.mytable (key Int, value STRING) USING cassandra PARTITIONED BY (key)")
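Putting the pieces together in PySpark, one possible approach is to derive the column list from the DataFrame's own schema, create the table through the catalog, and then append the rows with the writer from the question. This is only a rough sketch: the helper names `build_create_table_ddl` and `create_and_save` are made up for illustration, it assumes the keyspace already exists and the connector is on the classpath, and it assumes the type names reported by `df.dtypes` are accepted as-is in the `CREATE TABLE` statement (likely fine for simple types such as `int` and `string`, untested for nested ones).

```python
def build_create_table_ddl(catalog, keyspace, table, columns, partition_key):
    """Build a CREATE TABLE statement for a Cassandra catalog table.

    `columns` is a list of (name, type) pairs, the same shape as `df.dtypes`.
    Hypothetical helper, not part of Spark or the connector.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {catalog}.{keyspace}.{table} ({cols}) "
        f"USING cassandra PARTITIONED BY ({partition_key})"
    )


def create_and_save(spark, df, catalog, keyspace, table, partition_key):
    """Create the table via the catalog, then append the DataFrame rows.

    Hypothetical helper; assumes `spark` is an active SparkSession with the
    Spark Cassandra connector available and that `keyspace` already exists.
    """
    # Register the Cassandra catalog under the chosen name.
    spark.conf.set(
        f"spark.sql.catalog.{catalog}",
        "com.datastax.spark.connector.datasource.CassandraCatalog",
    )
    # Create the table from the DataFrame's column names and types.
    spark.sql(
        build_create_table_ddl(catalog, keyspace, table, df.dtypes, partition_key)
    )
    # Append the rows, as in the snippet from the question.
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .mode("append")
       .options(table=table, keyspace=keyspace)
       .save())
```

With the names used above, the call would look like `create_and_save(spark, df, "mycatalog", "myks", "mytable", "key")`.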

For more info, see the Spark Cassandra connector Quickstart Guide. Cheers!
