lchu122 asked Erick Ramirez answered

"Couldn't find system_schema or any similarly named keyspaces" in PySpark

system_schema not found using the DataStax Spark Cassandra Connector (Cassandra 3.11.2, Spark 3.0.1)

The idea is to query system_schema.tables to dynamically discover all tables in a Cassandra cluster using DataSource V1.

# query keyspaces
print("\n***Querying keyspaces in Cassandra***")
cassandra_options['keyspace'] = 'system_schema'
cassandra_options['table'] = 'keyspaces'
df_keyspaces = spark.read.format("org.apache.spark.sql.cassandra")\
                         .options(**cassandra_options)\
                         .load()

df_keyspaces.show()

# keyspaces = df_keyspaces.select("keyspace_name").rdd.flatMap(lambda x: x).collect()

# query tables
print("\n***Querying tables in Cassandra***")
cassandra_options['keyspace'] = 'system_schema'
cassandra_options['table'] = 'tables'
df_tables = spark.read.format("org.apache.spark.sql.cassandra")\
                      .options(**cassandra_options)\
                      .load()

df_tables.persist()
df_tables.show()

However, when I try to do so, I get the following error:

pyspark.sql.utils.AnalysisException: Couldn't find system_schema or any similarly named keyspaces;

I can log in to cqlsh with the required username and run SELECT statements against the system_schema keyspace just fine. I can also read the other tables we've created through the Spark Cassandra Connector without problems.

Per the docs, every user is implicitly authorized to query system_schema, since it is often used implicitly. Here are the role's permissions anyway:

 role      | resource                                              | permissions
-----------+-------------------------------------------------------+--------------------------------------------------------------
 dbadmin   |                                                  data | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}

I'm at a loss as to whether we can query system_schema from Spark at all. In the meantime we're manually maintaining a list of keyspaces and tables, but it would be ideal if we could pull the tables from system_schema.
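Since the same role can read system_schema from cqlsh, one possible interim replacement for the hand-maintained list is to run the schema query through the Python driver and group the result by keyspace. This is only a sketch, assuming the `cassandra-driver` package is installed. The helper names, the contact point, and the credentials are placeholders, not something taken from the setup above.

```python
from collections import defaultdict

# Keyspaces that Cassandra 3.x manages itself; skipped so only user tables remain.
SYSTEM_KEYSPACES = {
    "system", "system_auth", "system_distributed",
    "system_schema", "system_traces",
}

SCHEMA_QUERY = "SELECT keyspace_name, table_name FROM system_schema.tables"


def fetch_schema_rows(session):
    """Run the schema query on an open driver session; return (keyspace, table) pairs."""
    return [(row.keyspace_name, row.table_name)
            for row in session.execute(SCHEMA_QUERY)]


def group_user_tables(rows, skip=SYSTEM_KEYSPACES):
    """Group (keyspace, table) pairs into {keyspace: sorted list of table names}."""
    grouped = defaultdict(list)
    for keyspace, table in rows:
        if keyspace not in skip:
            grouped[keyspace].append(table)
    return {ks: sorted(tables) for ks, tables in grouped.items()}


def connect_and_list(hosts, username, password):
    """Hypothetical entry point: connect with placeholder credentials and list tables."""
    # Imported here so the helpers above can be used without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth = PlainTextAuthProvider(username=username, password=password)
    cluster = Cluster(hosts, auth_provider=auth)
    try:
        session = cluster.connect()
        return group_user_tables(fetch_schema_rows(session))
    finally:
        cluster.shutdown()
```

Once the Spark read itself works, `group_user_tables` should also accept the output of `df_tables.select("keyspace_name", "table_name").collect()`, because PySpark `Row` objects unpack like tuples.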

spark-cassandra-connector
1 comment

I'm assuming you can query other tables? I'm just wondering if it's only the system_schema keyspace.

1 Answer

Erick Ramirez answered

You didn't specify which version of Cassandra you're connecting to and I'd like to quickly rule that out. The connector is only tested on open-source Apache Cassandra and DSE so it's not guaranteed to work with other forks/distributions of Cassandra. Cheers!
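To help rule the version out, the node can be asked what it reports about itself. The following is a minimal sketch, assuming an already-open session from the DataStax Python driver (`cassandra-driver`). The function names are illustrative only. A fork may report any version string it likes, so this confirms only what the node claims to be.

```python
def cassandra_release_version(session):
    """Return the release_version string reported by the connected node."""
    row = session.execute("SELECT release_version FROM system.local").one()
    return row.release_version


def major_minor(version):
    """Parse a string such as '3.11.2' into (3, 11); a suffix after '-' is ignored."""
    parts = version.split("-")[0].split(".")
    return int(parts[0]), int(parts[1])
```

With a live session, `major_minor(cassandra_release_version(session))` gives a pair that can be compared against the versions the connector release is documented to support.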
