system_schema not found using DataStax Spark Cassandra Connector (Cassandra 3.11.2, Spark 3.0.1)
The idea is to query system_schema.tables to dynamically discover all tables in a Cassandra cluster, reading through the connector's DataSource V1 format.
    # query keyspaces
    print("\n***Querying keyspaces in Cassandra***")
    cassandra_options['keyspace'] = 'system_schema'
    cassandra_options['table'] = 'keyspaces'
    df_keyspaces = spark.read.format("org.apache.spark.sql.cassandra")\
        .options(**cassandra_options)\
        .load()
    df_keyspaces.show()
    # keyspaces = df_keyspaces.select("keyspace_name").rdd.flatMap(lambda x: x).collect()

    # query tables
    print("\n***Querying tables in Cassandra***")
    cassandra_options['keyspace'] = 'system_schema'
    cassandra_options['table'] = 'tables'
    df_tables = spark.read.format("org.apache.spark.sql.cassandra")\
        .options(**cassandra_options)\
        .load()
    df_tables.persist()
    df_tables.show()
However, when I try to do so, I get the following error:
pyspark.sql.utils.AnalysisException: Couldn't find system_schema or any similarly named keyspaces;
I can log in to cqlsh with the same username and run SELECT statements against the system_schema keyspace without any problem. I can also read the other tables we've created through the Spark Cassandra Connector.
Per the docs, every user is implicitly authorized to query system_schema, since it is often queried implicitly. Here are the role's permissions anyway:
     role    | resource | permissions
    ---------+----------+--------------------------------------------------------------
     dbadmin | data     | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}
I'm at a loss as to whether system_schema can be queried from Spark at all. In the meantime we're manually maintaining a list of keyspaces and tables, but it would be ideal to pull that list from system_schema instead.
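For illustration, here is a minimal sketch of how rows from system_schema.tables could replace the hand-maintained list once the read works. It assumes the rows arrive as (keyspace_name, table_name) pairs, for example from `df_tables.select("keyspace_name", "table_name").collect()`. The helper name `group_tables_by_keyspace`, the `SYSTEM_KEYSPACES` set, and the sample rows are hypothetical and not part of the connector.

```python
from collections import defaultdict

# Keyspaces that ship with Cassandra 3.x; skipped so only user tables remain.
# This set is an assumption for the sketch; adjust it for your cluster.
SYSTEM_KEYSPACES = {
    "system",
    "system_schema",
    "system_auth",
    "system_distributed",
    "system_traces",
}


def group_tables_by_keyspace(rows, skip=SYSTEM_KEYSPACES):
    """Map each keyspace name to a sorted list of its table names.

    `rows` is any iterable of (keyspace_name, table_name) pairs.
    """
    grouped = defaultdict(list)
    for keyspace_name, table_name in rows:
        if keyspace_name not in skip:
            grouped[keyspace_name].append(table_name)
    return {keyspace: sorted(tables) for keyspace, tables in grouped.items()}


# Hypothetical sample rows standing in for the Spark result.
sample_rows = [
    ("system_schema", "tables"),
    ("shop", "orders"),
    ("shop", "customers"),
    ("metrics", "events"),
]
tables_by_keyspace = group_tables_by_keyspace(sample_rows)
print(tables_by_keyspace)
```

Each keyspace and table pair in the resulting mapping could then be fed back into `cassandra_options` the same way as in the reads above.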