question

SLDR asked · Erick Ramirez commented

Unsupported data source V2 partitioning type: CassandraPartitioning

I am using spark-cassandra-connector_2.12 v3.2.0 with Spark v3.3.0, Cassandra v3.11.11 (also tried v3.11.13), Azul Zulu Java 1.8.0_312, and Scala v2.12.15, all on Windows.

I get the following exception when running a simple query in spark-shell.cmd: spark.sql("SELECT * FROM lcc.cw_junk.junk").show. I watched Russell Spitzer's "DataSource V2 and Cassandra" video and it made it sound so easy. What am I doing wrong?
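For reference, the same settings I pass as --conf flags on the command line below can also be expressed when building the session in code. This is just a sketch of my configuration (same placeholder host and credentials as below), not a workaround; under Spark 3.3.0 it fails the same way:

import org.apache.spark.sql.SparkSession

// Equivalent of the --conf flags passed to spark-shell.cmd below
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.extensions", "com.datastax.spark.connector.CassandraSparkExtensions")
  .config("spark.sql.catalog.lcc", "com.datastax.spark.connector.datasource.CassandraCatalog")
  .config("spark.sql.defaultCatalog", "lcc")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .config("spark.cassandra.auth.username", "myusername")
  .config("spark.cassandra.auth.password", "mypassword")
  .getOrCreate()

// The failing query
spark.sql("SELECT * FROM lcc.cw_junk.junk").show()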

Output from spark-shell.cmd (note: some packages are excluded because they fail to download):

spark-shell.cmd --packages com.datastax.spark:spark-cassandra-connector_2.12:3.2.0 --conf spark.cassandra.connection.host=127.0.0.1 --conf spark.cassandra.auth.username=myusername --conf spark.cassandra.auth.password=mypassword --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions --conf spark.sql.catalog.lcc=com.datastax.spark.connector.datasource.CassandraCatalog --conf spark.sql.defaultCatalog=lcc --exclude-packages com.thoughtworks.paranamer:paranamer,com.google.code.findbugs:jsr305
:: loading settings :: url = jar:file:/C:/s330/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: C:\Users\myusername\.ivy2\cache
The jars for the packages stored in: C:\Users\myusername\.ivy2\jars
com.datastax.spark#spark-cassandra-connector_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-26e2827a-0592-475e-97ad-e4d2ab4b8408;1.0
        confs: [default]
        found com.datastax.spark#spark-cassandra-connector_2.12;3.2.0 in central
        found com.datastax.spark#spark-cassandra-connector-driver_2.12;3.2.0 in central
        found com.datastax.oss#java-driver-core-shaded;4.13.0 in central
        found com.datastax.oss#native-protocol;1.5.0 in central
        found com.datastax.oss#java-driver-shaded-guava;25.1-jre-graal-sub-1 in central
        found com.typesafe#config;1.4.1 in central
        found org.slf4j#slf4j-api;1.7.26 in spark-list
        found io.dropwizard.metrics#metrics-core;4.1.18 in central
        found org.hdrhistogram#HdrHistogram;2.1.12 in central
        found org.reactivestreams#reactive-streams;1.0.3 in central
        found com.github.stephenc.jcip#jcip-annotations;1.0-1 in central
        found com.github.spotbugs#spotbugs-annotations;3.1.12 in central
        found com.datastax.oss#java-driver-mapper-runtime;4.13.0 in central
        found com.datastax.oss#java-driver-query-builder;4.13.0 in central
        found org.apache.commons#commons-lang3;3.10 in central
        found org.scala-lang#scala-reflect;2.12.11 in central
:: resolution report :: resolve 683ms :: artifacts dl 52ms
        :: modules in use:
        com.datastax.oss#java-driver-core-shaded;4.13.0 from central in [default]
        com.datastax.oss#java-driver-mapper-runtime;4.13.0 from central in [default]
        com.datastax.oss#java-driver-query-builder;4.13.0 from central in [default]
        com.datastax.oss#java-driver-shaded-guava;25.1-jre-graal-sub-1 from central in [default]
        com.datastax.oss#native-protocol;1.5.0 from central in [default]
        com.datastax.spark#spark-cassandra-connector-driver_2.12;3.2.0 from central in [default]
        com.datastax.spark#spark-cassandra-connector_2.12;3.2.0 from central in [default]
        com.github.spotbugs#spotbugs-annotations;3.1.12 from central in [default]
        com.github.stephenc.jcip#jcip-annotations;1.0-1 from central in [default]
        com.typesafe#config;1.4.1 from central in [default]
        io.dropwizard.metrics#metrics-core;4.1.18 from central in [default]
        org.apache.commons#commons-lang3;3.10 from central in [default]
        org.hdrhistogram#HdrHistogram;2.1.12 from central in [default]
        org.reactivestreams#reactive-streams;1.0.3 from central in [default]
        org.scala-lang#scala-reflect;2.12.11 from central in [default]
        org.slf4j#slf4j-api;1.7.26 from spark-list in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   16  |   0   |   0   |   0   ||   16  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-26e2827a-0592-475e-97ad-e4d2ab4b8408
        confs: [default]
        0 artifacts copied, 16 already retrieved (0kB/29ms)
22/06/23 10:44:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://myhost.mydomain.com:4040
Spark context available as 'sc' (master = local[*], app id = local-1655999101732).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/


Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.


scala> spark.sql("describe table lcc.cw_junk.junk").show
22/06/23 10:45:11 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
+--------------+---------+-------+
|      col_name|data_type|comment|
+--------------+---------+-------+
|         junk1|   string|       |
|         junk2|   string|       |
|         junk3|   bigint|       |
|         junk4|   bigint|       |
|              |         |       |
|# Partitioning|         |       |
|        Part 0|    junk1|       |
+--------------+---------+-------+




scala> spark.sql("SELECT * FROM lcc.cw_junk.junk").show
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning type: CassandraPartitioning
  at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
  at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
  at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
  at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
  at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
  at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$.apply(V2ScanPartitioning.scala:34)
  at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$.apply(V2ScanPartitioning.scala:33)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
  at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
  at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:106)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3856)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2863)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:3084)
  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:288)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:327)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:808)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:767)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:776)
  ... 47 elided


scala>


Tag: spark-cassandra-connector

1 Answer

Erick Ramirez answered · Erick Ramirez commented

Without having looked at your post in detail, my initial reaction is that you're running Spark 3.3.0, which I'm not sure is supported by Spark connector 3.2. Please try running the demo again on a Spark 3.2 cluster and let us know how you go.

In the meantime, I'm going to reach out to the engineers to get them to respond to your question. Cheers!

[UPDATE] After looking through the Spark code, I found that V2 partitioning is completely new in Spark 3.3.0 (introduced by SPARK-37377), so I would try using Spark 3.2 instead.
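For example, if you manage the dependency yourself rather than pulling it in with --packages, keeping Spark and the connector on a supported pairing would look roughly like this in sbt (a sketch only; the 3.2.1 patch version is illustrative, so check the connector's compatibility matrix for the exact pairing):

// build.sbt sketch: pin a Spark 3.2.x build alongside connector 3.2.0
// (3.2.1 is illustrative; consult the connector's compatibility matrix)
libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-sql"                 % "3.2.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.2.0"
)

If you are just using spark-shell, the simpler equivalent is to run the same command shown in the question from a Spark 3.2.x installation instead of 3.3.0.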

2 comments

I tried Spark 3.2.1 and it WORKED!!!

Thanks for your help.

Fantastic!