I'm running a Spark job on my DSE 6.8.0 cluster, and when I do a certain `regexp_replace`, it returns the error `Invalid request, too many continuous paging sessions are already running: 60`
val punctuation = """!"#$%&\'()*+,-./:;<=>?@[\]^_\{|\}\~""" val descNoPunct = df.withColumn("desc_no_punct", regexp_replace($"description", punctuation, ""))
(The `description` column is TEXT type in Cassandra)
Here is more of the error message, in case it's helpful:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 21.0 failed 10 times, most recent failure: Lost task 0.9 in stage 21.0 (TID 174, 127.0.0.1, executor 2): java.io.IOException: Exception during execution of SELECT "episode_guid", "description" FROM "podcast_analysis_tool"."episodes_by_podcast" WHERE token("podcast_api", "podcast_api_id") > ? ALLOW FILTERING: Invalid request, too many continuous paging sessions are already running: 60 at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:349) at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:366) at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:366) ...
I could give more of my code if it's helpful, but actually my guess is that it's happening because I'm just running this code too many times in a row, without doing any `LIMIT` on it.
I have three related questions:
- Is there a way to check how many continuous paging sessions are currently running at a given moment?
- Are there up to date docs on continuous paging? (Google searches tend to mostly show this blog post, but it's from three years ago and refers to DSE 5.0, and some points seem to be out of date)
- Is there a way in Spark to just tell it to wait rather than adding more continuous paging sessions?