


mrcyze_148473 asked Russell Spitzer answered

Spark-Connector: Read of empty table takes ~10 minutes

I have a Spark application that uses `sparkContext.cassandraTable[DomainObjectType](keyspace, table)`. The first time I run this app, the table is empty, yet the read takes ~10 minutes to complete. I’m struggling to understand why this happens. I see it across all environments, large and small in terms of resources.


1 Answer

Russell Spitzer answered

If there is no data to be read, the simplest explanation is that the time comes from the overhead of setting up tasks and executing them in Spark. The only reason this would take ~10 minutes is if the table were being read as thousands of Spark tasks.

To check this, I would look at the Spark UI (port 4040 on the node running the Spark application) and see how many tasks are being generated.
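The same number can also be read programmatically, since each Spark partition of the RDD becomes one task in the scan stage. The snippet below is a minimal sketch, assuming the connector is on the classpath and the Cassandra connection is already configured on the `SparkContext`; the helper name `plannedTaskCount` is made up for illustration.

```scala
import com.datastax.spark.connector._
import org.apache.spark.SparkContext

// Hypothetical helper: returns how many Spark partitions (one task each)
// the connector plans for a full scan of the given table.
def plannedTaskCount(sc: SparkContext, keyspace: String, table: String): Int = {
  // Creating the RDD is lazy. Asking for the partition count makes the
  // driver plan the splits, but it does not run a Spark job.
  sc.cassandraTable[CassandraRow](keyspace, table).getNumPartitions
}
```

If this returns a value in the thousands for an empty table, the time is most likely going into task overhead rather than into reading data.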

If the number of tasks is very large (in the hundreds or thousands), this can be caused by a few things. The number of tasks is determined by the size of the Cassandra table as reported in the `system.size_estimates` table, but this can lead to extreme overestimates in a few edge cases.
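To see what Cassandra is reporting, the estimates can be queried directly. This is a rough sketch, not the connector's own calculation: it sums `partitions_count * mean_partition_size` over the token ranges returned by whichever node answers the query. The helper name `estimatedTableBytes` is hypothetical, and the keyspace and table names are assumed to be trusted values because they are interpolated into the CQL string.

```scala
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkContext
import scala.collection.JavaConverters._

// Hypothetical helper: approximate table size in bytes according to
// system.size_estimates, as seen by the node that serves the query.
def estimatedTableBytes(sc: SparkContext, keyspace: String, table: String): Long =
  CassandraConnector(sc.getConf).withSessionDo { session =>
    val rows = session.execute(
      "SELECT partitions_count, mean_partition_size FROM system.size_estimates " +
        s"WHERE keyspace_name = '$keyspace' AND table_name = '$table'"
    ).all().asScala

    // Each row covers one token range owned by the queried node.
    rows.map(r => r.getLong("partitions_count") * r.getLong("mean_partition_size")).sum
  }
```

A large value here for a table known to be empty would point at stale or skewed estimates rather than at Spark itself.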

Specifically, if the estimates are being made from an alternate DC and the DCs are not using vnodes, the distribution of token data can cause some big issues. In this case you can manually specify the number of tasks to create in the `ReadConf` for the RDD being read.
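As a sketch of that workaround, assuming a connector version where `ReadConf` has a `splitCount: Option[Int]` field: start from the settings already in the Spark configuration and override only the split count. The helper name `readWithFixedSplits` is made up for illustration, and `CassandraRow` stands in for your own domain type.

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.ReadConf
import org.apache.spark.SparkContext

// Hypothetical helper: scans the table with a requested number of splits
// instead of letting the size estimates decide.
def readWithFixedSplits(sc: SparkContext, keyspace: String, table: String, splits: Int) = {
  // Keep every other read setting from the SparkConf; override only splitCount.
  val readConf = ReadConf.fromSparkConf(sc.getConf).copy(splitCount = Some(splits))
  sc.cassandraTable[CassandraRow](keyspace, table).withReadConf(readConf)
}
```

The requested count should be treated as a target: the resulting number of partitions may not match it exactly, so it is worth checking the task count in the Spark UI again afterwards.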

Previously there was also a bug where certain settings caused an overflow, which created a huge number of tasks even when no data was present, so be sure you are using the latest connector.
