shehzadjahagirdar_185613 asked · Erick Ramirez edited

Spark job takes 28 minutes to read 90M records

When reading data through a Spark job with spark-cassandra-connector 2.4.2, it takes 28 minutes to read 90,000,000 records; we need to reduce this to 5-10 minutes. While reading, only one of the 5 Spark executors has tasks running on it and the other 4 are idle. Our Cassandra version is apache-cassandra-3.11.3. The configuration used in the Spark job is below (a sketch of how such properties are typically applied follows the list):

    "fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
    "fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
    "google.cloud.auth.service.account.enable", "true"
    "spark.yarn.maxAppAttempts", "1"
    "spark.memory.offHeap.enabled", "true"
    "spark.memory.offHeap.size", "16g"
    "spark.sql.broadcastTimeout", "36000"
    "spark.network.timeout", "600s"
    "spark.cassandra.input.consistency.level", "LOCAL_QUORUM"
    "spark.cassandra.output.consistency.level", "ANY"
    "spark.sql.shuffle.partitions", "150"
    "spark.shuffle.blockTransferService", "nio"
    "spark.maxRemoteBlockSizeFetchToMem", "2000m"
    "spark.sql.hive.filesourcePartitionFileCacheSize", "0"
    "spark.cassandra.input.split.size_in_mb", "512"
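For context, here is a minimal sketch of how key/value properties like these are typically applied in a Scala Spark job using the connector. This is an illustration, not the poster's actual code: the keyspace and table names (my_keyspace, my_table) are hypothetical placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Sketch only: keyspace/table names below are hypothetical placeholders.
    val conf = new SparkConf()
      .set("spark.cassandra.input.consistency.level", "LOCAL_QUORUM")
      .set("spark.cassandra.output.consistency.level", "ANY")
      // Approximate amount of Cassandra data covered by each Spark input
      // partition; larger values produce fewer partitions, and therefore
      // fewer tasks to distribute across executors.
      .set("spark.cassandra.input.split.size_in_mb", "512")
      .set("spark.sql.shuffle.partitions", "150")

    val spark = SparkSession.builder()
      .config(conf)
      .getOrCreate()

    // DataFrame read through the Cassandra data source.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()

Note that spark.cassandra.input.split.size_in_mb controls roughly how much Cassandra data each input partition covers, so it directly determines how many read tasks exist to spread across executors.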
spark-cassandra-connector

1 Answer

Erick Ramirez answered

[Closed as duplicate of question #13026]
