While fetching data with the spark-cassandra connector, the output files that get created are empty.
    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._  // adds cassandraTable to SparkContext

    // Connection, authentication and read settings for the connector
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", settings.serverIP)
      .set("spark.cassandra.auth.username", settings.username)
      .set("spark.cassandra.auth.password", settings.password)
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.executor.memory", settings.serverMemory)
      .set("spark.cassandra.input.split.size_in_mb", settings.inputSplitSizeInMb)
      .set("spark.eventLog.enabled", "true")
      .set("spark.cassandra.input.consistency.level", settings.consistencyLevel)

    val sc = new SparkContext(settings.masterURL, "structured_app", conf)

    // Read the "aui" column for rows updated inside the time window
    val rdd_aui_state = sc.cassandraTable(settings.keyspace, settings.table)
      .select("aui")
      .where("updateddate > ?", settings.starttime)
      .where("updateddate < ?", settings.endtime)

    // Write one part file per partition
    rdd_aui_state.saveAsTextFile(settings.outputFileName)
What I have tried so far:
1. Listed all the nodes of one datacenter as contact points, to get the full extraction.
2. Cleared the memory and then started the extraction again.
Sometimes we are able to capture the data. At other times, however, nothing is written to the files: all part-0* files are empty.
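To confirm from code whether a given run produced only empty part files, a small check like the one below could be run against the output directory. This is only a sketch: it assumes the output was written to a local filesystem path readable from the machine running the check (not HDFS), and the object and method names (PartFileCheck, partFileSizes, allPartsEmpty) are made up for illustration.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.mutable.ListBuffer

object PartFileCheck {
  // Returns (file name, size in bytes) for every part-* file in the output
  // directory, sorted by file name.
  def partFileSizes(outputDir: Path): Seq[(String, Long)] = {
    val sizes = ListBuffer.empty[(String, Long)]
    val stream = Files.newDirectoryStream(outputDir, "part-*")
    try {
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        sizes += ((p.getFileName.toString, Files.size(p)))
      }
    } finally {
      stream.close()
    }
    sizes.toList.sortBy(_._1)
  }

  // True when the directory contains part files and every one is zero bytes.
  def allPartsEmpty(outputDir: Path): Boolean = {
    val sizes = partFileSizes(outputDir)
    sizes.nonEmpty && sizes.forall(_._2 == 0L)
  }

  def main(args: Array[String]): Unit = {
    // Assumed default directory name; pass the real output path as the first argument.
    val dir = Paths.get(args.headOption.getOrElse("output"))
    partFileSizes(dir).foreach { case (name, size) => println(s"$name\t$size bytes") }
    if (allPartsEmpty(dir)) println("All part files are empty")
  }
}
```

Running it after each extraction and recording the result next to the start and end times of the query would show which runs came back empty.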
I am just wondering why, even with all the nodes given as contact points, we are not able to capture the data. If we extract the same data again after some time, we are able to capture it.
Spark Cluster: 2 Nodes (1 Master, 1 Worker)
DB Cluster: 10 Nodes
Could you please suggest what might be causing this?