My DSBulk load has suddenly started running slowly. When I turned on debug logging, I saw that the maximum batch size is 2, whereas it should ideally be 32. I have not changed anything, and I am not sure why the maximum has dropped to 2.
Were you observing better batching performance before, on the same data? The efficiency of DSBulk's batching mechanism depends very much on the data being loaded: in general, it works much better when the data is sorted by partition key and the rows are small.
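If the source data is a CSV file, one way to put rows with the same partition key next to each other is to sort the file before loading it. Below is a minimal sketch in Python, assuming the whole file fits in memory and that `user_id` (a hypothetical column name) is the partition key; for a composite partition key, list every key column. The function name is also hypothetical, not part of DSBulk.

```python
import csv
import io


def sort_csv_by_partition_key(src, dst, key_columns):
    """Copy CSV rows from src to dst, sorted so that rows sharing the same
    partition key (the values in key_columns) end up next to each other.
    Reads the whole file into memory."""
    reader = csv.DictReader(src)
    # Values are compared as strings, which is enough to make equal keys adjacent.
    rows = sorted(reader, key=lambda row: tuple(row[c] for c in key_columns))
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical table whose partition key is user_id.
    src = io.StringIO("user_id,ts,value\nb,1,x\na,2,y\nb,3,z\na,4,w\n")
    dst = io.StringIO()
    sort_csv_by_partition_key(src, dst, ["user_id"])
    print(dst.getvalue())
```

For real files, open both with `newline=""` (as the `csv` module expects) and pass the file objects instead of `io.StringIO`. Files larger than memory would need an external sort instead.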
Here is what you can try:
Increase dsbulk.batch.bufferSize to, e.g., 256 or 512. Be careful: a larger buffer holds more statements in memory at once and could use up all of the available heap.
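To see why the buffer size matters, here is a toy model in Python. It assumes, as a simplification, that DSBulk takes statements in windows of `bufferSize`, groups each window by partition key and caps each group at `maxBatchStatements`; the real implementation is more involved, and `simulate_batch_sizes` and the sample data are hypothetical. With rows spread round-robin over 10 partitions, a window of 32 rows holds at most 4 rows per partition, so batches stay tiny; a window of 512 rows, or the same rows sorted by key, lets batches reach the cap of 32.

```python
from collections import Counter
from itertools import islice


def simulate_batch_sizes(partition_keys, buffer_size, max_batch_statements):
    """Simplified model (not DSBulk's real code): read rows in windows of
    buffer_size, group each window by partition key, and cap every batch
    at max_batch_statements. Returns the list of batch sizes produced."""
    sizes = []
    rows = iter(partition_keys)
    while True:
        window = list(islice(rows, buffer_size))
        if not window:
            break
        for count in Counter(window).values():
            # Split groups larger than the cap into several batches.
            full, rest = divmod(count, max_batch_statements)
            sizes.extend([max_batch_statements] * full)
            if rest:
                sizes.append(rest)
    return sizes


if __name__ == "__main__":
    # Hypothetical data: 1,000 rows spread round-robin over 10 partitions.
    keys = [i % 10 for i in range(1000)]
    for label, data, buf in [
        ("unsorted, buffer 32", keys, 32),
        ("unsorted, buffer 512", keys, 512),
        ("sorted, buffer 32", sorted(keys), 32),
    ]:
        sizes = simulate_batch_sizes(data, buf, 32)
        print(f"{label}: max batch {max(sizes)}, mean {sum(sizes) / len(sizes):.1f}")
```

The model also shows the memory trade-off: every row in the window has to be held until the window is grouped, which is why a very large buffer can exhaust the heap.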
If that doesn't help, you could try setting dsbulk.batch.mode to REPLICA_SET, which groups together statements that share the same replica set instead of the same partition key. Beware that this will make your latencies much worse, so you should also set dsbulk.batch.maxBatchStatements to a low value (e.g. 5) to avoid timeouts or errors. If that works, you could then slowly increase maxBatchStatements to get the best throughput while keeping latencies acceptable.
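The gradual tuning described above can be scripted. The sketch below builds one `dsbulk load` command line per step, assuming DSBulk's long-form `--batch.*` options and the `-url`/`-k`/`-t` shortcuts (check the settings reference for your DSBulk version). The file, keyspace and table names, the helper functions, and the step values 5 to 30 are all hypothetical; the script only prints the commands and does not judge latencies for you.

```python
import shlex

# Valid values for DSBulk's batch.mode setting.
BATCH_MODES = {"DISABLED", "PARTITION_KEY", "REPLICA_SET"}


def build_dsbulk_load_command(url, keyspace, table, batch_mode,
                              max_batch_statements, buffer_size=None):
    """Return the argument list for one `dsbulk load` run with the given batch settings."""
    if batch_mode not in BATCH_MODES:
        raise ValueError(f"unknown batch mode: {batch_mode}")
    cmd = [
        "dsbulk", "load",
        "-url", url, "-k", keyspace, "-t", table,
        "--batch.mode", batch_mode,
        "--batch.maxBatchStatements", str(max_batch_statements),
    ]
    if buffer_size is not None:
        cmd += ["--batch.bufferSize", str(buffer_size)]
    return cmd


def replica_set_ramp(url, keyspace, table, steps=(5, 10, 15, 20, 25, 30)):
    """One command per step: start low and raise maxBatchStatements gradually."""
    return [build_dsbulk_load_command(url, keyspace, table, "REPLICA_SET", n)
            for n in steps]


if __name__ == "__main__":
    # Hypothetical file, keyspace and table names.
    for cmd in replica_set_ramp("data.csv", "my_ks", "my_table"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; after each run, compare throughput and latencies, and stop raising `maxBatchStatements` once latencies (or timeouts) become unacceptable.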