The documentation says that the batch size should be small when datasets are large. Why is that? Also, how is this parameter related to maxConcurrentQueries?
--batch.maxBatchStatements --dsbulk.batch.maxBatchStatements <number>
The maximum number of statements that a batch can contain. The ideal value depends on two factors:
- The data being loaded: the larger the data, the smaller the batches should be.
- The batch mode: when PARTITION_KEY is used, larger batches are acceptable, whereas when REPLICA_SET is used, smaller batches usually perform better. Also, when using REPLICA_SET, it is preferable to keep this number below the threshold configured server-side for the setting unlogged_batch_across_partitions_warn_threshold (the default is 10); failing to do so is likely to trigger query warnings (see log.maxQueryWarnings for more information).

When set to a value less than or equal to zero, the maximum number of statements is considered unlimited. At least one of this setting or maxSizeInBytes must be set to a positive value when batching is enabled.
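For context, here is a hedged sketch of how this setting might be passed on the command line. The keyspace, table, and file names are placeholders, and the chosen values are illustrative, not recommendations:

```shell
# Illustrative DSBulk load: cap each batch at 10 statements to stay at or
# below the default unlogged_batch_across_partitions_warn_threshold (10),
# which matters when batch.mode is REPLICA_SET.
# my_keyspace, my_table, and data.csv are placeholder names.
dsbulk load \
  -k my_keyspace \
  -t my_table \
  -url data.csv \
  --batch.mode REPLICA_SET \
  --batch.maxBatchStatements 10
```

With PARTITION_KEY mode, all statements in a batch target the same partition, so a larger cap is generally safe; with REPLICA_SET, batches span partitions, which is why a smaller cap tends to perform better and avoids server-side warnings.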