The documentation says that the batch size should be small when datasets are large. Why is that? And how does this parameter relate to maxConcurrentQueries?
--batch.maxBatchStatements --dsbulk.batch.maxBatchStatements <number>
The maximum number of statements that a batch can contain. The ideal value depends on two factors:
- The data being loaded: the larger the data, the smaller the batches should be.
- The batch mode: when PARTITION_KEY is used, larger batches are acceptable, whereas when REPLICA_SET is used, smaller batches usually perform better. Also, when using REPLICA_SET, it is preferable to keep this number below the threshold configured server-side for the setting unlogged_batch_across_partitions_warn_threshold (the default is 10); failing to do so is likely to trigger query warnings (see log.maxQueryWarnings for more information).

When set to a value less than or equal to zero, the maximum number of statements is considered unlimited. At least one of maxBatchStatements or maxSizeInBytes must be set to a positive value when batching is enabled.
Default: 32.
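On the relationship between the two settings: maxBatchStatements bounds how many statements are grouped into a single batch request, while maxConcurrentQueries (an engine-level setting in DSBulk, if I recall correctly) bounds how many such requests can be in flight at once. They are independent knobs: one controls the size of each unit of work, the other the parallelism across units. A command along these lines is a sketch of how both might be set together; the file path, keyspace, and table names are placeholders, so verify the exact flag names against your DSBulk version's help output:

```shell
# Sketch, not a verified invocation: load a CSV with explicit batching limits.
# my_keyspace, my_table, and /tmp/export.csv are placeholders.
dsbulk load \
  -url /tmp/export.csv \
  -k my_keyspace -t my_table \
  --batch.mode PARTITION_KEY \
  --batch.maxBatchStatements 10 \
  --engine.maxConcurrentQueries 64   # caps requests in flight, not rows per batch
```

With a setup like this, each batch carries at most 10 statements, and at most 64 batches are being executed concurrently; lowering maxBatchStatements for large rows keeps each request under server-side batch size warnings without reducing overall parallelism.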