Hi,
I am using DSBulk to unload data to CSV from a DSE cluster running on Kubernetes. The cluster consists of 9 Kubernetes pods, each with 120 GB of RAM.
I monitored the resources while unloading the data and observed that RAM utilisation on the pods keeps growing as more data is fetched into the CSV, and the pods eventually restart due to lack of memory.
If only one pod is down at a time, the DSBulk unload does not fail, but if 2 pods are down the unload fails with the following exception:
Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
Is there a way to avoid this memory exhaustion, or alternatively a way to increase the timeout duration?
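For the timeout part, I assume the relevant knob is the client-side request timeout, which as far as I understand DSBulk accepts through the --datastax-java-driver prefix. Something like this is what I have in mind (the 5-minute value is just a placeholder I picked, not a tested recommendation):

dsbulk unload -k < Key Space > -t < My Table > -url < Output Directory > --datastax-java-driver.basic.request.timeout "5 minutes"

Or, since this is a coordinator-side read timeout, would the server-side read_request_timeout_in_ms in cassandra.yaml also need to be raised?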
The full command I am currently using is:
dsbulk unload -maxErrors -1 -h '["< My Host >"]' -port 9042 -u < My Username > -p < Password > -k < Key Space > -t < My Table > -url < Output Directory > --dsbulk.executor.continuousPaging.enabled false --datastax-java-driver.basic.request.page-size 1000 --dsbulk.engine.maxConcurrentQueries 128 --driver.advanced.retry-policy.max-retries 100000
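In case it helps with the memory side: I am also wondering whether lowering the unload concurrency would reduce pressure on the pods, since 128 concurrent range queries may be a lot for 9 nodes. A variant I am considering (the value 8 is an arbitrary guess on my part, not something I have validated):

dsbulk unload -k < Key Space > -t < My Table > -url < Output Directory > --dsbulk.engine.maxConcurrentQueries 8 --datastax-java-driver.basic.request.page-size 1000

Would reducing concurrency like this be expected to help, or is the memory growth on the server side unrelated to the client's settings?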