I have a table where NodeSync is enabled. When I run a DSBulk load on that table, I see compactions triggering continuously, which I think is delaying the load. Is it good practice to disable NodeSync while loading tables with DSBulk?
Hi! NodeSync works on the write path, as opposed to the repair service, which uses the streaming path. With that said, if you are inserting a lot of data, it might help to temporarily disable NodeSync on the table while you load the data to prevent write amplification.
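If you want to try this, here is a minimal sketch of the CQL involved, wrapped in a small Python helper so the disable and re-enable statements stay paired. The keyspace and table names (`my_ks`, `my_table`) and the helper name are placeholders I made up; the sketch only builds the statements, so you would still run them yourself through cqlsh or your driver, before and after the load.

```python
def nodesync_statement(keyspace: str, table: str, enabled: bool) -> str:
    """Build the CQL that turns NodeSync on or off for a single table."""
    flag = "true" if enabled else "false"
    return f"ALTER TABLE {keyspace}.{table} WITH nodesync = {{'enabled': '{flag}'}};"


# Hypothetical keyspace and table names, for illustration only.
disable_cql = nodesync_statement("my_ks", "my_table", enabled=False)
enable_cql = nodesync_statement("my_ks", "my_table", enabled=True)

print(disable_cql)  # run this before the bulk load
print(enable_cql)   # run this once the load has finished
```

Remember to re-enable NodeSync after the load, otherwise the table is left without background synchronization.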
DSBulk doesn't really have anything to do with NodeSync or compactions. I think you're just conflating them unnecessarily.
I've explained in your other question (#13285) that compactions are part of the normal operation of Cassandra. They get triggered whether you are bulk-loading data or not.
Similarly, NodeSync (and repairs, for that matter) are part of keeping data consistent across the nodes in your cluster and are necessary given the distributed architecture of Cassandra. It is normal for repairs and NodeSync to be running in the background.
The fact that you think there's a delay while you are loading data just indicates that you haven't sized your cluster correctly. Remember that Cassandra scales linearly --
Based on your previous questions, I think we've established that your cluster does not have sufficient capacity to sustain the throughput you require so you should resize your cluster accordingly, even if it's just a temporary measure while you are migrating data. There's no cost associated with adding nodes temporarily since DSE subscriptions allow for temporary "bursting" during peak events like Black Friday sales. Cheers!