What is the best and most accurate way to get the record count of a table in a multi-node Apache Cassandra 3.11.6 cluster with a replication factor of 3?
Performing a CQL COUNT() has always been problematic in Cassandra, not because it isn't capable of it, but because of a challenge inherent in its distributed architecture: a full count has to read every partition across all nodes in the cluster. I've written about this problem in detail in a blog post, "Counting keys? Might as well be counting stars."
Luckily, we now have the DataStax Bulk Loader (the dsbulk tool) to the rescue. Primarily designed as a more efficient tool for bulk loading data in CSV or JSON format into a Cassandra cluster, the Bulk Loader also features the ability to perform a distributed count of the records in a table.
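As a rough illustration, here is a minimal Python sketch that assembles a `dsbulk count` command line. It assumes dsbulk is installed and uses the `-k` (keyspace), `-t` (table) and `-h` (contact point) shortcuts; the helper name, keyspace, table and host address are all hypothetical placeholders, so check them against the Bulk Loader documentation for your version before running anything.

```python
import shlex


def build_dsbulk_count_command(keyspace, table, host="127.0.0.1"):
    """Return the argument list for a `dsbulk count` run.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return ["dsbulk", "count", "-k", keyspace, "-t", table, "-h", host]


if __name__ == "__main__":
    # Placeholder names; substitute your own keyspace, table and node address.
    cmd = build_dsbulk_count_command("my_keyspace", "my_table", host="10.0.0.1")
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(cmd))
```

Running the printed command on a machine that can reach the cluster should report the total number of rows in the table.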
Here are the key references on the Bulk Loader tool:
Cheers!