I'm running a production cluster of 6 nodes with 7 TB of data. How do I retrieve the row count of a table in the prod environment?
Is it OK to run SELECT COUNT(*) FROM keyspace.tablename?
No, it is not. I've explained the reasons in "Why COUNT() is bad in Cassandra".
You will need to use a tool like the DataStax Bulk Loader (DSBulk). It is primarily a tool for efficiently loading and unloading data from Apache Cassandra, but that is not the extent of its abilities.
DSBulk also has a nice feature for counting the rows of large tables in a distributed manner. It is open source and free to use. Cheers!
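As a rough illustration of the counting feature, here is a minimal Python sketch that builds and (optionally) runs a `dsbulk count` command. It assumes DSBulk is installed and on your PATH, and that `-k`, `-t`, and `-h` are the shortcuts for keyspace, table, and contact point. The helper names (`build_dsbulk_count_command`, `run_count`) and the default host are made up for this example, so adjust them for your cluster and check the DSBulk docs for authentication and other options.

```python
import shutil
import subprocess


def build_dsbulk_count_command(keyspace, table, host="127.0.0.1"):
    """Return the argument list for a `dsbulk count` run against one table.

    The host default is a placeholder; point it at a node in your cluster.
    """
    return ["dsbulk", "count", "-k", keyspace, "-t", table, "-h", host]


def run_count(keyspace, table, host="127.0.0.1"):
    """Run the count and return DSBulk's stdout, or None if dsbulk is not installed."""
    if shutil.which("dsbulk") is None:
        return None
    cmd = build_dsbulk_count_command(keyspace, table, host)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the command that would be executed for the table in the question.
    print(" ".join(build_dsbulk_count_command("keyspace", "tablename")))
```

Running the same command directly from a shell works just as well; the wrapper is only handy if you want to script counts for several tables.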