I have an 18-node Cassandra cluster that has a large partition size issue. My question is: can I run nodetool tablehistograms for the table on every node and take the average to get the partition size?
I'm not sure I understand the question. The partition size histogram is specific to each node because each node owns different token ranges. Averaging the per-node means would only give you an overall cluster mean (and strictly only if each node holds a similar number of partitions). If that's what you want, then yes. However, I'm not sure what that gains you. If you want to find large partitions, search the logs for "Writing large partition" warnings or use a tool such as sstablemetadata to see the partition sizes.
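A rough sketch of the log search in Python, for illustration only. It assumes the warning has the `ks/table:key (size)` shape seen in Cassandra 3.x/4.0 logs, for example `Writing large partition ks/tbl:user42 (123.456MiB) to sstable ...`; the exact wording varies between versions, so check a real line from your own system.log first. The function name is hypothetical.

```python
import re

# Assumed message shape (3.x/4.0 style; other versions may differ), e.g.
#   WARN ... - Writing large partition ks/tbl:user42 (123.456MiB) to sstable ...
# Keys that themselves contain " (" will not parse correctly with this pattern.
LARGE_PARTITION_RE = re.compile(
    r"Writing large partition (?P<keyspace>[^/\s]+)/(?P<table>[^:\s]+):"
    r"(?P<key>.*?) \((?P<size>[^)]+)\)"
)


def find_large_partitions(lines):
    """Yield (keyspace, table, key, size_text) for each large-partition warning."""
    for line in lines:
        match = LARGE_PARTITION_RE.search(line)
        if match:
            # The size is returned exactly as logged (e.g. "123.456MiB").
            yield (match.group("keyspace"), match.group("table"),
                   match.group("key"), match.group("size"))
```

Run it against the log on each of the 18 nodes, e.g. `with open("/var/log/cassandra/system.log") as log: print(list(find_large_partitions(log)))`, where the path is an assumption for a package install. Because each node only logs the partitions it compacts, you need to check every node, not just one.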
By definition, histograms are approximate distributions of data. If you tried to average out the partition sizes from all the histogram outputs, the best you could get is an approximate average partition size -- it does not lead you "... to get the partition size".
In any case, I'm not sure what outcome you're trying to achieve. If you tell us what problem you're trying to solve, we might be able to give you a better answer. Cheers!