There is a requirement where we need to track the total number of records inserted into a table in a day. What is the best way to do this?
There isn't an out-of-the-box solution for this kind of use case. Typically you will need to write an app that does this for you, or run a Spark job.
I don't recommend using the CQL COUNT() function because it will affect the performance of your cluster. I've discussed it in detail in this post: Why COUNT() is bad in Cassandra.
Alternatively, you can use the count command in the DataStax Bulk Loader (DSBulk) utility. At the same time each day, you can run a dsbulk count on a table to get the total records and subtract the previous day's total to get today's tally.
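To make the subtraction concrete, here is a minimal Python sketch. It assumes you have already recorded one total per day (for example, by noting the output of your daily dsbulk count run). The function name `daily_tally` and the sample figures are made up for illustration and are not part of DSBulk.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def daily_tally(totals: Dict[date, int], day: date) -> Optional[int]:
    """Return the net change in record count for `day`.

    `totals` maps a date to the table total recorded on that date.
    Returns None when either that day's or the previous day's total is missing.
    """
    previous = day - timedelta(days=1)
    if day not in totals or previous not in totals:
        return None
    return totals[day] - totals[previous]


# Hypothetical totals, one per day, taken at the same time each day.
snapshots = {
    date(2024, 1, 1): 1000,
    date(2024, 1, 2): 1250,
    date(2024, 1, 3): 1300,
}

print(daily_tally(snapshots, date(2024, 1, 2)))  # 250
```

Keep in mind that the result is the net change between two totals, so it only equals the number of inserts if nothing was deleted in between.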
The challenge with counting records is figuring out how to deal with updates and deletes. Unless you're just inserting new records every day, it's almost impossible to reconcile how many are new and how many were deleted. Cheers!
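As a rough illustration of why reconciliation is hard, the toy Python model below (not Cassandra code, and the numbers are invented) shows two very different days that produce the same difference between daily totals. It assumes updates only touch rows that already exist.

```python
def net_change(inserted: int, deleted: int, updated_existing: int = 0) -> int:
    """Net change in row count over a day in this toy model.

    Updates to rows that already exist overwrite data in place,
    so they do not change the row count at all.
    """
    return inserted - deleted


# Day A: 100 new rows, nothing deleted.
day_a = net_change(inserted=100, deleted=0)

# Day B: 150 new rows, 50 deleted, 30 existing rows updated.
day_b = net_change(inserted=150, deleted=50, updated_existing=30)

print(day_a, day_b)  # 100 100
```

Both days show a difference of 100 even though the number of inserts differs, which is why a difference of totals on its own can't tell you how many records were actually inserted.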