Bringing together the Apache Cassandra experts from the community and DataStax.

rajib76 asked Erick Ramirez answered

How do I interpret the records, batches, and writes totals in the DSBulk monitoring report?

I am investigating an issue with my DSBulk run. I turned on verbose logging and see the DEBUG statements below, but I can't make sense of these metrics. I thought that multiplying the total batches (380,911) by the average batch size (13.94) would give me the total number of records (5,258,523), but it does not match. I also thought that total writes (5,258,334) plus in-flight (35) should equal total records (5,258,523), but that does not match either. What does in-flight mean?

2022-02-02 22:20:08 DEBUG Records: total: 5,258,523, successful: 5,258,523, failed: 0
2022-02-02 22:20:08 DEBUG Batches: total: 380,911, size: 13.94 mean, 1 min, 32 max
2022-02-02 22:20:08 DEBUG Writes: total: 5,258,334, successful: 5,258,334, failed: 0, in-flight: 35

The error I am getting is below. It is intermittent:

2022-02-02 17:21:33 ERROR Operation LOAD_20220202-171551-734406 failed: Unable to perform authorization of permissions: Unable to perform authorization of super-user permission: Operation timed out - received only 1 responses.
com.datastax.oss.driver.api.core.servererrors.UnauthorizedException: Unable to perform authorization of permissions: Unable to perform authorization of super-user permission: Operation timed out - received only 1 responses.
        Suppressed: java.lang.Exception: #block terminated with an error
                at com.datastax.oss.dsbulk.workflow.load.LoadWorkflow.execute(LoadWorkflow.java:220) [2 skipped]
                at com.datastax.oss.dsbulk.runner.WorkflowThread.run(WorkflowThread.java:53)

When it happened yesterday, I truncated the table and reran the load, and it worked fine. Today truncating did not help, so I reduced maxbatchstatements to 20, and then it ran fine. Later I increased it back to 32, and it still worked. I am going through the DSBulk code to see how it works, but I wanted to see if I could get help understanding this. The error mentions super-user permission, but I am not running the load with a role that has super-user access, so the error seems misleading.


1 Answer

Erick Ramirez answered

Multiplying the batch total by the mean size won't give you an exact match with the records total because the size is just an average. As this line indicates:

DEBUG Batches: total: 380,911, size: 13.94 mean, 1 min, 32 max

The batch sizes range from 1 to 32 records, and the reported mean is an approximate statistic, not an exact value: dividing the records total by the batch total gives about 13.81, not 13.94.
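
A quick check with the figures from the log makes this concrete. The Python sketch below uses only the numbers in the question: it compares the exact records-per-batch ratio with the reported mean and checks whether rounding of the displayed mean to two decimals could explain the difference.

```python
# Figures copied from the DSBulk DEBUG summary in the question.
records_total = 5_258_523
batches_total = 380_911
reported_mean = 13.94  # as displayed, rounded to two decimals

# Exact average batch size implied by the two totals.
exact_mean = records_total / batches_total

# Result of multiplying the batch total by the reported mean.
estimate = batches_total * reported_mean
gap = estimate - records_total

# Largest discrepancy that rounding the mean to two decimals could explain.
max_rounding_error = batches_total * 0.005

print(f"exact mean:         {exact_mean:.2f}")
print(f"batches * mean:     {estimate:,.0f}")
print(f"gap:                {gap:,.0f}")
print(f"max rounding error: {max_rounding_error:,.0f}")
```

The gap comes out at roughly 51,000 records, far more than the roughly 1,900 that two-decimal rounding could account for.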

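Similarly, the writes total won't necessarily match the records total in any given report. Going by the numbers, the writes total (5,258,334) tracks the records total closely rather than the batches total (380,911), so writes appear to be counted per statement rather than per batch request. The remaining 189 records were presumably still being written (in flight) when this report was logged. Cheers!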
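
The in-flight figure from the question can be checked the same way. This Python sketch rests on an assumption not confirmed in this thread: that the write counters count individual statements, while in-flight counts outstanding requests, each carrying between 1 and 32 statements (the min and max batch sizes from the log). Under that assumption, it checks whether 35 in-flight requests could hold the records not yet counted as written.

```python
# Figures copied from the DSBulk DEBUG summary in the question.
records_total = 5_258_523
writes_total = 5_258_334
in_flight_requests = 35
min_batch, max_batch = 1, 32  # from the "Batches" line

# Records emitted but not yet counted as written when the report was logged.
pending = records_total - writes_total

# ASSUMPTION: in-flight counts requests, and each request is a batch
# of between min_batch and max_batch statements.
lowest = in_flight_requests * min_batch
highest = in_flight_requests * max_batch

print(f"pending records: {pending}")
print(f"{in_flight_requests} in-flight requests can hold {lowest} to {highest} statements")
print("consistent" if lowest <= pending <= highest else "inconsistent")
```

Under that assumption, the mismatch reflects a snapshot taken mid-load rather than missing records, which agrees with the "failed: 0" counts in the log.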
