corren.mccoy asked corren.mccoy commented

What do the columns in the DSBulk console output mean?

total | failed | rows/s | mb/s | kb/row | p50ms | p99ms | p999ms | batches
  150 |      0 |    329 | 0.02 |   0.05 |  9.84 | 39.58 |  45.88 |    1.00

I've read the DataStax docs, blogs, and the GitHub site for dsbulk, and watched the videos. So far, nothing has explained the dsbulk final stats beyond the total and failed columns except to say "The remaining statistics are throughput and latency metrics for the load." Where can I find the definitions for these columns, especially p50ms, p99ms, and p999ms?

1 Answer

Erick Ramirez answered corren.mccoy commented

There is no documentation about the console output for the DataStax Bulk Loader (DSBulk) because it is meant to be self-explanatory.

The columns p50ms, p99ms and p999ms summarize the distribution of the read/write latencies, reported in milliseconds. The values in each column are the 50th, 99th and 99.9th percentile latencies respectively. Cheers!
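
To make the percentile columns concrete, here is a minimal sketch of how such values could be computed from a list of latency samples. It uses the nearest-rank method and is not DSBulk's own implementation; the function name `percentile_per_mille` and the sample data are hypothetical. Percentiles are passed in per-mille (parts per thousand) so that 999 means the 99.9th percentile, which mirrors the p999ms column name.

```python
def percentile_per_mille(samples, per_mille):
    """Nearest-rank percentile of `samples`.

    `per_mille` is the percentile in parts per thousand:
    500 -> 50th, 990 -> 99th, 999 -> 99.9th percentile.
    Hypothetical helper for illustration, not part of DSBulk.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= per_mille <= 1000:
        raise ValueError("per_mille must be between 1 and 1000")
    ordered = sorted(samples)
    # Integer ceiling of per_mille * n / 1000 avoids float rounding issues.
    rank = (per_mille * len(ordered) + 999) // 1000
    return ordered[rank - 1]


# Made-up latencies of 1 ms, 2 ms, ..., 1000 ms (assumed data, not real output).
latencies_ms = list(range(1, 1001))

p50 = percentile_per_mille(latencies_ms, 500)
p99 = percentile_per_mille(latencies_ms, 990)
p999 = percentile_per_mille(latencies_ms, 999)
print(p50, p99, p999)
```

Read this way, a p99ms of 39.58 says that 99% of the measured operations completed in 39.58 ms or less, and a p999ms of 45.88 says the same for 99.9% of them.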

4 comments

@Erick Ramirez, what would be the average loading time for a SINGLE row, going by the above DSBulk statistics? Thanks

That metric isn't recorded by DSBulk. Cheers!
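
As a rough aside, the rows/s column does allow a back-of-the-envelope figure: the inverse of the throughput gives the wall-clock time per row amortized over the whole run. This is an assumption-laden estimate, not a per-row latency, because requests typically run concurrently; the helper name `amortized_ms_per_row` is hypothetical.

```python
def amortized_ms_per_row(rows_per_second):
    """Inverse throughput in milliseconds per row.

    This is elapsed time divided by rows processed, not the latency
    of any individual request. Hypothetical helper for illustration.
    """
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return 1000.0 / rows_per_second


# 329 rows/s is the throughput from the console output quoted in the question.
estimate_ms = amortized_ms_per_row(329)
print(round(estimate_ms, 2))  # 3.04
```

For the run in the question this comes to about 3 ms per row, lower than the 9.84 ms median latency, which is what one would expect if several requests were in flight at once.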

Are you sure, @Erick Ramirez? This was a question in the Developer Certification Exam, and I poured my heart and soul into the various bits of DSBulk documentation but couldn't find any sign of it.
