kajarvine_115939 asked ·

What does DSBulk count?

Hi,
Suppose we have a cluster with two datacenters, DC1 and DC2, each with five servers,
so 10 nodes in total. The replication factor (RF) for the table is 3.

Now, I want to count the rows in BIGTABLE.
So I run: dsbulk count -k myts -t bigtable -h <ip_dc1_node1>

I get 500,000 as a response.

And finally the question: Is 500,000 the number of rows on that one node, or the number of rows in the whole cluster (10 nodes)?




Cedrick Lunven answered ·

The count operation computes the number of rows in the table bigtable in the keyspace myts.


As per your requirement, let's define the keyspace:

CREATE KEYSPACE IF NOT EXISTS myts 
WITH REPLICATION = 
  { 'class' : 'NetworkTopologyStrategy', 
    'DC1' : 3, 
    'DC2' : 3 };

With 500,000 total rows in your table, you would have:

  • 1,500,000 row replicas in DC1 (500,000 rows × RF 3). We cannot tell exactly how many sit on each node, but about 300,000 on average, because you stated 5 nodes in DC1.
  • 1,500,000 row replicas in DC2 (500,000 rows × RF 3). We cannot tell exactly how many sit on each node, but about 300,000 on average, because you stated 5 nodes in DC2.
  • 3,000,000 row replicas in the whole cluster (replicas in DC1 + replicas in DC2)
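
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name replica_counts and its parameters are hypothetical, and the per-node figure assumes data is spread perfectly evenly across nodes, which a real cluster only approximates.

```python
def replica_counts(logical_rows, rf_per_dc, nodes_per_dc):
    """Estimate stored row replicas per datacenter and for the whole cluster.

    logical_rows: number of rows in the table (what the count reports)
    rf_per_dc:    replication factor per datacenter, e.g. {"DC1": 3}
    nodes_per_dc: node count per datacenter, e.g. {"DC1": 5}
    """
    per_dc = {}
    for dc, rf in rf_per_dc.items():
        copies = logical_rows * rf  # every row is stored rf times in this DC
        per_dc[dc] = {
            "replicas": copies,
            # Assumption: perfectly even distribution across the DC's nodes.
            "avg_per_node": copies // nodes_per_dc[dc],
        }
    cluster_total = sum(entry["replicas"] for entry in per_dc.values())
    return per_dc, cluster_total


per_dc, total = replica_counts(
    500_000,
    rf_per_dc={"DC1": 3, "DC2": 3},
    nodes_per_dc={"DC1": 5, "DC2": 5},
)
for dc, entry in per_dc.items():
    print(dc, entry["replicas"], entry["avg_per_node"])
print("cluster", total)
```

The number reported by the count stays at 500,000 regardless of the replication settings; only the amount of stored data grows with RF.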


DSBulk documentation: https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkCmd.html


Erick Ramirez answered ·

@kajarvine_115939 that result is the count for the whole table. The host you provided on the command line is just the initial contact point so DSBulk can connect to the cluster. It then runs the query against the table to get the result. Cheers!
