Naraen asked

Can we reuse the same session created by DSBulk initially to export/import multiple tables?

Is one session enough for multiple tables, or will a new session be created for each table?

We don't want to create a new session for every table, because doing so every time may add latency when we need to import multiple tables at once.

Is there a solution or alternative approach for this?

dsbulk
1 Answer

Erick Ramirez answered

No, you can't. A session is only valid for the life of each DSBulk job (load, unload, count) and doesn't persist across multiple runs.

For what it's worth, you cannot run an import (load) or export (unload) with DSBulk across multiple tables; each job run can target a single table only. For example, if you're loading data to 50 tables, you need to run DSBulk 50 times, once for each table with the relevant data source.
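As a rough illustration of "one DSBulk run per table", here is a minimal Python sketch that loops over a list of tables and launches one unload job for each. It uses only the `-k`, `-t` and `-h` options that appear in this thread and captures stdout into a CSV file, the same way the shell redirection in the comments does. The `DSBULK` path, the function names and the injectable `run` parameter are assumptions made for illustration, and authentication options are left out.

```python
import subprocess
from pathlib import Path

DSBULK = "./dsbulk"  # assumed location of the DSBulk launcher script


def build_unload_command(keyspace, table, host):
    """Return the argument list for one DSBulk unload job (one table per job)."""
    return [DSBULK, "unload", "-k", keyspace, "-t", table, "-h", host]


def unload_tables(keyspace, tables, host, out_dir, run=subprocess.run):
    """Run one DSBulk unload per table, writing each export to <table>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for table in tables:
        target = out_dir / f"{table}.csv"
        with open(target, "wb") as csv_file:
            # Exported rows are captured from stdout, mirroring "> file.csv".
            # check=True raises CalledProcessError if DSBulk exits non-zero.
            run(build_unload_command(keyspace, table, host),
                stdout=csv_file, check=True)
        written.append(target)
    return written
```

Each iteration is a separate DSBulk process, so each one opens and closes its own session, as described above.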

Latency is a factor of your cluster's capacity. If you're concerned about latency, then you need to make sure you've provisioned sufficient nodes in your cluster and add more nodes as appropriate. Cheers!

4 comments

@Erick Ramirez

Thanks for the detailed and quick response!!

Happy to help. Cheers!

Hi @Erick Ramirez,

We used DSBulk to perform an unload operation and found that the elapsed time was longer than the time reported in its own logs.

The details are below:

[aaa@bbb bin]$ date; ./dsbulk unload -k *** -t *** -h *** -u *** -p *** > /aaa/bbb/xxx.csv;date;
Wed Mar 10 06:08:40 CST 2021
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
Operation directory: /aaa/bbb/dsbulk-1.7.0/bin/logs/UNLOAD_20210310-120841-278359
total | failed | rows/s | p50ms | p99ms | p999ms
38,260 | 0 | 14,304 | 8.20 | 78.64 | 78.64
Operation UNLOAD_20210310-120841-278359 completed successfully in 2 seconds.
Wed Mar 10 06:08:47 CST 2021

Here the wall-clock time is 7 seconds, but the log reports only 2 seconds. Is this because of the time taken to establish the session, or something else?


If session connectivity is the reason, and we have 70 tables with a delay of 7 seconds for each table, will the total latency be 490 seconds?
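To make the arithmetic concrete, here is a small Python sketch. It times a whole job from launch to exit, which is what the two `date` calls measure, and multiplies a per-table figure by the number of tables. The 490-second figure only holds under the assumption that every table takes the same 7 seconds and the jobs run strictly one after another; real per-table times will vary. Both function names are hypothetical.

```python
import time


def wall_clock_seconds(job):
    """Time a zero-argument callable from start to finish, in seconds."""
    start = time.perf_counter()
    job()
    return time.perf_counter() - start


def sequential_estimate(table_count, seconds_per_table):
    """Total time if every table takes the same time and jobs run back to back."""
    return table_count * seconds_per_table


# 70 tables at 7 seconds each, run one after another:
print(sequential_estimate(70, 7))  # 490
```

To time a real run, `job` could be something like `lambda: subprocess.run([...], check=True)` wrapping the DSBulk command.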



Also, can we use multithreading (parallel processing) here to improve performance? We need to export multiple tables within a given time using a single Python script.

Please let us know your suggestions or recommendations.
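For reference, one way a single Python script could launch several DSBulk jobs at once is a thread pool in which each worker starts its own DSBulk process. This is only a sketch under assumptions: the `DSBULK` path, the function names and the `max_workers` value are illustrative, authentication options are omitted, and whether running jobs concurrently actually shortens the total time depends on the capacity of the cluster and of the machine running DSBulk. Each job still opens its own session.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

DSBULK = "./dsbulk"  # assumed location of the DSBulk launcher script


def unload_one(keyspace, table, host, out_dir, run=subprocess.run):
    """Run a single DSBulk unload job and capture its stdout in <table>.csv."""
    target = Path(out_dir) / f"{table}.csv"
    with open(target, "wb") as csv_file:
        run([DSBULK, "unload", "-k", keyspace, "-t", table, "-h", host],
            stdout=csv_file, check=True)
    return target


def unload_in_parallel(keyspace, tables, host, out_dir,
                       max_workers=4, run=subprocess.run):
    """Start up to max_workers DSBulk jobs at a time, one job per table."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(unload_one, keyspace, table, host, out_dir, run)
                   for table in tables]
        # result() re-raises any exception from a failed job;
        # paths come back in the same order as the input tables.
        return [future.result() for future in futures]
```

Threads are sufficient here because each worker only waits on an external process; the export work itself happens inside DSBulk.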

