vkayanala_42513 asked (edited by Erick Ramirez)

Is it possible to load data from AWS S3 using DSBulk?

Hello,

I'm looking for examples of loading data from AWS S3 into a Cassandra ring using the DSBulk tool.

Note: we have thousands of JSON files in S3, and downloading them all before loading them into Cassandra is time-consuming, so we are looking for feasible alternatives.

I couldn't find any such examples here: https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html

Thanks in advance.

Regards,

-Varun.


1 Answer

Erick Ramirez answered (edited)

You can pipe the data to DSBulk on the command line using the AWS CLI.

Test it with a small amount of data first, for example a single file. Note the "-c json" option: DSBulk uses the CSV connector by default.

$ aws s3 cp s3://bucketname/path/to/somedata.json - | dsbulk load -c json -k ksname -t tablename
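For the thousands of files mentioned in the question, one option is to stream every object under a prefix into a single DSBulk run rather than downloading them first. This is a sketch, not a feature of DSBulk or the AWS CLI: stream_s3_json is a hypothetical helper name, the bucket and prefix are placeholders, and it assumes each file holds one or more JSON objects (DSBulk's default MULTI_DOCUMENT JSON mode) rather than a single top-level array, because concatenated arrays would not parse as one stream. It lists keys with aws s3api list-objects-v2 and copies each object to stdout using the trailing "-".

```shell
# Hypothetical helper (not part of DSBulk or the AWS CLI): write every *.json
# object under s3://BUCKET/PREFIX to stdout, one after another.
# Assumes each file contains JSON objects, not one top-level array.
# Usage: stream_s3_json BUCKET PREFIX | dsbulk load -c json -k KEYSPACE -t TABLE
stream_s3_json() {
  local bucket=$1 prefix=$2 key
  # List object keys; text output separates keys with tabs (and prints "None"
  # when nothing matches, which the grep filters out).
  aws s3api list-objects-v2 --bucket "$bucket" --prefix "$prefix" \
      --query 'Contents[].Key' --output text \
    | tr '\t' '\n' \
    | grep '\.json$' \
    | while IFS= read -r key; do
        # The trailing "-" makes `aws s3 cp` write the object to stdout;
        # </dev/null keeps it from consuming the key list on stdin.
        aws s3 cp "s3://$bucket/$key" - </dev/null
        # Newline separator so consecutive files don't run together.
        echo
      done
}
```

For example: stream_s3_json bucketname path/to/ | dsbulk load -c json -k ksname -t tablename. Running one DSBulk process for the whole stream avoids paying the JVM startup and connection cost once per file; the trade-off is that errors from all files are reported in a single operation directory. Object keys containing tabs or newlines are not handled by this sketch.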

Let us know how you go. Cheers!

[UPDATE] A new PR (#399), currently under review, will add support for reading JSON files directly from an S3 bucket. Hopefully it is merged soon and included in the next release of DSBulk.


Here is the output. It looks like DSBulk isn't receiving any data:

aws s3 cp s3://bucketname/path_to_json_files | ./dsbulk load -c json -k keyspace -t table -m platform,timestamp,checkpoint_id,checkpoint_members,linked_checkpoints -h 'x.x.x.x' -u username -p 'password'

usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

aws help
aws <command> help
aws <command> <subcommand> help


aws: error: the following arguments are required: paths

Username and password provided but auth provider not specified, inferring PlainTextAuthProvider

Operation directory: /Users/vayanala/dsbulk-1.8.0/bin/logs/LOAD_20220120-164620-470193
total | failed | rows/s | p50ms | p99ms | p999ms | batches
0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00
Operation LOAD_20220120-164620-470193 completed successfully in less than one second.
Last processed positions can be found in positions.txt

Sorry, I somehow missed your update.

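I noticed that you've specified a directory in your command. You can only load one JSON file at a time this way, because "aws s3 cp" streams a single object to stdout. You also left out the trailing "-" that tells "aws s3 cp" to write to stdout, which is why the AWS CLI failed with "the following arguments are required: paths" and DSBulk loaded 0 rows. Please have a look at my example command again. Cheers!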
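If you want to keep to one JSON file per DSBulk run, a minimal sketch is a loop like the one below. load_each_json is a hypothetical helper name, the bucket, keys, keyspace and table are placeholders, and it assumes dsbulk is on your PATH (use ./dsbulk from the bin directory otherwise). Add the -h, -u, -p and -m options from your own command to the dsbulk load line.

```shell
# Hypothetical helper: run one `aws s3 cp ... - | dsbulk load` pipeline per key,
# i.e. load the JSON files one at a time. Add your own -h/-u/-p/-m options.
# Usage: load_each_json BUCKET KEYSPACE TABLE KEY...
load_each_json() {
  local bucket=$1 keyspace=$2 table=$3 key failed=0
  local -a st
  shift 3
  for key in "$@"; do
    # Note the trailing "-": without it `aws s3 cp` has no destination and
    # fails with "the following arguments are required: paths".
    aws s3 cp "s3://$bucket/$key" - | dsbulk load -c json -k "$keyspace" -t "$table"
    st=("${PIPESTATUS[@]}")
    # Flag the file if either the download or the load exited non-zero.
    if [ "${st[0]}" -ne 0 ] || [ "${st[1]}" -ne 0 ]; then
      echo "load failed for s3://$bucket/$key" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example: load_each_json bucketname ksname tablename data/file1.json data/file2.json. Each iteration starts a new DSBulk process with its own operation directory under logs/, so this is slower for thousands of files than streaming them into one run, but a failure is easy to attribute to a particular file.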
