question

kokoluciferus avatar image
kokoluciferus asked Erick Ramirez answered

How do I COPY local file into a Cassandra table in Astra?

I am familiar with the COPY command but do not know how it can be implemented successfully using Astra. Do I need to upload the csv data into Astra or Astra can be connected to my local files?

What steps do I need to follow to bulk-load csv data into cassandra table in Astra literally.

Are there any video resource that provides a step-by-step guide on how to do this?

astradsbulk
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered

You can use the CQL COPY FROM command to load a CSV file to Astra if you download the standalone CQL Shell on your local machine and run the command from there.

However, the recommended way of loading data is using the Bulk Loader (DSBulk) tool. DSBulk lets you load, unload and count data to/from a Cassandra database and works with DataStax Enterprise, DataStax Astra and open-source Apache Cassandra.

Here's an example I previously provided in question #10901:

Prerequisites

Schema

For the purposes of my example, I have a keyspace community and table users:

CREATE TABLE community.users (
    username text,
    realname text,
    email text,
    PRIMARY KEY (username)
)

Data

Here are the contents of my users.csv:

username,realname,email
alice,Alice Bautista,alice@acme.com
bob,Bob Adams,bob.adams@mail.co
charlie,Charlie Choi,cchoi@gotmail.com

App token

On the Astra UI, select your database and click on the Settings tab to generate a new token.

In my case, I generated a token with just read/write access:

c10901-astra-generate-token.png

IMPORTANT: Download the credentials in CSV format since you will not be able to view the details again once you close the window.

Load to Astra

Step 1 - Create the table schema on Astra.

Step 2 - Generate an app token.

Step 3 - Download the secure connect bundle to your local machine.

Step 4 - On your local machine, create the users.csv data file.

Step 5 - Using the secure bundle and client ID + secret, load the data as follows:

$ dsbulk load -url /path/to/users.csv -header true -k community -t users -b "/path/to/secure-connect-db.zip" -u client_id -p client_secret

In my case, the output was:

Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
Operation directory: /home/ubuntu/dsbulk-1.5.0/logs/LOAD_20210406-032949-502743
total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
    3 |      0 |      2 | 267.56 | 278.92 | 278.92 |    1.00
Operation LOAD_20210406-032949-502743 completed successfully in 0 seconds.
Last processed positions can be found in positions.txt

I connected to the CQL Console on the Astra UI to verify that the data was loaded to the table:

token@cqlsh> SELECT * FROM community.users ;

 username | email             | realname
----------+-------------------+----------------
      bob | bob.adams@mail.co |      Bob Adams
  charlie | cchoi@gotmail.com |   Charlie Choi
    alice |    alice@acme.com | Alice Bautista

Cheers!

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.