ulavapalle.meghamala_184243 asked ·

Facing issue while loading datastax bulk loader

Hi,

We are evaluating the DataStax Bulk Loader (dsbulk) for loading bulk CSV data into a Cassandra cluster, but we are facing an issue while loading the data.

If we run "dsbulk-1.4.1/bin/dsbulk load -f xxxxxx.conf -url xxxxxx.csv -k xxxxxxx -t xxxxxxx -h 'x.x.x.x,x.x.x.x,x.x.x.x' -header true", we see the following error in the logs:

Source: "2fa67c1df1913c24",0,0,0,0,0,0,2,0,0\u000d
java.lang.IllegalArgumentException: Expecting record to contain 10 fields but found 11.
        at com.datastax.dsbulk.connectors.api.internal.DefaultRecord.<init>(DefaultRecord.java:125)
        at com.datastax.dsbulk.connectors.api.internal.DefaultRecord.mapped(DefaultRecord.java:58)
        at com.datastax.dsbulk.connectors.csv.CSVConnector.lambda$readURL$6(CSVConnector.java:523)
        at com.datastax.dsbulk.engine.LoadWorkflow.parallelFlux(LoadWorkflow.java:258) [20 skipped]
        at com.datastax.dsbulk.engine.LoadWorkflow.execute(LoadWorkflow.java:191)
        at com.datastax.dsbulk.engine.DataStaxBulkLoader$WorkflowThread.run(DataStaxBulkLoader.java:128)

If we run "dsbulk-1.4.1/bin/dsbulk load -f xxxxxx.conf -url xxxxxx.csv -k xxxxxxx -t xxxxxxx -h 'x.x.x.x,x.x.x.x,x.x.x.x' -header true -newline '\u000d'", the data gets loaded, but when I query the database, the column values also contain \n and the surrounding double quotes:

select idx from keyspace.table
 idx
----------------------------------------------------
 \n"59976eb7.a787.4ba5.ba95.37e8fd3e91afU9ou2fRxx2"
                               \n"18ce047c7556123f"
 \n"7b434da2.4f45.4013.bb65.987b8d83b6a6lv5DhEPNld"
 \n"fa8cd5bd.2a6c.4b28.8f20.b07a9a3864c1zgUl6p7R2k"
 \n"99b6b8c2.3cfe.4b64.a971.08a7c735b54bb0eVLzzPjV"
 \n"2b3860fa.0717.407c.bf3b.018e545e373bNJ7cInZaXX"
                               \n"af2eea0560ec9082"

We don't want the newline character (\n) to be part of the idx field. Can you please let us know which configuration we should use to resolve this issue?


1 Answer

Erick Ramirez answered ·

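@ulavapalle.meghamala_184243 The issue is that the CSV file is in DOS/Windows format, which indicates that it was created or modified on a Windows machine. The carriage return (\u000d) at the end of each line isn't Unix-compatible. When you pass -newline '\u000d', records are split on the carriage return alone, so the line feed that follows it ends up at the start of the next record, which is why \n shows up in idx.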

Please try converting the file to Unix format with a utility like dos2unix, which replaces the DOS/Mac line breaks with Unix line breaks. That should allow you to bulk load the data without the -newline option. Cheers!
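If dos2unix is not installed, the same conversion can be sketched with tr. This is only an illustration: sample.csv and sample_unix.csv are hypothetical file names, and the sample rows are made up to mimic the CRLF endings described above. Note that tr removes every carriage return in the file, including any that might appear inside a quoted field.

```shell
# Build a small sample file with DOS (CRLF) line endings.
# sample.csv is a hypothetical stand-in for the real CSV file.
printf 'idx,c1,c2\r\n"18ce047c7556123f",0,2\r\n' > sample.csv

# Strip every carriage return so each line ends with LF only.
tr -d '\r' < sample.csv > sample_unix.csv

# Count the lines that still contain a carriage return.
# grep exits non-zero when nothing matches, hence the "|| true".
remaining_cr=$(grep -c "$(printf '\r')" sample_unix.csv || true)
echo "lines with CR remaining: $remaining_cr"
```

With dos2unix itself, running "dos2unix xxxxxx.csv" converts the file in place, after which the original dsbulk load command can be retried as-is.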


Thanks for your quick response! It worked!

Erick Ramirez ♦♦ ulavapalle.meghamala_184243 ·

Not a problem. Cheers!
