question

ecmprem avatar image
ecmprem asked smadhavan edited

How do I load data to a UDT column with DSBulk?

we are working on migration from marklogic to cassandra, using dsbulk tool receiving UDT map field error that failed for raw value, we ran with option allowing missing fields and extra fields, not sure it is problem with tool , Note: Same record we are able to insert directly in cql command, but in bulk tool its failing, please let us know if any one have idea how to map UDT, below is the record we r trying

"checklist":[{"dynamic_questionnaire_answers":[{"QuestionnaireAnswer":{"question_id":"1", "question_text":"Testing one scenario for questionnaire. *", "gui_control":"radio", "options":["Yes", "No"], "answer_max":1, "answer_min":1}}]}],

dsbulk
1 comment
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

smadhavan avatar image smadhavan ♦ commented ·

Hello @ecmprem, could you update your question to provide the below details?

  • Version of DSBulk. Run ./dsbulk --version to get the output
  • Version of C*/DSE
  • Full command that you used to run the load operation
  • Error message and the output files
  • Sample schema and data to reproduce the error
0 Likes 0 ·

1 Answer

smadhavan avatar image
smadhavan answered smadhavan edited

@ecmprem, since you haven't provided enough information about your schema, I am going to make an assumption here. In my below example demonstration, I have used DataStax Astra (DBaaS) to work with and DataStax Bulk Loader version 1.8.0.

Assumption #1:

Table Schema:

CREATE TABLE db3.ecmprem (
    i int PRIMARY KEY,
    checklist frozen<list<frozen<checklist>>>
);


UDT Schema:

CREATE TYPE db3.checklist (
    dynamic_questionnaire_answers frozen<list<frozen<"QuestionnaireAnswer">>>
);
CREATE TYPE db3."QuestionnaireAnswer" (
    question_id text,
    question_text text,
    gui_control text,
    options list<text>,
    answer_max int,
    answer_min int
);


Sample Input file in CSV:

Note: I have saved the below content into a .csv file to be used later with DSBulk load operation. Note the header is also present and will match with my C* table column names here.

i,checklist
1,"[{\"dynamic_questionnaire_answers\":[{\"question_id\":\"1\",\"question_text\":\"Testing one scenario for questionnaire. *\",\"gui_control\":\"radio\",\"options\":[\"Yes\",\"No\"],\"answer_max\":1,\"answer_min\":1}]}]"


DSBulk Load command used:

% ./dsbulk --version
DataStax Bulk Loader v1.8.0


% ./dsbulk load -k db3 -t ecmprem -b "/path/to/Downloads/secure-connect-db3.zip" -url '~/Downloads/ecmprem_unload/output-000001.csv' -u 'CHANGE_ME' -p 'CHANGE_ME' --datastax-java-driver.basic.request.consistency LOCAL_QUORUM

Output from running the above load command:

Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact point
Operation directory: /path/to/DSBulk/dsbulk-1.8.0/bin/logs/LOAD_20220110-143706-721059
total | failed | rows/s | p50ms | p99ms | p999ms | batches
    1 |      0 |      3 | 45.22 | 45.35 |  45.35 |    1.00
Operation LOAD_20220110-143706-721059 completed successfully in less than one second.
Last processed positions can be found in positions.txt


Validation:

By running a SELECT operation on my sample table via the CQL Console on the DataStax AstraDB I get the following:

token@cqlsh:db3> SELECT * FROM db3.ecmprem;

 i | checklist
---+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1 | [{dynamic_questionnaire_answers: [QuestionnaireAnswer(question_id=u'1', question_text=u'Testing one scenario for questionnaire. *', gui_control=u'radio', options=[u'Yes', u'No'], answer_max=1, answer_min=1)]}]

(1 rows)


Inserting directly via CQL:

The below mechanism to insert directly using CQL also works,

INSERT INTO db3.ecmprem (i, checklist) VALUES (
    1,
    [{
        dynamic_questionnaire_answers : [
            {
                question_id : '1', 
                question_text : 'Testing one scenario for questionnaire. *',
                gui_control : 'radio', 
                options: ['Yes','No'], 
                answer_max : 1, 
                answer_min : 1
            }
        ]
    }]
);


Assumption #2:

Table Schema:

CREATE TABLE db3.ecmprem (
    i int PRIMARY KEY,
    checklist frozen<list<frozen<checklist>>>
);

UDT Schema:

CREATE TYPE db3.checklist (
    dynamic_questionnaire_answers frozen<list<frozen<map<text, frozen<"QuestionnaireAnswer">>>>>
);
CREATE TYPE db3."QuestionnaireAnswer" (
    question_id text,
    question_text text,
    gui_control text,
    options list<text>,
    answer_max int,
    answer_min int
);

Sample Input file in CSV:

Note: I have saved the below content into a .csv file to be used later with DSBulk load operation. Note the header is also present and will match with my C* table column names here.

i,checklist
1,"[{\"dynamic_questionnaire_answers\":[{\"QuestionnaireAnswer\":{\"question_id\":\"1\",\"question_text\":\"Testing one scenario for questionnaire. *\",\"gui_control\":\"radio\",\"options\":[\"Yes\",\"No\"],\"answer_max\":1,\"answer_min\":1}}]}]"

DSBulk Load command used:

% ./dsbulk --version
DataStax Bulk Loader v1.8.0


% ./dsbulk load -k db3 -t ecmprem -b "/path/to/Downloads/secure-connect-db3.zip" -url '~/Downloads/ecmprem_unload/output-000001.csv' -u 'CHANGE_ME' -p 'CHANGE_ME' --datastax-java-driver.basic.request.consistency LOCAL_QUORUM

Output from running the above load command:

Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
Operation directory: /path/to/dsbulk-1.8.0/bin/logs/LOAD_20220110-152459-043829
total | failed | rows/s | p50ms | p99ms | p999ms | batches
    1 |      0 |      4 | 47.84 | 47.97 |  47.97 |    1.00
Operation LOAD_20220110-152459-043829 completed successfully in less than one second.
Last processed positions can be found in positions.txt


Validation:

By running a SELECT operation on my sample table via the CQL Console on the DataStax AstraDB I get the following:

token@cqlsh:db3> SELECT * FROM db3.ecmprem ;

 i | checklist
---+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1 | [{dynamic_questionnaire_answers: [{'QuestionnaireAnswer': QuestionnaireAnswer(question_id=u'1', question_text=u'Testing one scenario for questionnaire. *', gui_control=u'radio', options=[u'Yes', u'No'], answer_max=1, answer_min=1)}]}]

(1 rows)

Inserting directly via CQL:

The below mechanism to insert directly using CQL also works,

INSERT INTO db3.ecmprem (i, checklist) VALUES (
    1,
    [{
        dynamic_questionnaire_answers : [
            {
                'QuestionnaireAnswer' : {
                    question_id : '1', 
                    question_text : 'Testing one scenario for questionnaire. *',
                    gui_control : 'radio', 
                    options: ['Yes','No'], 
                    answer_max : 1, 
                    answer_min : 1
                }
            }
        ]
    }]
);


I hope this helps to get you moving forward!


Also, I wanted to lastly point out that this is definitely not an efficient data model having nested User Defined Types (UDT), but thats for a different day's discussion. I would strongly encourage you to view the below resources to understand Cassandra data modeling in general,

Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.