question

shrivastave2000_189226 asked · Erick Ramirez edited

How do I upload nested JSON data from a CSV as Text in a vertex with the Bulk Loader?

I want to upload nested JSON into DSE Graph from a CSV file using DSBulk, but I am not able to load the data.

I have the JSON data as a string, and my goal is to load it as Text into the data property of the Account vertex.

"{""Account ID"":""ASONAR"",""VOL ID"":""somename"",""Persona ID"":""Sonar"",""Account Login"":""ASonar@csomename.com""}"

I have the following schema for the Account vertex:

schema.vertexLabel('Account').
ifNotExists().
partitionBy('tenantId', Ascii).
partitionBy('appId', Ascii).
partitionBy('nativeType', Ascii).
clusterBy('entityGlobalId', Uuid, Asc).
property('data', Text).
property('displayName', Text).
create()
//Ensure no searches are made without partitioning key
schema.vertexLabel('Account').
searchIndex().
ifNotExists().
by('nativeId').asString().
by('displayName').asText().
by('status').
waitForIndex(30).
create()

and I am trying to load this data, say:

tenantId,appId,nativeType,entityGlobalId,entityKey,Status,metaType,nativeStatus,updateTime,nativeAsOnTime,nativeModifiedOnTime,createTime,nativeId,displayName,data
Default,someId,Service Account,6ab03f78-8da0-438c-b619-2453320dbcd4,'ASONAR',Active,Service Account,,,05-11-18 12:18:43.000000 PM,05-11-18 12:18:43.000000 PM,05-11-18 12:19:38.738000 PM,'ASONAR','ASONAR',"{""Account ID"":""ASONAR"",""V ID"":""Abhishek"",""Persona ID"":""Sonar"",""Account Login"":""Asomeone@someone.com""}"

Using this command:

>dsbulk load -url D:\\dse_6.8\\iap_schema\\Account_schema\\vertcies\\tryAccount.csv -g iapTest -v Account -header true -h 10.1.27.44 -delim "," --schema.allowMissingFields true -u confluxsys -p passw0rd --driver.advanced.auth-provider.class DsePlainTextAuthProvider

I got this:

[s0] Error while computing token map for replication settings {SearchGraphAnalytics=3, class=org.apache.cassandra.locator.NetworkTopologyStrategy}: could not achieve replication factor 3 for datacenter SearchGraphAnalytics (found only 1 replicas).
total | failed | vertices/s | p50ms | p99ms | p999ms | batches
1 | 1 | 0 | 0.00 | 0.00 | 0.00 | 0.00
Operation LOAD_20200526-063359-270000 completed with 1 errors in 0 seconds.

Logs:

Source: Default,CAMR-VOLVisaRiskManager,Service Account,6ab03f78-8da0-438c-b619-2453320dbcd4,'ASONAR',Active,Service Account,,,05-11-18 12:18:43.000000 PM,05-11-18 12:18:43.000000 PM,05-11-18 12:19:38.738000 PM,'ASONAR','ASONAR',"{""Account ID"":""ASONAR"",""VOL ID"":""Abhishek"",""Persona ID"":""Sonar"",""Account Login"":""Abhishek.Sonar@confluxsys.com""}"
java.lang.IllegalArgumentException: Expecting record to contain 15 fields but found 18.
at com.datastax.dsbulk.connectors.api.internal.DefaultRecord.<init>(DefaultRecord.java:125)
at com.datastax.dsbulk.connectors.api.internal.DefaultRecord.mapped(DefaultRecord.java:58)
at com.datastax.dsbulk.connectors.csv.CSVConnector.lambda$readSingleFile$1(CSVConnector.java:244)
at com.datastax.dsbulk.engine.LoadWorkflow.parallelFlux(LoadWorkflow.java:259) [23 skipped]
at com.datastax.dsbulk.engine.LoadWorkflow.execute(LoadWorkflow.java:192)
at com.datastax.dsbulk.engine.DataStaxBulkLoader$WorkflowThread.run(DataStaxBulkLoader.java:128)

_____________________________

When I use "|" as the delimiter, I get this:

[s0] Unexpected error while refreshing schema during initialization, keeping previous version (CompletionException: com.datastax.oss.driver.api.core.DriverTimeoutException: query 'SELECT * FROM system_schema.tables' timed out after PT2S)
Operation LOAD_20200526-065323-637000 failed: Keyspace "iapTest" does not exist. <<<

I also added " around the value to make it a whole string, but it didn't work.


alexandre.dutra answered · shrivastave2000_189226 commented

OK let's address each problem one by one:

  1. The default CSV escape char in DSBulk is the backslash \, so if you want to escape a double-quote, you need \"; however you are escaping quotes using duplicated quotes "", so this cannot work. This explains the error message: Expecting record to contain 15 fields but found 18. You can either:
    1. Change the escape char to be " with --connector.csv.escape="\""; or
    2. Rewrite your json string as this to make it compliant with the default escape char: "{\"Account ID\":\"ASONAR\",\"VOL ID\":\"Abhishek\",\"Persona ID\":\"Sonar\",\"Account Login\":\"Abhishek.Sonar@confluxsys.com\"}"; or
    3. Use pipe | as delimiter with --connector.csv.delimiter="|" and simplify your json string to: {"Account ID":"ASONAR","VOL ID":"Abhishek","Persona ID":"Sonar","Account Login":"Abhishek.Sonar@confluxsys.com"}. This certainly gives a much more readable file (see the combined sketch after this list).
  2. Your cluster is slow, you should increase some driver timeouts to get rid of the message: Unexpected error while refreshing schema. Also, because the schema refresh operation timed out, the driver could not find the keyspace iapTest, which is why the operation aborted. Add the following options to your dsbulk configuration:
    1. --driver.advanced.connection.connect-timeout="30 seconds"
    2. --driver.advanced.connection.init-query-timeout="1 minute"
    3. --driver.advanced.connection.set-keyspace-timeout="30 seconds"
    4. --driver.advanced.heartbeat.timeout="5 minutes"
    5. --driver.advanced.metadata.schema.request-timeout="5 minutes"
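
Putting these together, here is a minimal sketch of what a pipe-delimited row and the full command could look like. It only reuses values, path, host and credentials from your question, and assumes you rewrite tryAccount.csv with | as the field delimiter; adjust as needed:

tenantId|appId|nativeType|entityGlobalId|entityKey|Status|metaType|nativeStatus|updateTime|nativeAsOnTime|nativeModifiedOnTime|createTime|nativeId|displayName|data
Default|someId|Service Account|6ab03f78-8da0-438c-b619-2453320dbcd4|'ASONAR'|Active|Service Account|||05-11-18 12:18:43.000000 PM|05-11-18 12:18:43.000000 PM|05-11-18 12:19:38.738000 PM|'ASONAR'|'ASONAR'|{"Account ID":"ASONAR","VOL ID":"somename","Persona ID":"Sonar","Account Login":"ASonar@csomename.com"}

dsbulk load -url D:\\dse_6.8\\iap_schema\\Account_schema\\vertcies\\tryAccount.csv -g iapTest -v Account -header true -h 10.1.27.44 -u confluxsys -p passw0rd --driver.advanced.auth-provider.class DsePlainTextAuthProvider --schema.allowMissingFields true --connector.csv.delimiter="|" --driver.advanced.connection.connect-timeout="30 seconds" --driver.advanced.connection.init-query-timeout="1 minute" --driver.advanced.connection.set-keyspace-timeout="30 seconds" --driver.advanced.heartbeat.timeout="5 minutes" --driver.advanced.metadata.schema.request-timeout="5 minutes"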

Try these fixes and let us know how it went. Good luck!


shrivastave2000_189226 commented:

It's very helpful, now I am able to upload the data. Thanks Alexandre.

Erick Ramirez answered · Erick Ramirez commented

@shrivastave2000_189226 It looks like you've put in too many quotes and it's probably confusing the CSV parser. You didn't provide a copy of your schema or the full details of the errors you are getting. Ideally you should provide a full set of instructions to replicate the problem so we can assist you better.

Here is an example table I have created just so I can show you how the JSON data is formatted in the CSV:

CREATE TABLE community.bulkmap (
    id text PRIMARY KEY,
    intmap map<text, int>
)

And here is the sample CSV file. Note that it is delimited by a "pipe" (|).

id|intmap
id1|{"key1":1, "key2":2, "key3":3}
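
For completeness, a minimal load command for this sample could look like the following (the file name bulkmap.csv is assumed here, adjust to your own path):

dsbulk load -url bulkmap.csv -k community -t bulkmap -header true -delim "|"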

In your case, the JSON data should be formatted in this manner:

{"Account ID":"ASONAR","VOL ID":"somename","Persona ID":"Sonar","Account Login":"ASonar@csomename.com"}

But the above would depend on the schema you're loading to. Cheers!


shrivastave2000_189226 commented:

What about nested JSON like this? I don't think it would work for this, or would it?

{
  "id": "0001",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batters": {
    "batter": [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "Chocolate" },
      { "id": "1003", "type": "Blueberry" },
      { "id": "1004", "type": "Devil's Food" }
    ]
  }
}
Erick Ramirez ♦♦ commented:

You'll need to clean up/reformat the JSON data to match the schema of the table you're loading to. See the nested-collections example on the Loading data examples page to see if you can make your use case fit one of the examples. Cheers!
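
If the goal is simply to store the whole nested document as Text in a column like your data property, one minimal sketch (the id and data columns here are illustrative, not your actual schema) is to keep the JSON on a single line of a pipe-delimited file:

id|data
0001|{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]}}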
