Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

tonazeem_142892 avatar image
tonazeem_142892 asked ·

Getting timeouts with DSBulk unload

Hi, getting timeouts for dsbulk unload...below is the error

This is the log from the operation:

Operation UNLOAD_20210415-150147-458640 failed: [/123.123.123.123:9042] Connection has been closed.

And this is in the operation.log

2021-04-15 11:04:17 ERROR Operation UNLOAD_20210415-090305-819663 failed: [/123.123.123.123:9042] Connection has been closed.
com.datastax.driver.core.exceptions.TransportException: [/123.123.123.123:9042] Connection has been closed

Would it help to increase driver.socket.readTimeout to resolve this issue? or increase the executor.maxPerSecond = 2500 to say 5000 to make it run faster. Please advise on these parameters or if any other parameter should be changed to get by this error.

Here are the current parameters used -

nohup /apps/dsbulk-1.3.3/bin/dsbulk unload -h <<listOfIPs>> -maxErrors 1000 -u username -p password -f /apps/dsbulk-1.3.3/conf/unload_details.conf &
dsbulk {
  connector.name = "json"
  connector.json.url = "./opfile"
  connector.json.fileNameFormat = "opfile-output-%0,6d.json"
  connector.json.maxRecords = 10000
  connector.json.generatorFeatures = { ESCAPE_NON_ASCII: true, QUOTE_FIELD_NAMES: true }
  schema.keyspace = "keyspace"
  schema.table = "tablename"
  executor.maxPerSecond = 2500
  executor.maxInFlight = 50
  executor.continuousPaging.enabled = false
  driver.query.fetchSize = 5000
  driver.policy.maxRetries = 30
  driver.socket.readTimeout = 120000
}
dsbulk
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

tonazeem_142892 avatar image
tonazeem_142892 answered ·

Erick, here are the message from dsbulk log and the system.log file. Looks like its timing out while reading, without increasing system timeout thresholds how can I get the query to run? This dsbulk is run with diff set of parameters than above.


dsbulk command -- 
/home/cassandra/dsbulk-1.8.0/bin/dsbulk unload --driver.auth.provider PlainTextAuthProvider --driver.auth.username xxxx  --driver.auth.password xxxx --datastax-java-driver.basic.contact-points 123.123.123.123 -query "select count(*) from sometable with where on clustering column and partial pk -- allow filtering"  --connector.name json --driver.protocol.compression LZ4 --connector.json.mode MULTI_DOCUMENT -maxConcurrentFiles 1 -maxRecords -1 -url dsbulk --executor.continuousPaging.enabled false --executor.maxpersecond 2500 --driver.socket.timeout 240000

running dsbulk --
cassandra@somehost> utor.maxpersecond 2500 --driver.basic.request.timeout 240000                                                                <
Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
total | failed | rows/s |      p50ms |      p99ms |     p999ms
    1 |      1 |      0 | 109,790.10 | 110,058.54 | 110,058.54
Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
cassandra@somehost> cd logs
cassandra@somehost> cd COUNT_20210423-070104-108326/
cassandra@somehost> ls
operation.log  unload-errors.log
cassandra@somehost> cat operation.log
2021-04-23 07:01:04 WARN  Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
2021-04-23 07:01:04 WARN  Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
2021-04-23 07:01:04 INFO  Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
2021-04-23 07:02:55 WARN  Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
2021-04-23 07:02:55 INFO  Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:55 INFO  Memory usage: used: 212 MB, free: 1,922 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:55 INFO  Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:55 INFO  Throughput: 0 reads/second
2021-04-23 07:02:55 INFO  Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
2021-04-23 07:02:58 INFO  Final stats:
2021-04-23 07:02:58 INFO  Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:58 INFO  Memory usage: used: 251 MB, free: 1,883 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:58 INFO  Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:58 INFO  Throughput: 0 reads/second
2021-04-23 07:02:58 INFO  Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
cassandra@somehost> cat unload-errors.log
Statement: com.datastax.oss.driver.internal.core.cql.DefaultBoundStatement@1083fef9 [0 values, idempotence: <UNSET>, CL: <UNSET>, serial CL: <UNSET>, timestamp: <UNSET>, timeout: <UNSET>]
SELECT batch_id from .... allow filtering (Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded))
        at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.toErrorPage(ResultSubscription.java:534)
        at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.lambda$fetchNextPage$1(ResultSubscription.java:372)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.setFinalError(CqlRequestHandler.java:447) [4 skipped]
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.access$700(CqlRequestHandler.java:94)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processRetryVerdict(CqlRequestHandler.java:859)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processErrorResponse(CqlRequestHandler.java:828)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.onResponse(CqlRequestHandler.java:655)
        at com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
        at java.lang.Thread.run(Thread.java:748) [24 skipped]
Caused by: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

system.log ----
DEBUG [ScheduledTasks:1] 2021-04-23 00:01:48,539  MonitoringTask.java:152 - 1 operations timed out in the last 5015 msecs:
<SELECT * FROM my query being run with limit - LIMIT 5000>, total time 10004 msec, timeout 10000 msec/cross-node
INFO  [ScheduledTasks:1] 2021-04-23 00:02:38,540  MessagingService.java:1302 - RANGE_SLICE messages were dropped in last 5000 ms: 0 internal and 1 cross node
. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 10299 ms
INFO  [ScheduledTasks:1] 2021-04-23 00:02:38,551  StatusLogger.java:114 -
Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
ReadStage                         1         0     1736872997         0                 0
ContinuousPagingStage             0         0            586         0                 0
RequestResponseStage              0         0     1483193130         0                 0
ReadRepairStage                   0         0        9079516         0                 0
CounterMutationStage              0         0              0         0                 0
MutationStage                     0         0      351841038         0                 0
ViewMutationStage                 0         0              0         0                 0
CommitLogArchiver                 0         0          32961         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                0         0       12034828         0                 0
MemtableReclaimMemory             0         0          68612         0                 0
PendingRangeCalculator            0         0              9         0                 0
AntiCompactionExecutor            0         0              0         0                 0
GossipStage                       0         0       20137208         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0           3798         0                 0
MigrationStage                    0         0              8         0                 0
MemtablePostFlush                 0         0         338955         0                 0
PerDiskMemtableFlushWriter_0         0         0          66297         0                 0
ValidationExecutor                0         0         247600         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0          41757         0                 0
InternalResponseStage             0         0         525242         0                 0
AntiEntropyStage                  0         0         767527         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests         0         0      958717934         0                65
CompactionManager                 0         0
MessagingService                n/a       0/0
Cache Type                     Size                 Capacity               KeysToSave
KeyCache                  104857216                104857600                      all
RowCache                          0                        0                      all

Please advise.

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Erick Ramirez avatar image
Erick Ramirez answered ·

In my experience, tweaking different settings without knowing the root cause is usually ineffective.

My suggestion is that you go through the logs on the node for which the failure was reported for clues as to why it failed. Once you know the failure, you can formulate the appropriate solution to address it. Cheers!

Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.