Bringing together the Apache Cassandra experts from the community and DataStax.


 

smadhavan asked ·

CQLSH COPY command returns an error when working with dates that have more than 10 characters

Consider a table with the following schema:

```
CREATE TABLE test.date_format (
    a int PRIMARY KEY,
    b date
)...
```

with the following data:

```
cqlsh:test> select * from test.date_format;

 a | b
---+------------
 1 | -912915758
 2 | -552716722

(2 rows)
```

When we attempt to perform a `COPY TO` operation on that table, we end up with the following error. What are my options if I need to download the data?

```
:1:Failed to import 9 rows: ParseError - Failed to parse -912915758 : time data '-912915758' does not match format '%Y-%m-%d', given up without retries
```

1 Answer

smadhavan answered ·

This error is explained in the Apache Cassandra™ ticket CASSANDRA-12360, and is caused by the way Python's datetime module handles date formats: https://docs.python.org/2/library/datetime.html
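The root cause can be reproduced directly in Python, independently of cqlsh (a minimal sketch, using only the standard library):

```python
from datetime import MINYEAR, MAXYEAR, datetime

# Python's datetime type only supports years 1 through 9999,
# while Cassandra's `date` type covers a far wider range.
print(MINYEAR, MAXYEAR)  # 1 9999

# The out-of-range value from the COPY error fails to parse, raising
# the same "does not match format" message that cqlsh reports:
try:
    datetime.strptime("-912915758", "%Y-%m-%d")
except ValueError as exc:
    print(exc)
```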

One way to make the COPY command work is to add the following to the .cqlshrc file, which by default is located in the ~/.cassandra directory under the user's home directory:

```
[copy]
DATETIMEFORMAT=%Y
```
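The same option can also be supplied per invocation rather than globally, since COPY accepts format options via its WITH clause (shown here with the same format string as the .cqlshrc workaround):

```
cqlsh:test> COPY test.date_format TO '/path/to/date_format.csv' WITH DATETIMEFORMAT='%Y';
```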

After that, the COPY command works as follows:

```
cqlsh:test> copy test.date_format to '/path/to/date_format.csv';
Using 7 child processes

Starting copy of test.date_format with columns [a, b].
Processed: 2 rows; Rate:       5 rows/s; Avg. rate:       5 rows/s
2 rows exported to 1 files in 0.211 seconds.
cqlsh:test> exit

$ cat /path/to/date_format.csv
1,-912915758
2,-552716722
```

That said, the above is only a workaround and is not recommended. The preferred approach is to use the DataStax Bulk Loader (aka DSBulk) unload command, which handles these values correctly and avoids the limitations of the Python datetime type:

```
$ dsbulk unload -k test -t date_format -url test
Operation directory: /path/to/logs/UNLOAD_20200714-235309-741763
total | failed | rows/s | mb/s | kb/row | p50ms | p99ms | p999ms | batches
    2 |      0 |     18 | 0.00 |   0.01 |  8.92 | 13.11 |  13.11 |    1.00
Operation UNLOAD_20200714-235309-741763 completed successfully in 0 seconds.
Last processed positions can be found in positions.txt
```
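To see why these values overflow Python's datetime in the first place, here is a rough back-of-the-envelope sketch. It assumes the raw integers shown by cqlsh are signed counts of days relative to the Unix epoch, which is how Cassandra logically encodes the `date` type; that interpretation is an assumption for illustration only.

```python
# Assumption (for illustration): the raw value displayed by cqlsh is a
# signed number of days relative to 1970-01-01.
days = -912915758

# Approximate calendar year, ignoring leap-year subtleties:
approx_year = 1970 + days / 365.25
print(int(approx_year))

# Python's datetime supports only years 1..9999, so a date this far
# from the epoch cannot be represented, and COPY fails to format it.
print(not (1 <= approx_year <= 9999))  # True
```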