smadhavan asked:

CQLSH COPY command returns an error when working with dates that have more than 10 characters

Consider a table with the following schema,

CREATE TABLE test.date_format (
    a int PRIMARY KEY,
    b date
)...

with the following data,

cqlsh:test> select * from test.date_format;

 a | b
---+------------
 1 | -912915758
 2 | -552716722

(2 rows)

When we attempt to perform a COPY TO operation on that table, we end up with the following error. What are my options if I need to download the data?

:1:Failed to import 9 rows: ParseError - Failed to parse -912915758 : time data '-912915758' does not match format '%Y-%m-%d', given up without retries

1 Answer

smadhavan answered:

This error is explained in detail in the Apache Cassandra™ ticket CASSANDRA-12360; it is caused by the way Python's datetime module handles date formats: https://docs.python.org/2/library/datetime.html
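As a quick illustration (using nothing beyond the Python standard library), the same parse failure can be reproduced directly: the exported value is a raw day offset, not a `%Y-%m-%d` string, so `strptime` rejects it with the same message seen in the COPY output.

```python
from datetime import datetime

# cqlsh COPY tries to parse the exported value with '%Y-%m-%d';
# a raw day-offset string such as '-912915758' cannot match it.
try:
    datetime.strptime('-912915758', '%Y-%m-%d')
    parsed = True
except ValueError as err:
    parsed = False
    print(err)  # time data '-912915758' does not match format '%Y-%m-%d'
```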

A way to make the COPY command work is to add the following to the cqlshrc file, which by default is located in the ~/.cassandra directory under the user's home,

[copy]
DATETIMEFORMAT=%Y
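For example, the section can be appended from a shell (this assumes the default ~/.cassandra/cqlshrc path; adjust if your cqlshrc lives elsewhere):

```shell
# Append a [copy] section to cqlshrc so COPY formats dates with
# bare %Y instead of the default %Y-%m-%d.
# Path assumed: ~/.cassandra/cqlshrc (cqlsh's default location).
mkdir -p ~/.cassandra
cat >> ~/.cassandra/cqlshrc <<'EOF'
[copy]
DATETIMEFORMAT=%Y
EOF
```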

After that, the COPY command works as follows,

cqlsh:test> copy test.date_format to '/path/to/date_format.csv';
Using 7 child processes
Starting copy of test.date_format with columns [a, b].
Processed: 2 rows; Rate:       5 rows/s; Avg. rate:       5 rows/s
2 rows exported to 1 files in 0.211 seconds.
cqlsh:test> exit
$ cat /path/to/date_format.csv
1,-912915758
2,-552716722

However, using the above workaround is not recommended. The preferred approach is to use the DataStax Bulk Loader (aka DSBulk) unload command, which handles these dates correctly and avoids the limitations of the Python datetime type, as follows,

$ dsbulk unload -k test -t date_format -url test
Operation directory: /path/to/logs/UNLOAD_20200714-235309-741763
total | failed | rows/s | mb/s | kb/row | p50ms | p99ms | p999ms | batches
    2 |      0 |     18 | 0.00 |   0.01 |  8.92 | 13.11 |  13.11 |    1.00
Operation UNLOAD_20200714-235309-741763 completed successfully in 0 seconds.
Last processed positions can be found in positions.txt