rob.hofmann_165408 asked ·

DataStax C# Driver doesn't output special characters

Hi! I've got a question regarding the C# driver. I've been trying to build an export tool for Cassandra based on your C# driver. I'm running into an issue: when tables contain data with special characters, those characters are missing from what I receive in C#. Am I doing something wrong here, or is this a bug?


I've tried this on different environments (native Cassandra on Linux CentOS 7, and a Docker setup on a Windows VM). Both give the same result.


This is the data I put into Cassandra:

t*\x10r>\nw@S=


This is what I get out:

t*x10r>nw@S=

Is there a way to get the backslashes as well?


EDIT: When I use the COPY command in cqlsh it does output the special characters.

joao.reis answered ·

I was able to reproduce this only by using the COPY FROM command. It's not an issue with the driver: if you query the data with any other tool, you will see that they all return it without the backslashes, so the data is already stored on the server that way. The cause is that COPY FROM treats the backslash as an escape character, so you can do one of two things:

  • Change the escape character to one that you are sure will not appear in the data, for example |:
COPY FROM .... WITH ESCAPE = '|';

  • Rewrite the CSV file so that each backslash is itself escaped:
t*\\x10r>\\nw@S=

instead of

t*\x10r>\nw@S=
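
To make the effect of the escape character concrete, here is a rough C# model of a reader that treats one character as an escape: it drops the escape character and keeps whatever follows it literally. This is only a sketch for illustration. `EscapeModel` and `Unescape` are hypothetical names, and the code is not how cqlsh is actually implemented; it just mirrors the behaviour described above.

```csharp
using System;
using System.Text;

public static class EscapeModel
{
    // Hypothetical helper: models a reader that drops each escape
    // character and keeps the character after it as a literal.
    public static string Unescape(string field, char escape = '\\')
    {
        var sb = new StringBuilder(field.Length);
        for (int i = 0; i < field.Length; i++)
        {
            // An escape character that is not the last character is
            // skipped, and the next character is copied as-is.
            if (field[i] == escape && i + 1 < field.Length)
            {
                i++;
            }
            sb.Append(field[i]);
        }
        return sb.ToString();
    }
}
```

With this model, `EscapeModel.Unescape(@"t*\x10r>\nw@S=")` gives `t*x10r>nw@S=` (the backslashes are lost), the doubled form `@"t*\\x10r>\\nw@S="` gives back `t*\x10r>\nw@S=`, and passing `'|'` as the escape character leaves the single-backslash input unchanged.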
rob.hofmann_165408 answered ·

Yes, you are correct. I just pushed a commit to my GitHub that replaces each single backslash with two backslashes when writing to CSV. This fixes the issue: when I copy data in and export it with my tool, the output is exactly the same as my input.


Thanks!
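
For readers who want to see what such a fix can look like, below is a minimal sketch of doubling backslashes when a text value is written to CSV. It is not taken from the linked repository; `CsvFieldWriter`, `EscapeBackslashes` and `ToCsvTextField` are hypothetical names, and the sketch assumes the file will be read back by COPY FROM with the default backslash escape character. It does not handle quote characters embedded in the value.

```csharp
using System;

public static class CsvFieldWriter
{
    // Hypothetical helper: doubles every backslash so that a reader
    // using backslash as its escape character restores the original.
    public static string EscapeBackslashes(string value)
    {
        return value.Replace("\\", "\\\\");
    }

    // Wraps the escaped value in double quotes, as the export code
    // in this thread does for string columns. The caller is expected
    // to handle null values before calling this method.
    public static string ToCsvTextField(string value)
    {
        return "\"" + EscapeBackslashes(value) + "\"";
    }
}
```

For the value from the question, `CsvFieldWriter.ToCsvTextField(@"t*\x10r>\nw@S=")` produces `"t*\\x10r>\\nw@S="`, which matches the doubled form suggested earlier in the thread.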

joao.reis answered ·

If I double every backslash in your CSV, I get the expected data. Are you looking at the value in the Visual Studio debugger? Visual Studio shows strings in escaped form while debugging; you can click the magnifying glass icon to see the plain, unescaped string.
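
One way to rule out a display artifact is to count the backslashes in code instead of reading them off the debugger. The sketch below is an illustration only; `BackslashCheck` and `CountBackslashes` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class BackslashCheck
{
    // Hypothetical helper: counts the backslash characters that are
    // really present in the string, independent of how a debugger
    // chooses to display it.
    public static int CountBackslashes(string value)
    {
        return value.Count(c => c == '\\');
    }
}
```

The regular literal `"a\\b"` and the verbatim literal `@"a\b"` are the same three-character string, so `CountBackslashes` returns 1 for both, even when the debugger view shows two backslashes.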

rob.hofmann_165408 answered ·

That's what I was thinking as well. However, when I change the single backslashes to double backslashes as you suggested, I actually see two backslashes in my database. So I get either 0 or 2 backslashes. The real question is: how do I get just 1?


Also, your suggestion to use another escape character is going to be a challenge, since we can't predict which characters appear in our data. The same issue would probably occur with whichever character we pick.


Looking forward to your thoughts.

rob.hofmann_165408 answered ·

Hi,


Thanks for your help so far. With your code I'm able to get satisfying results. However, I'm working with existing tables. This is how to reproduce the issue:


First, create this keyspace and table:

CREATE KEYSPACE stresscql WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}  AND durable_writes = true;

CREATE TABLE stresscql.insanitytest (
    mailboxid text,
    alias text,
    PRIMARY KEY (mailboxid, alias)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (alias ASC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = 'A table of many types to test wide rows'
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';


Next, log in to cqlsh and run this command:

COPY stresscql.insanitytest FROM '/myfolder/testdata.csv';

testdata.csv.zip


Note: when I change the single backslashes in testdata.csv to double backslashes, I see double backslashes in my code while debugging. But when the file has single backslashes (like the attached sample), I see zero backslashes in my code while debugging.


Next, use my tool (which contains the code I provided earlier) from https://github.com/RobHofmann/Cassandra.BackupAndRestore/. Make sure to edit appsettings.json with your values.


PS. I use a PowerShell script to spin up a 6-node cluster in Docker. You can find that script and more info here: https://github.com/RobHofmann/Cassandra.LocalDockerCluster


Looking forward to your input!


testdatacsv.zip (926 B)
joao.reis answered ·

I'm still not able to reproduce the issue with the code sample that you provided. Which driver and server versions are you using? I tested this with DataStax C# Driver for Apache Cassandra 3.11.0 and Apache Cassandra 3.11.4.

I inserted this row into the database ('cn' is of type 'text'):

INSERT INTO users (isdeleted,id,firstname,cn,lastname) VALUES (false,be8a976f-0616-405b-8d79-8d85915c4889,'3213','t*\x10r>\nw@S=','55');

And this is the resulting .csv file:

False,be8a976f-0616-405b-8d79-8d85915c4889,"3213","t*\x10r>\nw@S=","55"
rob.hofmann_165408 answered ·

This is the method I use for fetching rows:

public Task<RowSet> GetRowsAsync(ISession session, string table)
{
    return session.ExecuteAsync(new SimpleStatement($"SELECT * FROM {table}"));
}

This is the main code:

_logger.Log("Fetching rows", fullTableName);
var dataRows = await _cassandraService.GetRowsAsync(session, fullTableName);
_logger.Log("Fetched rows", fullTableName);

_logger.Log("Start writing data to CSV", fullTableName);
TextWriter tw = new StreamWriter($"{_cassandraConfiguration.BackupTargetFolder}/{fullTableName}.csv");
_counter.Reset();
foreach (var dataRow in dataRows)
{
    var stringRow = new string[dataRows.Columns.Length];
    for (int i = 0; i < dataRows.Columns.Length; i++)
    {
        var data = dataRow.GetValue<object>(dataRows.Columns[i].Name);
        if (data == null)
        {
            stringRow[i] = "";
            continue;
        }

        if (data is DateTimeOffset)
            stringRow[i] = ((DateTimeOffset)data).ToString("yyyy-MM-dd HH:mm:ss.fff+0000");
        else if (data is string)
            stringRow[i] = $"\"{data}\"";
        else
            stringRow[i] = data.ToString();
    }
    tw.WriteLine(string.Join(",", stringRow));
    _counter.IncrementCounter();
}

tw.Flush();
tw.Close();


Let me know if something is unclear.

joao.reis answered ·

Hi, I'm not able to reproduce this. Can you provide a code sample?
