question

rob.hofmann_165408 avatar image
rob.hofmann_165408 asked rob.hofmann_165408 commented

DataStax C# Driver doesnt output special characters

Hi! i’ve got a question regarding the C# driver. I’ve been trying to build an export tool for Cassandra based on your C# driver. I’m encountering an issue where, when tables have data with special characters, I dont receive these information within C#. Am I doing something wrong here, or is this a bug?

I’ve tried this on different environments (native cassandra on linux centos 7, docker setup on a windows vm). Both give the same result.

This is the data i put in cassandra:

t*\x10r>\nw@S=

This is what i get out:

t*x10r>nw@S=

Is there a way to get the backslashes as well?

EDIT: When I use the COPY command in cqlsh it does output the special characters.

[UPDATE] This is the method i use for fetching rows:

  
                 
  1. public Task<RowSet> GetRowsAsync(ISession session, string table)
  2. {
  3. return session.ExecuteAsync(new SimpleStatement($"SELECT * FROM {table}"));
  4. }

This is the main code:

  
                 
  1. _logger.Log("Fetching rows", fullTableName);
  2. var dataRows = await _cassandraService.GetRowsAsync(session, fullTableName);
  3. _logger.Log("Fetched rows", fullTableName);
  4.  
  5. _logger.Log("Start writing data to CSV", fullTableName);
  6. TextWriter tw = new StreamWriter($"{_cassandraConfiguration.BackupTargetFolder}/{fullTableName}.csv");
  7. _counter.Reset();
  8. foreach (var dataRow in dataRows)
  9. {
  10. var stringRow = new string[dataRows.Columns.Length];
  11. for (int i = 0; i < dataRows.Columns.Length; i++)
  12. {
  13. var data = dataRow.GetValue<object>(dataRows.Columns[i].Name);
  14. if (data == null)
  15. {
  16. stringRow[i] = "";
  17. continue;
  18. }
  19.  
  20. if (data is DateTimeOffset)
  21. stringRow[i] = ((DateTimeOffset)data).ToString("yyyy-MM-dd HH:mm:ss.fff+0000");
  22. else if (data is string)
  23. stringRow[i] = $"\"{data}\"";
  24. else
  25. stringRow[i] = data.ToString();
  26. }
  27. tw.WriteLine(string.Join(",", stringRow));
  28. _counter.IncrementCounter();
  29. }
  30.  
  31. tw.Flush();
  32. tw.Close();

Let me know if something is unclear.

csharp driver
3 comments
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

joao.reis avatar image joao.reis commented ·

Hi, I'm not able to reproduce this, can you provide a code sample?

0 Likes 0 ·
rob.hofmann_165408 avatar image rob.hofmann_165408 commented ·

[Update posted in the original question body]

0 Likes 0 ·
joao.reis avatar image joao.reis commented ·

I'm still not able to reproduce the issue with the code sample that you provided. Which driver and server versions are you using? I tested this with DataStax C# Driver for Apache Cassandra 3.11.0 and Apache Cassandra 3.11.4.

I inserted this row into the database ('cn' is of type 'text'):

INSERT INTO users (isdeleted,id,firstname,cn,lastname) VALUES (false,be8a976f-0616-405b-8d79-8d85915c4889,'3213','t*\x10r>\nw@S=','55');

And this is the resulting .csv file:

False,be8a976f-0616-405b-8d79-8d85915c4889,"3213","t*\x10r>\nw@S=","55"
0 Likes 0 ·

1 Answer

joao.reis avatar image
joao.reis answered rob.hofmann_165408 commented

I was able to reproduce this only by using the COPY FROM command. It's not an issue with the driver because if you try to query the data with any other tool you will see that they all return the data without the backslashes so it's the actual server that is returning that data. The issue is that COPY FROM uses the backslash as an escape character so you can do one of two things:

  • Change the escape character to one that you are sure that it will not be used:
COPY FROM .... WITH ESCAPE ='|';

to use | as the escape character for example.

  • Convert the csv file in order to escape the escape character:
t*\\x10r>\\nw@S= 

instead of

t*\x10r>\nw@S=
4 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

rob.hofmann_165408 avatar image rob.hofmann_165408 commented ·

Thanks for your help so far. With your code i'm able to get satisfying results. However i'm working with existing tables. This is how to reproduce:

CREATE TABLE stresscql.insanitytest (
    mailboxid text,
    alias text,
    PRIMARY KEY (mailboxid, alias)
)

Next login to cqlsh and use this command:

COPY stresscql.insanitytest FROM '/myfolder/testdata.csv';

Note: when i change the single backslashes in the testdata.csv to double backslashes, I get double backslashes in my code while debugging. But when I have single backslashes (like the sample attached) I get zero backslashes in my code while debugging.

Next up: use my tool (which has the code i provided earlier) from https://github.com/RobHofmann/Cassandra.BackupAndRestore/. Make sure to edit appsettings.json with your values.

PS. I use a powershell script to spin up a 6 node cluster in Docker. You can find that script & info here: https://github.com/RobHofmann/Cassandra.LocalDockerCluster

Looking forward to your input!

0 Likes 0 ·
testdatacsv.zip (926 B)
rob.hofmann_165408 avatar image rob.hofmann_165408 commented ·

Thats what I was thinking as well. However, when I change the single backslash to double backslashes as you suggested, i'm actually seeing two backslashes in my database. So I either get 0 or 2 backslashes. The real question is: how do I get just 1.


Also the suggestion you are making with using another escape character is going to be a challenge, since we can't predict which characters are used in our data. So the same issue will probably occur with another character then.


Looking forward to your thoughts.

0 Likes 0 ·
joao.reis avatar image joao.reis commented ·

If I add a backslash to all the backslashes in your csv I get the expected data. Are you seeing the debug view on Visual Studio? Visual Studio shows escaped strings in debug mode, you can click the magnifying glass icon to show the plain unescaped string.

0 Likes 0 ·
rob.hofmann_165408 avatar image rob.hofmann_165408 commented ·

Yes you are correct. I just pushed a commit to my GitHub containing the replacing of 1 backslash to 2 backslashes when writing to CSV. This fixes the issue. I get exactly the same output as my input when i copy data in and export it with my tool.


Thanks!

0 Likes 0 ·