sandejai avatar image
sandejai asked sandejai commented

Why does the Python driver replace spaces in column names with underscores?

I am using Cassandra python driver. Version details :

[cassandra-driver (3.25.0)] 
[cqlsh 6.8.0 | Cassandra 4.0.4 | CQL spec 3.4.5 | Native protocol v4]

I am querying Cassandra database where I have created few columns which have empty space and special characters such as '%' I manage to do by supplying those columns in double quotes( ex "Mem Util %")

CREATE TABLE my_keyspace.memutils (
    time text PRIMARY KEY,
    "Mem Util %" text,

When I query those column via CQL, I supply the column name with double quotes and I get proper data as well as my result columns are as it is, example :

select "Mem Util %" from my_keyspace.memutils;
 Mem Util %

But when I use Cassandra-python driver, the query is able to get the proper data but it replaces spaces in the column's name with underscore in the result data(ResponseFeature)

Code snippet

from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra.auth import PlainTextAuthProvider
print("Select query--> ", fetch_query)
response_future = self.session.execute(fetch_query);
print('response_future  --> ', response_future[0])


Select query-->  select Time, "Mem Util %" from my_keyspace.memutils;
response_future-->  Row(time='631', Mem_Util='9.177055477')

The result data set also doesn't' have "%" in the column.

Question : Is there any option in session.execute(), to disable above behavior ? I referred doc

python driver
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered sandejai commented

This has been the default behaviour in the Python driver since 2014. This is by design.

There is a limitation in Python's implementation of named tuples where non-alphanumeric characters can lead to errors when retrieving data and displaying results using the named_tuple_factory like you are in the example code you provided. The fix is to sanitise the column names by replacing non-alphanumeric characters with underscores (PYTHON-122).

It isn't possible to disable this behaviour because it doesn't make sense to do so -- why would you make the driver print the column name when you already know what it is? You're already specifying the column name when you query the DB:

Select query-->  select Time, "Mem Util %" from my_keyspace.memutils;

You should instead access the value of the column using its position index:

memutilrow = response_future[0]
time = memutilrow[0]
memutil = memutilrow[1]
print "Time: %s, Mem Util: %s" % (time, memutil)


1 comment Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

sandejai avatar image sandejai commented ·

There is very common case making "Driver print the column name".

Suppose you create an API or Library function, that takes columns names( "Time: %s, Mem Util: %s" ) as argument and returns the response as pandas data frame with same column name as "headers".

In this the callers would get a different header than what he supplied as argument, because driver has changed the name without knowledge.

The work around it little bad, I have to segregate columns in two list one without space and one with space.

I make one query to Cassandra for all non spaces columns.

I make separate query for each spaced/Special character columns, when I get response from Cassandra, I replace my original column name in the query response.

Then finally I merge all the data frames and return a data frame which has exactly same column/header names as supplied in method argument/API parameters by caller,

So that caller can retrieve data from the data frame/response/result using exact same column name(Mem Util: %s )

0 Likes 0 ·