Bringing together the Apache Cassandra experts from the community and DataStax.


debasis.tcs_69445 asked ·

What could be the reason cqlsh returns OperationTimedOut errors?

Hi Experts

I am getting the error below in both data centers at the same time. I checked that all nodes were UP in both data centers and there were no errors in the logs. I received the error from all nodes.

Can you please let me know what could suddenly be causing this problem?

cqlsh 192.168.11.16 -u cassandra -p ******
Connection error: ('Unable to connect to any servers', {'192.168.11.16': OperationTimedOut('errors=Timed out creating connection (5 seconds), last_host=None',)})

Thanks in advance.


Erick Ramirez answered ·

This appears to be related to your other question (#4702) where your application is getting OperationTimedOutException connecting to the nodes.

As I explained in that other post, OperationTimedOutException gets thrown when the driver doesn't get a response from the nodes. The same thing applies here -- cqlsh uses an embedded Python driver to connect to the nodes.
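To separate "the port cannot be reached at all" from "Cassandra is slow to answer", a plain TCP probe can help. The following is only a minimal sketch using the Python standard library, not what cqlsh or the driver does internally. The function name probe_native_port is hypothetical, and port 9042 is assumed because it is the default native transport port; your cluster may use a different one. The 5-second timeout mirrors the value in the error message.

```python
import socket
import time


def probe_native_port(host, port=9042, timeout=5.0):
    """Try to open a TCP connection and report the outcome and elapsed time.

    Assumptions: 9042 is the default native transport port and 5 seconds
    matches the timeout quoted in the cqlsh error; adjust both as needed.
    """
    start = time.monotonic()
    try:
        # create_connection performs the TCP handshake only; no CQL is sent.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        # No reply to the connection attempt within the timeout.
        outcome = "timed out"
    except OSError as exc:
        # For example connection refused or no route to host.
        outcome = f"failed: {exc}"
    elapsed = time.monotonic() - start
    return outcome, elapsed
```

Calling probe_native_port("192.168.11.16") from the client machine and again from the node itself shows whether the TCP handshake completes in either place. A "timed out" result for a plain handshake would point towards the network or the host's TCP stack rather than CQL or authentication.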

There's some issue with your cluster that's environmental and I don't think it's a Cassandra issue. It seems to me at some point, clients (your application, cqlsh) lose connectivity to the nodes and I think you need to involve your sysadmin and network admin teams to assist you with the investigation.

I think the reboot of the nodes is coincidental. It is more likely that the reboot released stale network connections. When the problem manifests itself, you should take a snapshot of the connections on the nodes using Linux utilities like netstat and lsof for your sysadmins/network admins to review and analyse. Cheers!
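As a rough illustration of the snapshot idea, the sketch below summarises netstat output by TCP state so it is easy to see whether a node is piling up connections in states such as CLOSE_WAIT or TIME_WAIT. It assumes a Linux host with netstat (net-tools) installed and the usual column layout of netstat -ant, where the state is the sixth column. The function names count_tcp_states and snapshot_tcp_states are hypothetical.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from `netstat -ant` text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with "tcp" or "tcp6"; header lines do not.
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


def snapshot_tcp_states():
    """Run `netstat -ant` on this host and return the per-state counts."""
    result = subprocess.run(
        ["netstat", "-ant"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)
```

Running snapshot_tcp_states() on a node while the timeouts are happening, and saving the result together with the raw netstat and lsof output, gives the sysadmins something concrete to compare against a healthy node.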


I ran the cqlsh command on the same server where Cassandra is installed. Do you still think it could be a network issue?


Yes, because it can still happen when the TCP connections are maxed out on the server. Again, your issues appear to be environmental so you need to investigate why a local connection between cqlsh and C* isn't working.

smadhavan answered ·

@debasis.tcs_69445, could you update your original post with C*/DSE version, please?

Also, do you know if the system keyspaces (particularly security) were properly repaired? If not, I would recommend running a repair on them with nodetool repair -pr on all nodes, then retrying the connection with cqlsh 192.168.11.16 -u cassandra -p MASKED --debug. Let us know what the output is after completing these steps.


Thanks for the reply

Cassandra version:

cqlsh 5.0.1 | Cassandra 3.11.0-E000 | CQL spec 3.4.4 | Native protocol v4
cassandra@cqlsh> desc system_auth;
CREATE KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc100': '3', 'dc200': '3'} AND durable_writes = true;

After rebooting all 9 nodes in one data center, the problem was resolved in the other data center too (without restarting any nodes in the other data center).

Do we know why a restart would fix the issue on the nodes in a separate DC?


A friendly note to let you know that I've converted your post to a comment since it's not an "answer". Cheers!
