Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

Not sure where to begin? Getting Started

 

question

debasis.tcs_69445 avatar image
debasis.tcs_69445 asked ·

Driver returned a NoHostAvailableException after getting OperationTimedOutException from nodes

We are getting below error from application

2020-05-27 00:17:45,915 | ERROR | | Consolidator-root | SCSDTTN01 | qtp25558514-131 | h.e.m.c.r.p.RuntimeExceptionMapper | {} | Internal server error com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: 192.168.11.24/192.168.11.24:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.24/192.168.11.24:9042] Timed out waiting for server response), 192.168.11.16/192.168.11.16:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.16/192.168.11.16:9042] Timed out waiting for server response), 192.168.11.19/192.168.11.19:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.19/192.168.11.19:9042] Timed out waiting for server response), 192.168.11.21/192.168.11.21:9042 [only showing errors of first 3 hosts, use getErrors() for more details])
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:83)
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37)
        at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
        at hr.ericsson.m2mse.jobs.dao.cassandraJob.CassandraJobDao.getJobsByStatusAndCreationTime(CassandraJobDao.java:644)
        at hr.ericsson.m2mse.jobs.business.impl.JobsServiceImpl.getAllJobs(JobsServiceImpl.java:171)
        at hr.ericsson.m2mse.jobs.rest.resource.impl.PurgeImpl.getPurgeJobs(PurgeImpl.java:94)
        at sun.reflect.GeneratedMethodAccessor221.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:180)
        at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
        at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:189)
        at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
        at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
        at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
        at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:254)
        at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:234)
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:208)
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:160)
        at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:180)
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:299)
        at hr.ericsson.m2mse.jobs.rest.resource.impl.JobsServlet.handleRequest(JobsServlet.java:87)
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:223)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:274)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
        at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:201)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
        at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:72)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:287)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:80)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:748)Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: 192.168.11.24/192.168.11.24:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.24/192.168.11.24:9042] Timed out waiting for server response), 192.168.11.16/192.168.11.16:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.16/192.168.11.16:9042] Timed out waiting for server response), 192.168.11.19/192.168.11.19:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.19/192.168.11.19:9042] Timed out waiting for server response), 192.168.11.21/192.168.11.21:9042 [only showing errors of first 3 hosts, use getErrors() for more details])
        at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:265)
        at com.datastax.driver.core.RequestHandler.access$1200(RequestHandler.java:62)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:357)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.retry(RequestHandler.java:539)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.processRetryDecision(RequestHandler.java:521)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:962)
        at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1575)
        at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
        at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
        at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 common frames omitted

During that time I did not find any problem in system.log

driverjava
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered ·

The OperationTimedOutException gets thrown when the driver doesn't get a response from the nodes for a read request within SocketOptions.getReadTimeoutMillis(). The error message itself explicitly states this:

Timed out waiting for server response

This is a client-side timeout meaning that the driver was waiting for a response but didn't get anything back from the node. It is different from a ReadTimeoutException which is a server-side timeout for queries which took longer than read_request_timeout_in_ms.

The OperationTimedOutException can happen when nodes are completely unresponsive usually because they are overloaded and have long GC pauses. But the driver can also get this timeout if the network latency is high or if there is some other network issue that prevents it from connecting to the cluster.

Based on the log entry you posted, several nodes were unresponsive/uncontactable:

... \
192.168.11.24/192.168.11.24:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.24/192.168.11.24:9042] Timed out waiting for server response), \
192.168.11.16/192.168.11.16:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.16/192.168.11.16:9042] Timed out waiting for server response), \
192.168.11.19/192.168.11.19:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [192.168.11.19/192.168.11.19:9042] Timed out waiting for server response), \
...

The log entries also indicate that the driver tried to connect to all hosts but they all timed out. When a node is unresponsive or unreachable, the driver temporarily blacklists the node. But the problem is that when all the nodes get blacklisted because they're all unreachable, the driver gives up trying because there aren't any nodes left to try:

        at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:265)

and that's what leads to the NoHostAvailableException:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed ( ... )

Check the nodes for signs of overload. Otherwise, you need to escalate the issue to your network administrators to determine why the driver/app cannot contact the cluster.

For more information, see the Java driver manual. Cheers!

4 comments Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Thanks for the detailed answer. I checked sar report and found some NW drop in that time. Can you please share your thought

12:00:01 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
12:30:01 AM ens161 0.00 0.00 0.00 0.17 0.00 0.00 0.00 0.00 0.00
12:30:01 AM ens192 0.00 0.00 0.00  0.17 0.00 0.00 0.00 0.00 0.00
12:30:01 AM ens224 0.00 0.00 0.00 0.17 0.00 0.00 0.00 0.00 0.00
12:30:01 AM ens193 0.00 0.00 0.00  0.17  0.00 0.00 0.00 0.00 0.00
12:30:01 AM ens225 0.00 0.00 0.00  0.17 0.00 0.00 0.00 0.00
0 Likes 0 · ·

I'm not sure what you're expecting from me but if you're saying that you've got evidence that there was a network connectivity issue then it would explain why the driver couldn't connect to the cluster. Cheers!

0 Likes 0 · ·
debasis.tcs_69445 avatar image debasis.tcs_69445 Erick Ramirez ♦♦ ·

Also I see in that time snapshot backup as well tar command to zip them was running in all nodes. Could it be a problem due to that? because application was running heavy insert to cassandra nodes in that time.

0 Likes 0 · ·
Show more comments