kranthij29_188881 asked

How can I verify if an AsyncReadTimeoutException is related to data corruption?

Hello Team,

Can you tell me whether the error below is related to corruption at the OS or DB level? How can I identify whether my table is in good condition?

The table is around 1.4 TB in size.

I am not only running count(*); I am also running some join queries from Spark, and I get the error below. How can I verify whether there is any corruption?

WARN 2020-05-27 21:23:03,088 org.apache.spark.scheduler.TaskSetManager: Lost task 104.0 in stage 5.0 (TID 246,, executor 29): Exception during execution of SELECT "defectid", "map", "attr_1" FROM "datamanager"."enlight_locations" WHERE token("id", "defectid") > ? AND token("id", "defectid") <= ? ALLOW FILTERING: An unexpected error occurred server side on / Timed out async read from for file /CASSANDRA_DATA/cassandra/data/datamanager/enlight_locations-53f3c5709ce511eaa3feed119b613836/ba-9099-bti-Data.db, more information on epoll state with FileDescriptor{fd=1083} in the logs.

saravanan.chinnachamy_185977 commented:

Can you please update your question with the versions of Cassandra and Spark you are using? What is the architecture (DSE Analytics?)? Please feel free to edit your original question.

Erick Ramirez answered

An AsyncReadTimeoutException isn't due to a corrupt SSTable.

Open-source Apache Cassandra uses the traditional staged event-driven architecture (SEDA), where tasks are executed synchronously by threads. This has worked well for quite some time, but we know that the synchronous nature of the architecture can lead to extremely high context switching during peak loads, which causes resource contention and degrades the performance of the nodes.

In DSE 6.0, we released a new feature called Advanced Performance. Under the hood is a thread-per-core architecture that significantly improves read and write performance with asynchronous IO: read and write requests are no longer synchronous, which removes thread contention and keeps the IO threads processing.

An AsyncReadTimeoutException is thrown when an asynchronous disk read (literally an IO request to the disk) takes too long. Although this can happen because of faulty hardware (e.g. an unresponsive disk subsystem), it is almost always the result of the data disk being overloaded with read requests.
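To make the idea concrete, here is a minimal Python analogy, not DSE's actual implementation: the disk read is handed off to a worker pool so the calling thread is never blocked waiting on the disk, and the caller gives up if the read does not complete within a deadline. The names `AsyncReadTimeout`, `read_block` and `read_with_timeout` are hypothetical, and the timeout is simply whatever the caller passes in; DSE's own timeout setting is not shown here.

```python
import asyncio


class AsyncReadTimeout(Exception):
    """Hypothetical stand-in for DSE's AsyncReadTimeoutException."""


def read_block(path, offset, length):
    """Blocking read of `length` bytes at `offset` (the 'disk IO request')."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


async def read_with_timeout(timeout_s, read_fn, *args):
    """Run a blocking read off the event loop and fail if it exceeds timeout_s.

    The event-loop thread stays free to handle other requests while the read
    is in flight, which is the point of asynchronous IO.
    """
    loop = asyncio.get_running_loop()
    # Submit the read to the default thread pool; this returns immediately.
    future = loop.run_in_executor(None, read_fn, *args)
    try:
        return await asyncio.wait_for(future, timeout_s)
    except asyncio.TimeoutError:
        # Note: the worker thread keeps running the slow read; only the caller
        # stops waiting. A busy disk is still busy after the timeout fires.
        raise AsyncReadTimeout(f"Timed out async read after {timeout_s}s") from None
```

The timeout says nothing about the bytes on disk: a healthy file on an overloaded disk triggers it just as readily, which is why this exception points at disk load rather than at corruption.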

If you are seeing this exception regularly, I recommend checking the disk IO performance with Linux utilities like iostat. Ideally, your sysadmin team should be monitoring the servers so you have historical metrics on how the disks perform over time. This will give you an idea of whether IO requests are maxing out the disks. If that is happening regularly, the solution is to increase the capacity of your cluster by adding more nodes.
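If you want a quick scripted check alongside iostat, the sketch below samples `/proc/diskstats` twice and derives two of the numbers iostat reports: utilisation (the share of the interval during which the device had IO in flight, like `%util`) and average read latency (like `r_await`). It assumes a Linux host and the standard `/proc/diskstats` field layout; the function names and the device name you pass in are illustrative, and iostat itself remains the better tool.

```python
import time


def parse_diskstats(text):
    """Map device -> (reads_completed, ms_reading, ms_doing_io) from /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:  # skip blank or malformed lines
            continue
        # fields: major, minor, name, reads completed, reads merged,
        # sectors read, ms reading, ..., ms doing IO (index 12), ...
        stats[fields[2]] = (int(fields[3]), int(fields[6]), int(fields[12]))
    return stats


def disk_read_pressure(before, after, interval_ms):
    """Return device -> (util_percent, avg_read_wait_ms) between two samples."""
    result = {}
    for dev, (reads0, read_ms0, io_ms0) in before.items():
        if dev not in after:
            continue
        reads1, read_ms1, io_ms1 = after[dev]
        reads = reads1 - reads0
        util = min(100.0, 100.0 * (io_ms1 - io_ms0) / interval_ms)
        avg_read_wait = (read_ms1 - read_ms0) / reads if reads else 0.0
        result[dev] = (util, avg_read_wait)
    return result


def sample(device, interval_s=5.0):
    """Sample /proc/diskstats twice, interval_s apart, for one device."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    return disk_read_pressure(before, after, interval_s * 1000.0)[device]
```

For example, `sample("sdb", 5.0)` returns a `(util_percent, avg_read_ms)` tuple for a device named `sdb`. A data disk that sits near full utilisation with climbing read latency while your Spark jobs run is consistent with the read overload described above.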

For more info, see the blog post DSE Advanced Performance: Apache Cassandra™ Goes Turbo. Cheers!

smadhavan answered

@kranthij29_188881, if your goal is to find the total number of records in a C* table, I would suggest using the DataStax Bulk Loader (DSBulk for short). Refer to this counting blog for additional info.

Erick Ramirez commented:

FWIW performing counts in Spark is a valid use case. :)
