Last week we deployed code to force a consistency level of LOCAL_QUORUM on all queries. We are now seeing errors in production (see the ‘sample-app-log-errors.txt’ file) that report read failures at a consistency level of ALL. Our problem is that a CL of ALL is not defined anywhere in our codebase. I ran our application with TRACE logging enabled for com.datastax.driver.core.Connection to get query logging; the queries can be seen in the ‘sample-dse-java-driver-logging.txt’ file. Throughout that file, the consistency level defined for each query is LOCAL_ONE or LOCAL_QUORUM, never ALL. We use prepared statements, so the full query text is not always displayed, only the server-side ID. I have pasted 3 “full cycles” (see the next paragraph for an explanation of the app process), which show the consistency levels over a full app cycle multiple times. Based on this, my question is: I thought the consistency level set on the statement in the driver would be the CL used, so where is ‘ALL’ coming from in the error message?
We store data files from a data system. Each ‘row’ has a corresponding hash in the eaa.keyshash table along with its measurementId. We use this table to tell whether we need to merge rows. If a row hash matches a previous row, we pull that ID, which gives us all of the partition keys needed to run an eaa.measurement read query and pull the row. We then merge the parameter data from the new row into the old row and update that row. In a large data file, multiple rows may need to update previously inserted rows, so a row stored earlier from the same file may itself get updated. These files can contain hundreds of thousands of rows, each performing the three steps above. The keyshash query is an LWT: if the row gets inserted, there is no update to perform and we insert the measurement row directly. If the LWT is not applied, it returns the existing measurementId, which we use for the update. All of these statements use LOCAL_QUORUM consistency. We do use LOCAL_ONE for determining which files to run, but that query doesn’t run as frequently as the ones that use LOCAL_QUORUM.
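To make the three steps concrete, here is a minimal in-memory sketch of the flow described above. It does not use the DataStax driver: `ConcurrentHashMap.putIfAbsent` only stands in for the keyshash LWT (insert if not exists), and the class name `MergeFlowSketch`, the `store` and `read` methods, and the string-based parameter map are all hypothetical. It also assumes that, on a merge, new parameter values overwrite old ones with the same name, which may not match the real merge rules.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the keyshash / measurement flow.
final class MergeFlowSketch {
    // Stand-in for eaa.keyshash: row hash -> measurementId.
    private final ConcurrentHashMap<String, String> keysHash = new ConcurrentHashMap<>();
    // Stand-in for eaa.measurement: measurementId -> parameter data.
    private final ConcurrentHashMap<String, Map<String, String>> measurements = new ConcurrentHashMap<>();

    /** Returns true if a new measurement row was inserted, false if it was merged into an old one. */
    boolean store(String rowHash, String newMeasurementId, Map<String, String> params) {
        // Step 1: the "LWT". putIfAbsent returns null when the hash was newly inserted,
        // otherwise it returns the measurementId already stored for this hash.
        String existingId = keysHash.putIfAbsent(rowHash, newMeasurementId);
        if (existingId == null) {
            // Hash was new: no update needed, insert the measurement row directly.
            measurements.put(newMeasurementId, new HashMap<>(params));
            return true;
        }
        // Step 2: read the previously stored row by the ID the failed "LWT" handed back.
        Map<String, String> merged =
                new HashMap<>(measurements.getOrDefault(existingId, Collections.emptyMap()));
        // Step 3: merge the new parameters into the old row (assumed: new values win) and update it.
        merged.putAll(params);
        measurements.put(existingId, merged);
        return false;
    }

    /** Returns the stored parameters for a measurementId, or null if there is no such row. */
    Map<String, String> read(String measurementId) {
        return measurements.get(measurementId);
    }
}
```

In the real application each of these steps is a separate statement sent at LOCAL_QUORUM; the sketch only shows the control flow, not the consistency behaviour.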
In the application, we do have a retry utility. It is very simple: if a query fails, it waits X amount of time and retries at the same consistency level, and it retries only 3 times. We see the above error after a file has failed 3 times. What is confusing is that nowhere in our application do we define a consistency level of ALL, only LOCAL_QUORUM.
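For reference, the retry behaviour described above looks roughly like the sketch below. This is not our actual utility: the class name `SimpleRetry`, the `callWithRetry` method, and its parameters are hypothetical, and it assumes "retries 3 times" means up to 3 retries after the initial attempt. The point is that the task is re-run unchanged, so nothing in the retry path alters the consistency level.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of a fixed-delay retry helper.
final class SimpleRetry {
    /**
     * Runs the task; if it throws, waits delayMillis and runs the same task again,
     * for at most maxRetries retries after the first attempt.
     * Rethrows the last failure once the retries are exhausted.
     */
    static <T> T callWithRetry(Callable<T> task, int maxRetries, long delayMillis) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                // The task is executed as-is on every attempt (same statement, same CL).
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(delayMillis); // fixed wait, no backoff
                }
            }
        }
        throw last;
    }
}
```

A call such as `SimpleRetry.callWithRetry(() -> runQuery(), 3, 500L)` would run the hypothetical `runQuery()` up to four times in total, with the 500 ms delay standing in for the unspecified X.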