mike.cheng asked

Invalid or unsupported protocol version (76)

We have had significant usage for years, but this is the first time we have observed this error:

Caused by: com.datastax.driver.core.exceptions.ProtocolError: An unexpected protocol error occurred on host /10.x.x.x:9042. This is a bug in this library, please report: Invalid or unsupported protocol version (76); supported versions are (3/v3, 4/v4, 5/v5-beta)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:126)

According to this documentation, this error means:

Indicates that the contacted host reported a protocol error. Protocol errors indicate that the client triggered a protocol violation (for instance, a QUERY message is sent before a STARTUP one has been sent). Protocol errors should be considered as a bug in the driver and reported as such.

So if this is a bug in the driver, I would like to ask the following questions:

1. How can we troubleshoot and help you fix this bug?

2. Is there any approach or configuration we should use to prevent this error from happening in the future?

Should we upgrade the driver, or is there anything else you would suggest?

We are using Cassandra 3.x and DataStax driver 3.x, and I don't think this error is easy to reproduce.

java driver
4 comments

alexandre.dutra ♦ commented ·

We've seen reports of this already but it's incredibly hard to reproduce. What exact version of Cassandra are you using?

One thing that I noticed is that you are using a 3.x driver version that considers protocol v5 as beta: 5/v5-beta. This will not work with Cassandra 4.0+ and could be the culprit.

For full compatibility with Cassandra 4.0, use driver 3.10+ or better yet, driver 4.8+.

mike.cheng commented ·

@alexandre.dutra thanks for the reply. The exact versions we use are as follows:

  • Cassandra: 3.11.9
  • DataStax driver: 3.2.0.2

Since we are not using Cassandra 4.0, not using driver 4.8+ shouldn't matter, correct?


alexandre.dutra ♦ mike.cheng commented ·

Indeed it shouldn't matter then.

mike.cheng alexandre.dutra ♦ commented ·

It looks like there is not much we can do. Do you have any plan to fix this bug? Is there any action item on my side to prevent this error?


1 Answer

Erick Ramirez answered

This error is hard to track down but in my experience, it's usually caused by a message overflow.

A message frame encoded in CQL native protocol (version 4) looks something like this:

      0         8        16        24        32         40
      +---------+---------+---------+---------+---------+
      | version |  flags  |      stream       | opcode  |
      +---------+---------+---------+---------+---------+
      |                length                 |
      +---------+---------+---------+---------+
      |                                       |
      .            ...  body ...              .
      .                                       .
      .                                       .
      +----------------------------------------

(Source: CQL binary protocol specification version 4)

As the diagram shows, a message frame is composed of (a) a header and (b) the message body. The frame header contains metadata about the message, including:

  • version, which indicates whether the frame is a request or a response and which protocol version it is encoded for
  • flags such as compression information or warnings from the server
  • stream ID
  • opcode, for example 0x00 for ERROR or 0x03 for AUTHENTICATE
  • the length of the message body
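
To make the header layout concrete, here is a minimal Python sketch that unpacks the 9-byte v4 header shown in the diagram. The names `FrameHeader` and `parse_header` are made up for illustration and are not part of any driver; the field widths follow the diagram (1-byte version, 1-byte flags, 2-byte stream, 1-byte opcode, 4-byte length), and the top bit of the version byte is treated as the request/response direction.

```python
import struct
from typing import NamedTuple

# Big-endian: version (1 byte), flags (1), stream (2, signed), opcode (1), length (4, signed)
HEADER_FORMAT = ">BBhBi"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 9 bytes for protocol v4


class FrameHeader(NamedTuple):
    """Hypothetical container for the decoded header fields."""
    is_response: bool
    version: int
    flags: int
    stream: int
    opcode: int
    length: int


def parse_header(data: bytes) -> FrameHeader:
    """Decode the first 9 bytes of `data` as a v4 frame header."""
    raw_version, flags, stream, opcode, length = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE]
    )
    # The high bit of the version byte marks a response; the low 7 bits are the version.
    return FrameHeader(
        is_response=bool(raw_version & 0x80),
        version=raw_version & 0x7F,
        flags=flags,
        stream=stream,
        opcode=opcode,
        length=length,
    )


# A v4 ERROR response (opcode 0x00) on stream 1 with a 16-byte body.
sample = bytes([0x84, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x10])
header = parse_header(sample)
```

Here `header` comes out as a v4 response on stream 1 with opcode 0 and a body length of 16.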

When messages are sent down the wire, they are decoded in sequence so these frames:

    | frame 8 header    | frame 8 body  | frame 9 header    | frame 9 body  |
----+---+---+---+---+---+---------------+---+---+---+---+---+---------------+----
... | v | f | s | o | l |   <payload>   | v | f | s | o | l |   <payload>   | ...
----+---+---+---+---+---+---------------+---+---+---+---+---+---------------+----

get decoded as:

  • Parse header for frame 8 starting with version
  • Parse the rest of the header
  • Process the body of frame 8
  • Parse header for frame 9 starting with version
  • Parse the rest of the header
  • Process the body of frame 9
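
The steps above can be sketched roughly as the loop below. This is only an illustration of sequential decoding, not how the driver or Cassandra actually implement it; `build_frame` and `decode_frames` are hypothetical helpers, and opcode 0x07 is used as an arbitrary request opcode.

```python
import struct

# v4 header: version, flags, stream, opcode, body length (big-endian, 9 bytes)
HEADER = struct.Struct(">BBhBi")


def build_frame(version: int, stream_id: int, opcode: int, body: bytes) -> bytes:
    """Hypothetical helper: serialise one frame with a correct length field."""
    return HEADER.pack(version, 0, stream_id, opcode, len(body)) + body


def decode_frames(wire: bytes) -> list:
    """Walk the byte stream frame by frame, trusting each header's length."""
    frames = []
    offset = 0
    while len(wire) - offset >= HEADER.size:
        # Parse the header, starting with the version byte.
        raw_version, _flags, stream_id, opcode, length = HEADER.unpack_from(wire, offset)
        offset += HEADER.size
        # Process the body: exactly `length` bytes, then the next header begins.
        body = wire[offset:offset + length]
        offset += length
        frames.append((raw_version & 0x7F, stream_id, opcode, body))
    return frames


wire = build_frame(0x04, 8, 0x07, b"frame-8") + build_frame(0x04, 9, 0x07, b"frame-9!")
frames = decode_frames(wire)
```

The decoder only finds the start of frame 9 because the length in frame 8's header matches the number of body bytes that were actually written.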

The problem arises when the body of a frame is larger than the maximum message size: it "overflows" into what the decoder treats as the start of the next frame. When the decoder then tries to parse the next header, it picks up a byte from the previous frame's body where it expects the version. Because that byte doesn't match any of the valid values, the frame is rejected and an "unsupported protocol version" error is reported.
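
As a rough illustration of that desynchronisation (a sketch under assumptions, not a reproduction of the actual bug), the snippet below writes a frame whose body is longer than the length declared in its header. `read_frame`, `ProtocolError` and `SUPPORTED_VERSIONS` are hypothetical names, and the payload is deliberately chosen so that the first leftover byte is the letter "L".

```python
import struct

HEADER = struct.Struct(">BBhBi")  # v4 header: version, flags, stream, opcode, length
SUPPORTED_VERSIONS = {3, 4, 5}    # mirrors the versions listed in the error message


class ProtocolError(Exception):
    """Hypothetical stand-in for the error raised on a bad version byte."""


def read_frame(buf: bytes, offset: int):
    """Read one frame at `offset`; return its body and the offset of the next frame."""
    raw_version, _flags, _stream, _opcode, length = HEADER.unpack_from(buf, offset)
    version = raw_version & 0x7F
    if version not in SUPPORTED_VERSIONS:
        raise ProtocolError(f"Invalid or unsupported protocol version ({version})")
    body_start = offset + HEADER.size
    return buf[body_start:body_start + length], body_start + length


# The header declares 16 body bytes, but 34 bytes were actually written.
body = b"A" * 16 + b"LARGE-PAYLOAD-TAIL"
wire = HEADER.pack(0x04, 0, 8, 0x07, 16) + body

_, next_offset = read_frame(wire, 0)   # frame 8 parses fine
try:
    read_frame(wire, next_offset)      # the "header" here is really leftover body
    message = None
except ProtocolError as exc:
    message = str(exc)
```

In this sketch the second read fails with "Invalid or unsupported protocol version (76)", because 76 is 0x4C, the ASCII code for "L". That is at least consistent with a stray payload byte being read as the version, although it doesn't prove that this is what happened in your cluster.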

I think this happens when reading or writing an unexpectedly large amount of data, so that the message goes beyond the maximum frame size. Without knowing the query and partition that triggered the error, it is very hard for a DBA to determine the cause and therefore fix the offending partition. Cheers!

1 comment

mike.cheng commented ·

Thank you very much for the detailed explanation! I will check if it's due to a large payload.
