question

aky2k7_174715 asked · Erick Ramirez edited

Huge Cassandra 3 Read latency Compared To Cassandra 1.2

We are migrating our data from an old Cassandra cluster (1.2) to the latest 3.11.6. I have written a migration script which reads the data from the old Cassandra and fires queries against the new one (so every row is a fresh insert).

After migration, when I try to read data, I am seeing huge latency on one of the tables in the new Cassandra (2.5 seconds) compared with the old one (600 ms).

The total number of columns in this table is almost 60k. There is no clustering column, and the single primary key is unique for each row.

The requirement is to fetch 40-50 columns for 150 unique flow_ids. I am using a prepared statement which is fired asynchronously from my application, and once all CompletableFutures are done I parse the final result. Fetching the data alone takes more than 2.5 seconds. Code to fetch the result for each unique flow_id:


allRows = keyList.stream()
        // Fire one async query per partition key; collect the futures first
        // so all queries are in flight before we start blocking on results.
        .map(key -> session.executeAsync(boundStmt.setToken(partitionKeyName, key)
                        .setExecutionProfile(profile)
                        .setTracing(isTracingEnabled))
                .toCompletableFuture())
        .collect(Collectors.toList())
        .stream()
        .map(completableFuture -> {
            AsyncResultSet asyncResultSet;
            try {
                asyncResultSet = completableFuture.get();
                //traceIfEnabled(asyncResultSet.getExecutionInfo(), isTracingEnabled);
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeGestorException(CassandraErrors.FAILURE, e, "moveForward");
            }
            return asyncResultSet.one();
        })
        .filter(row -> row != null)
        .collect(Collectors.toList());
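For reference, the gather step above can be sketched with plain `CompletableFuture` and no driver dependency. This is a minimal sketch, assuming the same pattern of launching all queries before collecting results; `supplyAsync` here is a hypothetical stand-in for `session.executeAsync`, and `GatherSketch` is an illustrative name, not part of the original code:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GatherSketch {

    // Gather results from already-launched futures: wait for all of them
    // at once, then read each completed future without further blocking.
    static List<String> gather(List<CompletableFuture<String>> futures) {
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream()
                .map(CompletableFuture::join) // already completed, no blocking here
                .filter(row -> row != null)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for session.executeAsync(...): one async
        // "query" per key, all in flight before the gather starts.
        List<CompletableFuture<String>> futures = IntStream.range(0, 150)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> "row-" + i))
                .collect(Collectors.toList());

        System.out.println(gather(futures).size()); // prints 150
    }
}
```

Blocking on the futures one by one in submission order (as in the snippet above) has the same total latency as `allOf`, since both wait for the slowest query; the difference is only in where the blocking happens.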

The prepared statement it uses:

SELECT src_flow_id, failures, author, authored_at, version_num, description, step_1_time,
       source, type, title, steps, url, labels, tags, title_lowercase, ent, user, host,
       votes, published_time, runs, views, flow_id
FROM flows WHERE token(flow_id) = token(?)

where the placeholder (?) will take the token of the 'key'.
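As a side note: when flow_id is the entire partition key, `token(flow_id) = token(?)` selects exactly the same single row as an equality predicate on the key itself, so (assuming the schema implied here) the two forms below should be equivalent; the token form is mainly needed for range scans. Column names are taken from the query above:

```sql
-- Equality on the partition key; the driver's token-aware routing
-- already sends this to a replica that owns the key:
SELECT flow_id, title FROM flows WHERE flow_id = ?;

-- Token form; equivalent for a single key, required only for ranges:
SELECT flow_id, title FROM flows WHERE token(flow_id) = token(?);
```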

Row cache is disabled.

Kindly let me know what I am missing here.

**Details -** 
 Old partitioner: OrderPreservingPartitioner
 New partitioner: Murmur3Partitioner


Total number of nodes : 4 
Replication factor : 2

Machine configuration :

 12 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz processors
 48GB of RAM

CF Stats

    Read Count: 738  
    Read Latency: 8.633516260162601 ms  
    Write Count: 57  
    Write Latency: 0.1060701754385965 ms  
    Pending Flushes: 0  
    Table: flows  
    SSTable count: 4  
    Space used (live): 2310544421  
    Space used (total): 2310544421  
    Space used by snapshots (total): 0  
    Off heap memory used (total): 595544  
    SSTable Compression Ratio: 0.9655157119293131  
    Number of partitions (estimate): 56942  
    Memtable cell count: 0  
    Memtable data size: 0  
    Memtable off heap memory used: 0  
    Memtable switch count: 0  
    Local read count: 37  
    Local read latency: 10.126 ms  
    Local write count: 0  
    Local write latency: NaN ms  
    Pending flushes: 0  
    Percent repaired: 100.0  
    Bloom filter false positives: 0  
    Bloom filter false ratio: 0.00000  
    Bloom filter space used: 234440  
    Bloom filter off heap memory used: 234408  
    Index summary off heap memory used: 70416  
    Compression metadata off heap memory used: 290720  
    Compacted partition minimum bytes: 51  
    Compacted partition maximum bytes: 43388628  
    Compacted partition mean bytes: 13867  
    Average live cells per slice (last five minutes): 1.0  
    Maximum live cells per slice (last five minutes): 1  
    Average tombstones per slice (last five minutes): 75.75956284153006  
    Maximum tombstones per slice (last five minutes): 372  
    Dropped Mutations: 0  

1 Answer

Erick Ramirez answered

@aky2k7_174715 The output you posted shows that the read latency is only 10ms:

    Local read latency: 10.126 ms

so it doesn't line up with your claim that it takes 2.5 seconds.

  1. Can you post the schema for the table so we can see what you're reading?
  2. What's the idea behind tokenising flow_id? Is it because you don't know what the partition keys are?
  3. Do you have tracing enabled while you're scanning the table?
  4. What errors or warnings are reported in the logs of the nodes?

Cheers!

1 comment

aky2k7_174715 commented ·

Thanks @Erick Ramirez

I have 150 flow_ids. Instead of using an IN clause, I am firing the queries asynchronously to retrieve the details. The overall time is 2.5 seconds.

a. The flows table contains almost 67k columns. There are no clustering columns, and all the flow_ids are unique.

The select query which I am firing asynchronously for the 150 flow_ids is:

SELECT src_flow_id, failures, author, authored_at, version_num, description, step_1_image_creation_time,
       source, type, title, steps, url, page_labels, tags, title_lowercase, ent_id, user_id, host,
       votes, published_at, runs, views, flow_id
FROM quicko.flows WHERE token(flow_id) = token(?)

b. I know the primary keys. I used tokens instead of the primary key to direct each query to its designated node.

c. I had tracing enabled, but it was generating a lot of logs so I disabled it. If required I can post the details (it will contain a trace for each query, i.e. 150 traces in the above scenario).

d. In the logs I was seeing warnings about the tombstone threshold. I compacted my data after setting gc_grace_seconds to 0.
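For reference, the tombstone-purge step described above would look something like this (the keyspace/table name is assumed from the query above; note that dropping gc_grace_seconds to 0 on a live RF=2 cluster risks resurrecting deleted data unless the nodes are repaired first):

```sql
-- Make tombstones purgeable immediately on the next compaction,
-- then run a major compaction, e.g. via: nodetool compact quicko flows
ALTER TABLE quicko.flows WITH gc_grace_seconds = 0;
```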
