DataStax Academy FAQ

DataStax Academy migrated to a new learning management system (LMS) in July 2020. We are also moving to a new Cassandra Certification process, so there are changes to exam bookings, the voucher system, and the issuing of certificates.



Question asked by baid_manish_187433:

DSE Graph 6.8.2 Performance Issues

Hi, we have a use case that requires relatively wide partitions (under 10 MB each). Here is the histogram:

 [csdusr@csdr-1 dse-6.8.2]$ bin/nodetool tablehistograms IAP Account
 IAP/Account histograms
 Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                               (micros)          (micros)           (bytes)                  
 50%             0.00              0.00              0.00              1109                72
 75%             0.00              0.00              0.00              3973               310
 95%             0.00              0.00              0.00             20501              1597
 98%             0.00              0.00              0.00           8409007            545791
 99%             0.00              0.00              0.00           8409007            545791

There is a partition of 8.4 MB with 545K cells.
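To see how skewed the partition sizes are, the histogram rows can be parsed mechanically. The Python sketch below assumes the six-column layout printed by nodetool above; the helper names parse_histograms and wide_percentiles and the 1 MB threshold are illustrative choices, not part of nodetool or DSE.

```python
# Sketch: parse `nodetool tablehistograms` output (layout as shown above)
# and flag the percentiles whose partition size exceeds a chosen threshold.

SAMPLE = """\
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00              0.00              0.00              1109                72
75%             0.00              0.00              0.00              3973               310
95%             0.00              0.00              0.00             20501              1597
98%             0.00              0.00              0.00           8409007            545791
99%             0.00              0.00              0.00           8409007            545791
"""


def parse_histograms(text: str) -> dict[str, tuple[int, int]]:
    """Map each percentile row to (partition size in bytes, cell count).

    Header, unit and Min/Max rows are skipped: only rows whose first token
    ends in '%' and that have the six columns shown above are kept.
    """
    rows: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 6 and parts[0].endswith("%"):
            size_bytes = int(float(parts[4]))
            cells = int(float(parts[5]))
            rows[parts[0]] = (size_bytes, cells)
    return rows


def wide_percentiles(rows: dict[str, tuple[int, int]], threshold_bytes: int) -> list[str]:
    """Return the percentiles whose partition size is above threshold_bytes."""
    return [pct for pct, (size, _) in rows.items() if size > threshold_bytes]


if __name__ == "__main__":
    rows = parse_histograms(SAMPLE)
    # 1 MB is an arbitrary illustrative threshold, not a DSE recommendation.
    for pct in wide_percentiles(rows, threshold_bytes=1_000_000):
        size, cells = rows[pct]
        print(f"{pct}: {size / 1e6:.1f} MB, {cells} cells")
```

On this sample only the 98% and 99% rows cross the threshold, while the 95% row is around 20 KB, so the large size comes from a small number of partitions rather than from the typical one.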

Simple queries are taking more than 1 second.

For example, starting from a single "User" vertex and traversing to the related Account vertices takes about 2 seconds:

 gremlin> g.V().hasLabel("User").has("tenantId", "Default").has("nativeId","JDOE").inE().hasLabel("Is_Related_To").has("type", "Owner").
 ......1> outV().hasLabel("Account").has("tenantId", "Default").has("appId", within("abc-pqr")).has("nativeType", within("Account")).fold().order(local).by("updateTime", desc).by("entityKey", desc).unfold().limit(5000).
 ......2> project("entity").by(elementMap("appId", "nativeType", "entityGlobalId", "name", "createTime", "updateTime", "description")). fold().profile()
 ==>Traversal Metrics
 Step                                                               Count  Traversers       Time (ms)    % Dur
 =============================================================================================================
 __.V().hasLabel("User").has("nativeId","ARIHANT...                     1           1          41.521     2.06
   CQL statements ordered by overall duration                                                  18.636
     \_1=SELECT * FROM "IAP"."User" WHERE solr_query = '{"q":"*:*", "fq":["tenantId:Default","nativeId:ARIHANT
         B"]}' LIMIT 2147483647 / Duration: 18 ms / Count: 1
 HasStep([~label.eq(User), tenantId.eq(Default),...                     1           1           0.224     0.01
 __.inE().has("type","Owner").hasLabel("Is_Relat...                  5415        5415          26.549     1.32
   CQL statements ordered by overall duration                                                   2.436
     \_1=SELECT * FROM "IAP"."Account_Actor_Inverse" WHERE "Account_appId" IN ? AND "Account_nativeType" IN ? 
         AND "Account_tenantId" = ? AND "User_appId" = ? AND "User_entityKey" = ? AND "User_nativeType" = ? AN
         D "User_tenantId" = ? AND type = ? / Duration: 2 ms / Count: 1 / Index type: Materialized view
 HasStep([type.eq(Owner)])                                           5415        5415          22.730     1.13
 __.outV().hasLabel("Account").has("appId",P.wit...                  5415        5415        1758.415    87.12
   CQL statements ordered by overall duration                                               48204.048
     \_1=SELECT * FROM "IAP"."Account" WHERE "appId" = ? AND "nativeType" = ? AND "tenantId" = ? AND "entityKe
         y" = ? / Duration: 48204 ms / Count: 5415 / Index type: Table: Account
 HasStep([~label.eq(Account), tenantId.eq(Defaul...                  5415        5415          29.600     1.47
 FoldStep                                                               1           1          11.180     0.55
 OrderLocalStep([[value(updateTime), desc], [val...                     1           1          15.521     0.77
 UnfoldStep                                                          5002        5002           6.208     0.31
 RangeGlobalStep(0,5000)                                             5000        5000          11.054     0.55
 ProjectStep([entity],[[CoreElementMapStep([appI...                  5000        5000          59.941     2.97
   CoreElementMapStep([appId, nativeType, entity...                  5000        5000          18.506
 FoldStep                                                               1           1           9.685     0.48
 ReferenceElementStep                                                   1           1          25.634     1.27
                                             >TOTAL                     -           -        2018.269        -

Observations:

Most of the time is spent on the vertices stored in the wide partition.

When traversing to a vertex, is the system performing a separate query for each incident vertex ('n' queries)? The profile shows:

 SELECT * FROM "IAP"."Account" WHERE "appId" = ? AND "nativeType" = ? AND "tenantId" = ? AND "entityKey" = ? / Duration: 48204 ms / Count: 5415 / Index type: Table: Account

This is far slower than the performance we had with PostgreSQL using a similar codebase and data model. Are we missing something? Are there basic tuning parameters we should apply?
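One way to check the 'n queries' suspicion is to read the Duration/Count summaries straight out of the profile: if a statement's Count equals the traverser count of its step, one CQL lookup was issued per traverser. A minimal Python sketch, assuming the summary format shown in the profile above (the excerpt is trimmed, and statement_stats is a hypothetical helper name):

```python
import re

# Matches the per-statement summary printed under each profile step,
# e.g. "... / Duration: 48204 ms / Count: 5415 / Index type: Table: Account".
SUMMARY = re.compile(r"Duration:\s*(\d+(?:\.\d+)?)\s*ms\s*/\s*Count:\s*(\d+)")

# Trimmed excerpt of the profile above (WHERE clauses shortened).
PROFILE_EXCERPT = """
__.V().hasLabel("User")...              1     1    41.521
    SELECT * FROM "IAP"."User" WHERE solr_query = ... / Duration: 18 ms / Count: 1
__.inE().has("type","Owner")...      5415  5415    26.549
    SELECT * FROM "IAP"."Account_Actor_Inverse" WHERE ... / Duration: 2 ms / Count: 1 / Index type: Materialized view
__.outV().hasLabel("Account")...     5415  5415  1758.415
    SELECT * FROM "IAP"."Account" WHERE ... / Duration: 48204 ms / Count: 5415 / Index type: Table: Account
"""


def statement_stats(profile_text: str) -> list[tuple[float, int, float]]:
    """Return (total_ms, count, avg_ms) for each CQL statement summary found."""
    stats = []
    for m in SUMMARY.finditer(profile_text):
        total_ms = float(m.group(1))
        count = int(m.group(2))
        stats.append((total_ms, count, total_ms / count))
    return stats


if __name__ == "__main__":
    for total_ms, count, avg_ms in statement_stats(PROFILE_EXCERPT):
        print(f"count={count:5d} total={total_ms:8.0f} ms avg={avg_ms:6.2f} ms")
```

For the Account lookup this reports a Count of 5415, the same as the 5415 traversers of that step, at roughly 8.9 ms per lookup; the edge lookup on the materialized view, by contrast, is a single statement.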
dsegraph

Hi @baid_manish_187433,

Would you share the schema for the "User" and "Account" vertices?

Thanks!

bharat.asnani_190772 replied to bettina.swynnerton:

Hi,

The schema is as follows:


 schema.vertexLabel('User').ifNotExists().
     partitionBy('tenantId', Ascii).
     partitionBy('appId', Ascii).
     partitionBy('nativeType', Ascii).
     clusterBy('entityKey', Text).
     property('nativeId', Text).create()

 schema.vertexLabel('User').searchIndex().ifNotExists().
     by('nativeId').asString().
     waitForIndex(5).
     create()

 schema.vertexLabel('Account').ifNotExists().
     partitionBy('tenantId', Ascii).
     partitionBy('appId', Ascii).
     partitionBy('nativeType', Ascii).
     clusterBy('entityKey', Text).
     property('updateTime', Timestamp).
     create()

 schema.edgeLabel('Is_Related_To').ifNotExists().
     from('Account').to('User').
     clusterBy('type', Ascii, Asc).
     create()
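In the DSE 6.8 Core schema API, the partitionBy() columns become the partition key of the label's table and the clusterBy() columns its clustering columns, so every Account vertex with the same (tenantId, appId, nativeType) is stored as a row of one partition, ordered by entityKey. The Python sketch below only illustrates that grouping on made-up vertex ids (hypothetical data, no DSE connection); partition_widths is an illustrative helper name.

```python
from collections import Counter

# Columns declared with partitionBy() on the Account vertex label, in order.
# entityKey is a clustering column, so it does not affect partition placement.
PARTITION_KEY = ("tenantId", "appId", "nativeType")


def partition_widths(vertex_ids: list[dict[str, str]]) -> Counter:
    """Count how many vertex rows fall into each partition key tuple."""
    return Counter(tuple(v[col] for col in PARTITION_KEY) for v in vertex_ids)


if __name__ == "__main__":
    # Hypothetical ids: 1000 accounts for one app, 3 for another.
    accounts = [
        {"tenantId": "Default", "appId": "abc-pqr", "nativeType": "Account", "entityKey": f"acct-{i}"}
        for i in range(1000)
    ]
    accounts += [
        {"tenantId": "Default", "appId": "other-app", "nativeType": "Account", "entityKey": f"acct-{i}"}
        for i in range(3)
    ]
    for key, rows in partition_widths(accounts).most_common():
        print(key, rows)
```

Under this key, all accounts that share a tenant, application and native type land in the same partition, which would likely explain a partition like the 545K-cell one in the histogram.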


There is also a materialized view for traversing in the opposite direction, i.e. from User to Account:


 schema.edgeLabel('Is_Related_To').from('Account').to('User').
     materializedView('Account_Actor_Inverse').ifNotExists().
     partitionBy(IN, 'tenantId').
     partitionBy(IN, 'entityKey').
     partitionBy(IN, 'appId').
     partitionBy(IN, 'nativeType').
     clusterBy('type', Asc).
     clusterBy(OUT, 'tenantId', Asc).
     clusterBy(OUT, 'appId', Asc).
     clusterBy(OUT, 'nativeType', Asc).
     clusterBy(OUT, 'entityKey', Asc).
     create()


1 Answer

bettina.swynnerton answered:

Hi @baid_manish_187433,

Looking at this profile again, the has() filter after the first edge traversal takes the most time: 87% of the traversal time.

Note that this stage takes 1758.415 ms, not 48204.048 ms. The 48204.048 ms figure is the aggregated duration of all 5415 CQL queries issued at this step; across these 5415 queries, the average is about 9 ms.

It is the degree of the starting vertex (5415 incoming Is_Related_To edges of type "Owner") that leads to the 5415 traversers at this stage.
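The relationship between the step time and the aggregated CQL time can be checked with a little arithmetic on the profile figures. This Python sketch only re-derives numbers from the profile above; the overlap ratio is an inference (it assumes all 5415 lookups ran within the step's wall-clock window), not a value the profiler reports, and breakdown is a hypothetical helper name.

```python
# Figures copied from the profile above.
STEP_MS = 1758.415      # wall-clock time of the outV().hasLabel("Account")... step
TOTAL_MS = 2018.269     # whole traversal
CQL_SUM_MS = 48204.048  # summed durations of all CQL lookups in that step
CQL_COUNT = 5415        # one lookup per traverser


def breakdown(step_ms: float, total_ms: float, cql_sum_ms: float, cql_count: int):
    """Return (average ms per lookup, step share of total in %, overlap ratio)."""
    avg_ms = cql_sum_ms / cql_count
    share_pct = 100.0 * step_ms / total_ms
    # If summed durations exceed the step's wall-clock time, the lookups must
    # have overlapped; the ratio approximates how many were in flight on average.
    overlap = cql_sum_ms / step_ms
    return avg_ms, share_pct, overlap


if __name__ == "__main__":
    avg_ms, share_pct, overlap = breakdown(STEP_MS, TOTAL_MS, CQL_SUM_MS, CQL_COUNT)
    print(f"avg lookup {avg_ms:.2f} ms, step share {share_pct:.2f}%, overlap ~{overlap:.1f}x")
```

With these inputs the average lookup is about 8.9 ms, the step accounts for about 87.1% of the traversal, and the ratio suggests roughly 27 lookups in flight at once, which is how about 48 seconds of summed query time fits into a 1.76-second step.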

You know which partition is the large one, correct? Then you could trace the CQL query in cqlsh to see how long it takes outside the context of the traversal. Substitute the correct values below:

 SELECT * FROM "IAP"."Account" WHERE "appId" = ? AND "nativeType" = ? AND "tenantId" = ? AND "entityKey" = ?

For details on tracing, see https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshTracing.html
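When substituting the values, note that the key columns are case-sensitive (quoted) identifiers and all hold Ascii/Text values, so both need CQL quoting. A small Python helper that renders a cqlsh snippet wrapped in TRACING ON / TRACING OFF; the helper names and the example key values are hypothetical:

```python
def cql_text_literal(value: str) -> str:
    """Quote a value as a CQL string literal; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def quoted_identifier(name: str) -> str:
    """Double-quote a case-sensitive CQL identifier such as "appId"."""
    return '"' + name.replace('"', '""') + '"'


def trace_script(keyspace: str, table: str, key: dict[str, str]) -> str:
    """Build a cqlsh snippet that traces a single-row SELECT on the given key."""
    where = " AND ".join(
        f"{quoted_identifier(col)} = {cql_text_literal(val)}" for col, val in key.items()
    )
    return "\n".join([
        "TRACING ON;",
        f"SELECT * FROM {quoted_identifier(keyspace)}.{quoted_identifier(table)} WHERE {where};",
        "TRACING OFF;",
    ])


if __name__ == "__main__":
    # Hypothetical key values; replace with the ones from the large partition.
    print(trace_script("IAP", "Account", {
        "appId": "abc-pqr",
        "nativeType": "Account",
        "tenantId": "Default",
        "entityKey": "acct-0001",
    }))
```

The printed lines can be pasted into cqlsh as-is; the trace then shows the per-stage timings of that single lookup.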

Thanks!


Hi, I ran the trace. Here are the results:

1. For the complete partition (60k rows)

2. For a single row


It seems that within the graph traversal the same statement takes approximately 9 ms, while executing it directly in cqlsh takes less than 1 ms.


Hi Bettina, please review the update provided by rohansurana2810_190538.

It looks like for every traversal the server looks up the row, and the time taken is much higher than for the corresponding CQL query.


Hi @baid_manish_187433,

As far as I can see, the slow has() step after the first edge traversal is to be expected.

The has() step fetches the properties of each vertex as its traverser arrives, so it is the number of traversers at this stage, more than the 10 MB partition, that drives the overall traversal execution time.

I'll check further what optimisation options are specific to 6.8, but in my experience this is the expected behaviour for these mid-traversal has() filters (i.e. filters applied after first traversing over an edge).


Hi @baid_manish_187433,

Since it is difficult to address detailed performance questions in this Q&A forum, would you book a slot with our Keep Calm service?

https://www.datastax.com/keepcalm

Keep Calm and Cassandra On by setting up a 30-minute meeting with a technical expert from DataStax.
Also feel free to email keepcalm@datastax.com with questions.

Thanks!
