Bringing together the Apache Cassandra experts from the community and DataStax.


dshevchuk asked:

Replication factor for Analytics Solo keyspace

I am installing a small DSE 6.8 cluster with the following topology:

  • Datacenter "dc1": 3 Cassandra nodes
  • Datacenter "dc2": 2 Analytics Solo nodes

What replication factor should I use for the Analytics keyspaces?


The documentation page "Setting the replication factor for analytics keyspaces" (1) contains this note:

CAUTION: Only replicate DSE Analytics keyspaces to other DSE Analytics datacenters. DSEFS does not support replication to other datacenters, and the dsefs keyspace only contains metadata, not the data stored in DSEFS. Each DSE Analytics datacenter should have its own DSEFS instance.

From this note I understood that analytics keyspaces should be replicated only to datacenters with an Analytics workload, and that replicating them to datacenters with other workloads (Transactional, Graph, Search) is wrong and should be avoided. Is that right?

Based on the above, my replication settings should be:

ALTER KEYSPACE analytic_keyspace_name 
WITH REPLICATION = {
   'class': 'NetworkTopologyStrategy', 'dc2': '2'
};

However, the page "Creating a DSE Analytics Solo datacenter" has an example, "Creating a DSE Analytics Solo datacenter within an existing DSE cluster" (2), which uses a similar topology:

  • Datacenter "DC1" - has existing database data
  • Datacenter "DC2" - does not store any data but will perform analytics jobs using the database data from DC1

So, in this example "DC1" has a Transactional workload and "DC2" has an Analytics workload. Am I wrong here?

This example says to configure analytics keyspaces to replicate to both datacenters:

ALTER KEYSPACE dse_leases
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3 };

This confuses me, as it goes against the caution note above ("Only replicate DSE Analytics keyspaces to other DSE Analytics datacenters").


Can you explain this case?

Should I replicate my Analytics Solo keyspaces to my "dc1", which has only a transactional workload?


Links:

1 - https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/analytics/settingReplFactorAnalyticsKeyspaces.html

2 - https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/spark/dseAnalyticsSolo.html#CreatingaDSEAnalyticsSolodatacenterwithinanexistingDSEcluster


bettina.swynnerton answered:

Hi @dshevchuk,

I agree with you that this is a bit confusing.

As far as I understand the caution, it is meant to make a distinction between the analytics keyspaces "dse_analytics", "dse_leases" and "HiveMetaStore" vs the "dsefs" keyspace, which stores the dsefs metadata.

Out of all these keyspaces, only dsefs must not be replicated to other datacenters.

In the case of the other keyspaces, the replication to other datacenters, even if they run a different workload, should not matter. The data in these keyspaces is only used by the Analytics workload.

What is important is that they are replicated with a replication factor high enough to tolerate temporarily unresponsive nodes. For example, "dse_leases" is important for the Spark master election, and if only one replica is available, it can lead to a failure to elect a Spark master.
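As a rough sketch of how this could look for the topology in the question (3 nodes in "dc1", 2 Analytics Solo nodes in "dc2"): the datacenter names come from the question, while the replication factors are illustrative assumptions, not values taken from the documentation. The dsefs keyspace stays in the Analytics datacenter only; the other analytics keyspaces go to both.

```cql
-- Analytics system keyspaces: replicated to both datacenters.
-- The replication factors below are assumptions for a 3-node dc1 and a 2-node dc2.
ALTER KEYSPACE dse_leases
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2 };

ALTER KEYSPACE dse_analytics
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2 };

-- The keyspace name is case-sensitive, so it must be double-quoted.
ALTER KEYSPACE "HiveMetaStore"
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2 };

-- dsefs holds only DSEFS metadata: keep it in the Analytics datacenter.
ALTER KEYSPACE dsefs
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'dc2': 2 };
```

After changing replication settings, a repair of the altered keyspaces (for example with nodetool repair) is normally needed so that the new replicas actually receive the data.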

I will follow up regarding the caution on the doc page, but I am quite certain that it applies to the restriction on dsefs replication.

I hope this helps!

2 comments

Thanks @bettina.swynnerton

This is clearer to me now.


Hi @dshevchuk,

I had a quick check with the Analytics team, and yes, this is meant to warn against replicating the dsefs keyspace, which can lead to unwanted behaviour.
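To check where these keyspaces are currently replicated, a query along these lines should work from cqlsh. It reads the standard system_schema.keyspaces table; the list of keyspace names is taken from this thread and may differ on your cluster.

```cql
-- Show the current replication settings of the analytics-related keyspaces.
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name IN ('dsefs', 'dse_leases', 'dse_analytics', 'HiveMetaStore');
```

If the replication map for dsefs lists more than one datacenter, that is the situation the caution warns about.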

Cheers

smadhavan answered:

@dshevchuk, the example in that documentation refers to an application/user-created keyspace named mykeyspace, which is replicated only to the DC1 datacenter and not to the DSE Analytics Solo datacenter DC2. Only the system-generated keyspaces, such as dse_analytics, dse_leases, etc., are replicated to both the regular DC and the Analytics Solo DC in that cluster. Hope that clarifies your question!
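A minimal sketch of that split, using the DC1/DC2 names from the documentation example. The replication factor of 3 for mykeyspace is an assumption made for illustration; the dse_leases statement is the one quoted in the question.

```cql
-- Application keyspace: data lives only in the transactional datacenter DC1.
-- (RF 3 here is an assumed value.)
ALTER KEYSPACE mykeyspace
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'DC1': 3 };

-- System-generated analytics keyspace: replicated to both datacenters.
ALTER KEYSPACE dse_leases
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3 };
```

With this layout the Analytics Solo nodes in DC2 hold no replicas of mykeyspace and read the application data from DC1 when running analytics jobs.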
