Question

dradacosky_185196 asked:

What are the considerations for creating new clusters vs expanding an existing cluster for ODS?

For an enterprise application with an operational data store that integrates data from many sources: when does it make sense to break up an ODS into multiple clusters versus having a single, very large cluster? What are the major considerations and implications?

cassandra


1 Answer

Erick Ramirez answered:

@dradacosky_185196 There isn't a right-or-wrong answer, or a one-size-fits-all solution. It really boils down to the business requirements.

If there is no requirement to break up the data into separate clusters, I would personally choose to host an operational data store/data lake use case on a single cluster for operational simplicity. It's easier to monitor/manage/maintain a single cluster for obvious reasons.

Larger clusters are also more resilient since the load is distributed across more nodes; smaller clusters, conversely, are more susceptible to infrastructure outages. Think of a 3-node cluster where one node goes down for whatever reason (hardware failure, operating system crash, power outage): you effectively lose a third of the cluster, compared to only a twelfth in, say, a 12-node cluster.
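To put rough numbers on it, here's a trivial back-of-the-envelope sketch (plain Python, nothing Cassandra-specific) of the share of capacity a single node failure takes out at various cluster sizes:

```python
# Share of cluster capacity lost when a single node goes down,
# for a few illustrative cluster sizes.
for nodes in (3, 6, 12, 24):
    print(f"{nodes:>2}-node cluster: one node down = {1 / nodes:.1%} of capacity lost")
```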

There's also the consideration that a client/service/app with low throughput requirements won't fully utilise a dedicated DB cluster of its own, so it's better off being hosted on a shared, multi-tenant DB cluster.

It's not all upside though. The drawback of a multi-tenant, single-cluster setup is that any client/service/app which runs expensive queries or has a demanding access pattern can take more than its fair share of the resources/bandwidth to the detriment of the other clients/services/applications, so it is something to consider.
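For what it's worth, a common way to lay out a multi-tenant cluster is one keyspace per client/service/app. Below is a minimal sketch using the DataStax Python driver; the contact points, datacenter name and tenant keyspaces are made-up placeholders. Note this only separates the data logically -- compute and disks are still shared, so the noisy-neighbour caveat above still applies.

```python
# Sketch: one shared cluster hosting a keyspace per tenant.
# Assumes the DataStax Python driver (pip install cassandra-driver).
# Contact points, DC name and tenant names are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1", "10.0.0.2"])  # hypothetical contact points
session = cluster.connect()

for tenant in ("payments", "marketing"):  # hypothetical tenants
    session.execute(
        f"CREATE KEYSPACE IF NOT EXISTS {tenant} "
        "WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}"
    )

cluster.shutdown()
```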

Hopefully this helps. Cheers!

2 comments


dradacosky_185196 commented:

@Erick Ramirez, thanks. A couple of other considerations I think should be factored in, though I'm not sure how significant they are: workload type and resiliency/single point of failure.

As for workload type, do things like high-read versus high-write (or other workload characteristics) drive anything in terms of cluster organization/setup? Again, I understand business requirements/use cases are the main driver, but assuming a wide variety of those across a functional area (payments processing, multi-channel marketing, etc.), does the type of workload factor in?

And for resiliency, I realize Cassandra is highly fault-tolerant, but if the cluster does somehow go down, having a major ODS unavailable would have a significant business impact. In an enterprise environment that's a fairly significant consideration, but maybe there are other approaches to help minimize this risk?

Thanks again for any insights/input on this.

Erick Ramirez commented:
> does type of workload factor in?

High-reads vs high-writes is somewhat less of a concern because the read and write paths are different. This is the reason we recommend using separate disks for data (used by reads) and the commitlog (where writes are sent), though there is overlap when it comes to compactions. If you'd like to dig in more, there's discussion about this in the Database internals section of the docs.
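If you want to sanity-check that on a node, here's a rough sketch that reads cassandra.yaml and reports whether the commitlog and data directories actually sit on separate block devices. It assumes PyYAML is installed, and the config path varies by install:

```python
# Check whether commitlog and data directories share a block device.
# Assumes PyYAML (pip install pyyaml); adjust the path to your install.
import os
import yaml

with open("/etc/cassandra/cassandra.yaml") as f:  # path is an assumption
    conf = yaml.safe_load(f)

commitlog_dev = os.stat(conf["commitlog_directory"]).st_dev
for data_dir in conf["data_file_directories"]:
    same = os.stat(data_dir).st_dev == commitlog_dev
    print(f"{data_dir}: {'SAME device as commitlog' if same else 'separate device'}")
```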

> but maybe there are other approaches to help minimize this risk?

In my experience, a cluster outage can in most cases be tied back to (a) human error, (b) catastrophic infrastructure failure, or (c) data model design issues, barring of course major bugs, which are few and far between. And just like any database, Cassandra is susceptible to being overloaded because physics dictates there are only so many reads and writes the disks can handle, so you need to size the cluster correctly: have enough nodes to deal with peak traffic, not the average.
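As a rough illustration of sizing for peak rather than average, something like this back-of-the-envelope arithmetic works; every figure here is a made-up placeholder, so benchmark your own hardware and data model before trusting any of it:

```python
# Back-of-the-envelope node count for PEAK traffic, not average.
# All numbers are hypothetical placeholders.
import math

peak_ops_per_sec = 120_000   # hypothetical peak cluster throughput
ops_per_node = 10_000        # hypothetical sustainable ops/sec per node
headroom = 0.30              # spare capacity for compactions/repairs/failures

nodes = math.ceil(peak_ops_per_sec / (ops_per_node * (1 - headroom)))
print(f"Size for peak: ~{nodes} nodes")  # -> ~18 nodes
```

Cheers!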
