rzilkha_129571 asked · Erick Ramirez edited

Spark connector or splittable hadoop-sstable files when dealing with a single instance

Hi,

I have a unique case where I have to process a lot of data from a single, non-clustered Cassandra instance. Since we're talking about non-cluster mode, I'm wondering if there is a more efficient way to run the batch job by loading the SSTables directly into the Spark workers with the hadoop-sstable library, instead of using the connector, which I assume also opens connections to the database.

Thoughts?

By the way, has anyone tried using https://github.com/fullcontact/hadoop-sstable with Spark?

spark-cassandra-connector

1 Answer

Erick Ramirez answered

@rzilkha_129571 unfortunately, there isn't such a thing. With regards to hadoop-sstable, that is a very old implementation that only supports C* 2.0. The SSTable format is completely different in C* 2.1 (EOL) and C* 2.2, and the storage engine was completely refactored in C* 3.x.

If you need the C* reads to be more performant, our recommendation is to increase the capacity of your cluster by adding more nodes and scaling out. That is how you achieve the throughput you require. Cheers!
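For reference, here is a minimal sketch of a connector-based full-table read using the Scala RDD API of spark-cassandra-connector. The host, keyspace and table names are placeholders, not anything from your setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable() to SparkContext

object ConnectorReadSketch {
  def main(args: Array[String]): Unit = {
    // Point the connector at the Cassandra node (placeholder address).
    val conf = new SparkConf()
      .setAppName("cassandra-batch-read")
      .set("spark.cassandra.connection.host", "127.0.0.1")

    val sc = new SparkContext(conf)

    // Full-table scan: the connector divides the token range into Spark
    // partitions, and each executor opens its own connections to the node.
    // "ks" and "events" are hypothetical keyspace/table names.
    val rows = sc.cassandraTable("ks", "events")

    println(rows.count())

    sc.stop()
  }
}
```

Depending on the connector version, the input split size setting (for example `spark.cassandra.input.split.sizeInMB` in recent releases) controls how the token range is carved into Spark partitions, which is the main knob for parallelism on the read side when you only have a single node to pull from.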
