DataStax Academy FAQ

DataStax Academy migrated to a new learning management system (LMS) in July 2020. We are also moving to a new Cassandra Certification process, so there are changes to exam bookings, the voucher system, and the issuing of certificates.



rzilkha_129571 asked:

Spark connector or splittable hadoop-sstable files when dealing with a single instance?

Hi,

I have an unusual case where I need to process a large amount of data from a single, non-clustered Cassandra instance. Since there is no cluster, would it be more efficient to run the batch job by loading the SSTables directly onto the Spark workers with the hadoop-sstable library, instead of using the Spark connector (which I assume also opens connections to the database)?

Thoughts?

By the way, has anyone tried using https://github.com/fullcontact/hadoop-sstable with Spark?
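On the assumption in the question that the connector opens connections to the database: a connector-based full-table scan is typically parallelised by splitting the partitioner's token ring into sub-ranges and having each Spark task run a token-range query against Cassandra. The sketch below illustrates that idea only; it is not the Spark Cassandra Connector's actual code. It assumes the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1), and the keyspace, table, and partition-key names are hypothetical.

```python
# Illustrative sketch only (not the connector's implementation):
# split the Murmur3Partitioner token ring into contiguous sub-ranges and
# build one CQL token-range query per sub-range, so each Spark task
# could scan its own slice of the table.

MIN_TOKEN = -(2 ** 63)   # Murmur3Partitioner minimum token
MAX_TOKEN = 2 ** 63 - 1  # Murmur3Partitioner maximum token


def split_token_ring(n_splits):
    """Return n_splits (start, end) pairs that together cover the whole ring."""
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n_splits for i in range(n_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(keyspace, table, pk, n_splits):
    """Build one SELECT per sub-range; names passed in are hypothetical."""
    queries = []
    for i, (start, end) in enumerate(split_token_ring(n_splits)):
        # The first range is closed at the bottom so MIN_TOKEN is covered too.
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({pk}) {lower} {start} AND token({pk}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    for q in range_queries("my_ks", "my_table", "id", 4):
        print(q)
```

With a single instance, every one of these range queries is served by the same node, so adding Spark workers adds parallel readers but not read capacity on the Cassandra side.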

Tags: spark, spark-connector, sstable

1 Answer

Erick Ramirez answered:

@rzilkha_129571 unfortunately, there isn't such a thing. As for hadoop-sstable, it is a very old implementation that only supports C* 2.0. The SSTable format is completely different in C* 2.1 (EOL) and C* 2.2, and the storage engine was completely refactored in C* 3.x.
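To check which SSTable format an existing instance has written (and therefore whether a C* 2.0-only reader could even parse it), one option is to look at the format version string embedded in each `Data.db` file name. The sketch below is a hypothetical helper, not part of any Cassandra tool. The two file-name layouts and the mapping from the version's first letter to a release line reflect our understanding of Cassandra's naming conventions and should be treated as assumptions to verify against your own version.

```python
import os

# Assumed mapping from the first letter of the SSTable version string
# to a Cassandra release line; verify against your Cassandra version.
VERSION_FAMILY = {
    "j": "2.0",
    "k": "2.1",
    "l": "2.2",
    "m": "3.x",
    "n": "4.x",
}


def sstable_version(path):
    """Return the format version (e.g. 'jb', 'mc') from a Data.db file name."""
    tokens = os.path.basename(path).split("-")
    if tokens[-1] != "Data.db" or len(tokens) not in (4, 5):
        raise ValueError(f"unrecognised SSTable Data.db name: {path}")
    if len(tokens) == 4:
        # Assumed 2.2+ layout: <version>-<generation>-<format>-Data.db
        return tokens[0]
    # Assumed pre-2.2 layout: <keyspace>-<table>-<version>-<generation>-Data.db
    return tokens[-3]


def release_line(path):
    """Map a Data.db file name to a release line, or 'unknown'."""
    return VERSION_FAMILY.get(sstable_version(path)[0], "unknown")


if __name__ == "__main__":
    for name in ["ks-tbl-jb-12-Data.db", "ks-tbl-ka-1-Data.db", "mc-3-big-Data.db"]:
        print(name, sstable_version(name), release_line(name))
```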

If you need C* reads to be faster, our recommendation is to increase the capacity of your cluster by scaling out, that is, by adding more nodes. That is how you achieve the throughput you require. Cheers!
