Bringing together the Apache Cassandra experts from the community and DataStax.

Want to learn? Have a question? Want to share your expertise? You are in the right place!

question

mishra.anurag643_153409 asked · Erick Ramirez answered

How do I get data from Cassandra based on primary key incremental value (pushdown) in PySpark?

I am using PySpark to connect to Cassandra and pull data. I have tried loading the whole table, but how can I retrieve only a subset of the data based on a primary key value? Can I write a WHERE condition on the DataFrame? Would predicate pushdown work here, so that only the matching rows are transferred over the network to Spark?

spark-cassandra-connector

1 Answer

Erick Ramirez answered

I have to admit, I don't understand your question. I don't know what "incremental value" means in "primary key incremental value".

In any case, here's an example that might answer your question. I have this table of users:

CREATE TABLE community.users (
    name text PRIMARY KEY,
    age int,
    colour text
);

which contains:

 name   | age | colour
--------+-----+--------
    bob |  42 | orange
 charli |  13 | yellow
  alice |  29 |    red

Here's a PySpark example where I'm filtering the table using where():

>>> user = spark.read.format("org.apache.spark.sql.cassandra") \
...             .load(keyspace="community", table="users")

>>> alice = user.where(user.name == 'alice')

And here are the results:

>>> alice.show()
+-----+---+------+
| name|age|colour|
+-----+---+------+
|alice| 29|   red|
+-----+---+------+
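Since `name` is the partition key, the connector can push this equality filter down to Cassandra, so only the matching rows cross the network; you can confirm by calling `explain()` on the DataFrame and looking for the filter under `PushedFilters` in the physical plan. To illustrate why that matters, here is a rough, connector-free sketch in plain Python (hypothetical names, no Spark or Cassandra involved) contrasting client-side filtering with a pushed-down filter:

```python
# Toy illustration of predicate pushdown (hypothetical, not the connector's API).
# The "server side" holds rows keyed by partition key; pushing the filter down
# means the server ships back only the matching row instead of the whole table.

users_by_key = {
    "bob":    {"name": "bob",    "age": 42, "colour": "orange"},
    "charli": {"name": "charli", "age": 13, "colour": "yellow"},
    "alice":  {"name": "alice",  "age": 29, "colour": "red"},
}

def scan_then_filter(name):
    """No pushdown: every row crosses the 'network'; filtering happens client-side."""
    transferred = list(users_by_key.values())  # whole table shipped
    matches = [row for row in transferred if row["name"] == name]
    return matches, len(transferred)

def pushdown_filter(name):
    """Pushdown: the filter runs server-side; only matching rows are shipped."""
    transferred = [users_by_key[name]] if name in users_by_key else []
    return transferred, len(transferred)

rows_a, moved_a = scan_then_filter("alice")   # 3 rows transferred
rows_b, moved_b = pushdown_filter("alice")    # 1 row transferred
```

Both approaches return the same result, but the pushed-down version moves one row instead of three; with a real table of millions of rows, that difference is the whole point of pushdown.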

Cheers!
