Hello, DataStax Community!

We want to make you aware of a few operational updates which will be carried out on the site. We are working hard to streamline the login process to integrate with other DataStax resources. As such, you will soon be prompted to update your password. Please note that your username will remain the same.

As we work to improve your user experience, please be aware that login to the DataStax Community will be unavailable for a few hours on:

  • Wednesday, July 15 16:00 PDT | 19:00 EDT | 20:00 BRT
  • Thursday, July 16 00:00 BST | 01:00 CEST | 04:30 IST | 07:00 CST | 09:00 AEST

For more info, check out the FAQ page. Thank you for being a valued member of our community.


ssnmurtyraju_188715 avatar image
ssnmurtyraju_188715 asked ·

Why does the Python driver returns same rows when paging through the ResultSet?


I am doing full scan of Cassandra table which has 75 million records. I am using cassandra python driver to connect to cassandra database and below is my code. While iterating ResultSet, the loop is fetching 100 records per page which is fine but it is fetching same 100 records again and again in repeating way. I am new to Cassandra and appreciate if someone can help me.

statement = SimpleStatement("SELECT * FROM keyspace.tablename", fetch_size=500)
rows = session.execute(statement)
for row in rows:
    print row.sid, row.sname
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered ·

@ssnmurtyraju_188715 In the code you provided, it doesn't appear as though you were creating a "bookmark" of which page you are on. As a result, you are effectively going back to the first page which is why you're iterating over the same rows, over and over.

You need to save the page state (I used the bookmark analogy) so you can go back to the last bookmarked page, for example:

bookmark['paging_state'] = rows.paging_state

The next time you query, you need to include the bookmark to get the next page. For example:

ps = bookmark['paging_state']
results = session.execute(statement, paging_state=ps)

For more information on resuming paged results, see Paging Large Queries. Cheers!

1 comment Share
10 |1000 characters needed characters left characters exceeded

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

Thanks Erick. Appreciate your respone.

I added bookmark as you suggested, but still getting same 100 records while iterating through pages. Please find updated code below. My objective is migrate a large table(75 millions) from Azure Cosmos DB(Cassandra API) to Azure Cosmos DB(SQL API). I am using stand alone VM to do this through python program. Please suggest for any better approach to do this in either Java/Scala/Python.

result_set = session.execute("SELECT * FROM <ks>.<tn>")
page_state = result_set.paging_state
for row in result_set.current_rows:
result_set = session.execute("SELECT * FROM <ks>.<tn>", paging_state=page_state)
for row in result_set.current_rows:
for i in list_all_results:
  print i.doc_id

0 Likes 0 · ·