ssnmurtyraju_188715 avatar image
ssnmurtyraju_188715 asked Erick Ramirez edited

Why does the Python driver returns same rows when paging through the ResultSet?


I am doing full scan of Cassandra table which has 75 million records. I am using cassandra python driver to connect to cassandra database and below is my code. While iterating ResultSet, the loop is fetching 100 records per page which is fine but it is fetching same 100 records again and again in repeating way. I am new to Cassandra and appreciate if someone can help me.

statement = SimpleStatement("SELECT * FROM keyspace.tablename", fetch_size=500)
rows = session.execute(statement)
for row in rows:
    print row.sid, row.sname
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez edited

@ssnmurtyraju_188715 In the code you provided, it doesn't appear as though you were creating a "bookmark" of which page you are on. As a result, you are effectively going back to the first page which is why you're iterating over the same rows, over and over.

You need to save the page state (I used the bookmark analogy) so you can go back to the last bookmarked page, for example:

bookmark['paging_state'] = rows.paging_state

The next time you query, you need to include the bookmark to get the next page. For example:

ps = bookmark['paging_state']
results = session.execute(statement, paging_state=ps)

For more information on resuming paged results, see Paging Large Queries. Cheers!

1 comment Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

ssnmurtyraju_188715 avatar image ssnmurtyraju_188715 commented ·

Thanks Erick. Appreciate your respone.

I added bookmark as you suggested, but still getting same 100 records while iterating through pages. Please find updated code below. My objective is migrate a large table(75 millions) from Azure Cosmos DB(Cassandra API) to Azure Cosmos DB(SQL API). I am using stand alone VM to do this through python program. Please suggest for any better approach to do this in either Java/Scala/Python.

result_set = session.execute("SELECT * FROM <ks>.<tn>")
page_state = result_set.paging_state
for row in result_set.current_rows:
result_set = session.execute("SELECT * FROM <ks>.<tn>", paging_state=page_state)
for row in result_set.current_rows:
for i in list_all_results:
  print i.doc_id

0 Likes 0 ·