Overhaul bulk api and allow paging of bulk results #77

Open
myrho opened this issue Jul 4, 2022 · 1 comment

@myrho
Contributor

myrho commented Jul 4, 2022

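Currently, only the first num_pages pages can be fetched per bulked request; later pages are unreachable.
Add a page handle that encodes the paging state of each bulked request, so that a follow-up call can continue where the previous one stopped.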
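
A minimal sketch of what such a page handle could look like, assuming the paging state is a JSON-serializable mapping from each bulked request key to its next-page token. The function names, the key names and the token values are hypothetical and not part of the existing API:

```python
import base64
import json


def encode_page_handle(state: dict) -> str:
    """Pack the per-request paging state into one opaque, URL-safe token."""
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_page_handle(handle: str) -> dict:
    """Recover the per-request paging state from a token."""
    raw = base64.urlsafe_b64decode(handle.encode("ascii"))
    return json.loads(raw)


# Hypothetical state: one next-page token per bulked request key.
# None marks a request whose results are already exhausted.
state = {"addr_a": "0x2a", "addr_b": None}
handle = encode_page_handle(state)
restored = decode_page_handle(handle)
```

The client would pass the handle back unchanged with the next call. A real implementation would probably also want to sign or version the token so that tampered or outdated handles can be rejected.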

@myrho myrho self-assigned this Jul 4, 2022
@soad003 soad003 changed the title from "Allow paging of bulk results" to "Overhaul bulk api and allow paging of bulk results" Jun 13, 2023
@soad003
Member

soad003 commented Jun 13, 2023

This is related to #92.

Some other problems that need consideration when designing a new bulk interface are:

  • Currently we work with streaming responses. Unfortunately, errors that occur in the queries cannot be properly propagated, since the response header (including the status code) has already been written to the wire by the time a query fails. Our current solution, an error field in the response, is awkward for users to handle.
  • Many bulk API calls are heavy and are susceptible to query timeouts; this should be handled somehow.
  • Many bulk API calls put a lot of load on the db, since at the moment all requests are fired at the db asynchronously without any limit. This can overwhelm even a big Cassandra instance.
  • The bulk API is more or less schema-less. It gets the values to return as Python dicts and infers the structure from them. It automatically flattens lists and complex data types. Because of that, we had some issues with records that have optional fields: the header of the resulting CSV is inferred from the first result that gets written, which leads to nondeterministic, timing-dependent results and errors.
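
As a starting point for discussion, here is a rough sketch of how three of these points could be approached together: a semaphore caps the number of queries in flight, each query gets its own timeout, and the CSV header comes from a schema declared up front rather than from the first result. Everything named here (FIELDS, run_one, run_bulk, fake_query, the status column) is hypothetical and only for illustration. The sketch also collects all rows before writing, so it does not stream.

```python
import asyncio
import csv
import io

# Hypothetical fixed schema: the header is declared up front instead of
# being inferred from whichever result happens to arrive first.
FIELDS = ["request_key", "status", "balance", "label"]


async def run_one(key, query, sem, timeout):
    """Run one query under a concurrency cap and a per-query timeout."""
    async with sem:  # at most max_concurrency queries hit the db at once
        try:
            row = await asyncio.wait_for(query(key), timeout)
            return {"request_key": key, "status": "ok", **row}
        except asyncio.TimeoutError:
            return {"request_key": key, "status": "timeout"}
        except Exception as exc:
            return {"request_key": key, "status": f"error: {exc}"}


async def run_bulk(keys, query, max_concurrency=2, timeout=1.0):
    sem = asyncio.Semaphore(max_concurrency)
    # gather() returns results in the order of the input keys.
    rows = await asyncio.gather(*(run_one(k, query, sem, timeout) for k in keys))
    out = io.StringIO()
    # restval fills optional fields that a record does not carry.
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


async def fake_query(key):
    """Stand-in for a database call; not a real driver API."""
    if key == "slow":
        await asyncio.sleep(0.5)
    if key == "bad":
        raise ValueError("not found")
    return {"balance": 10} if key == "a" else {"balance": 5, "label": "exchange"}


csv_text = asyncio.run(run_bulk(["a", "b", "slow", "bad"], fake_query, timeout=0.1))
```

Because the timeout starts only after the semaphore is acquired, time spent waiting for a free slot does not count against a query. Since all rows are known before anything is written, a failure could also still be turned into a proper HTTP error status in this variant; a streaming variant would lose that option again.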

@soad003 soad003 self-assigned this Jun 13, 2023