Overhaul bulk api and allow paging of bulk results #77

myrho · 2022-07-04T09:29:17Z

Currently, only the first `num_pages` pages can be fetched per bulked request.
Add a page handle that encodes the page state per bulked request.
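A minimal sketch of what such a page handle could look like, assuming each bulked sub-request is identified by a string key and the database driver exposes its paging state as opaque bytes. The function names `encode_page_handle` and `decode_page_handle` and the token layout are hypothetical, not part of the existing API. Sub-requests that have no further pages are simply left out of the handle.

```python
import base64
import json
from typing import Dict


def encode_page_handle(states: Dict[str, bytes]) -> str:
    """Pack the paging state of every unfinished sub-request into one token.

    `states` maps a sub-request key (hypothetical, e.g. an address) to the
    opaque paging state returned by the database driver.
    """
    payload = {
        key: base64.b64encode(state).decode("ascii")
        for key, state in states.items()
    }
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    # URL-safe so the handle can be passed back as a query parameter.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_page_handle(handle: str) -> Dict[str, bytes]:
    """Inverse of encode_page_handle."""
    raw = base64.urlsafe_b64decode(handle.encode("ascii"))
    payload = json.loads(raw)
    return {key: base64.b64decode(value) for key, value in payload.items()}
```

The client would send the handle back unchanged with the next call, and the server would resume each sub-request from its own state instead of being limited to the first `num_pages` pages.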


soad003 · 2023-06-13T14:34:33Z

This is related to #92.

Some other problems that need consideration when designing a new bulk interface are:

- Currently we work with streaming responses. Unfortunately, errors that happen in the queries cannot be properly propagated, since the response header has already been written to the wire. Our current solution with an error field is problematic for users to handle.
- Many bulk API calls are heavy and therefore susceptible to query timeouts; this should be handled somehow.
- Many bulk API calls produce a lot of load on the DB, since at the moment all requests are fired at the DB asynchronously without any limit. This can overwhelm even a big Cassandra instance.
- The bulk API is more or less schema-less. It receives the values to return as Python dicts and infers the structure from them, automatically flattening lists and complex data types. Because of that, we had issues with records that have optional fields: the header of the resulting CSV is inferred from the first result that is written, which leads to nondeterministic, timing-dependent results and errors.
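One possible direction for the last three points, as a rough sketch only: bound the number of in-flight queries with `asyncio.Semaphore`, give each query its own timeout via `asyncio.wait_for`, and write the CSV with an explicit header instead of inferring it from the first row. The names `run_bulk`, `FIELDS`, `max_concurrency` and `timeout_s` are made up for illustration, and `query` stands in for whatever coroutine actually hits the DB. Note that this version buffers all rows before writing, so it does not stream.

```python
import asyncio
import csv
import io
from typing import Any, Awaitable, Callable, Dict, Sequence

# Hypothetical fixed schema: the header no longer depends on which row
# happens to arrive first.
FIELDS = ["key", "value", "error"]


async def run_bulk(
    keys: Sequence[str],
    query: Callable[[str], Awaitable[Dict[str, Any]]],
    max_concurrency: int = 10,
    timeout_s: float = 5.0,
) -> str:
    """Run `query` for every key with bounded concurrency; return CSV text."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(key: str) -> Dict[str, Any]:
        # At most `max_concurrency` queries hit the DB at the same time.
        async with semaphore:
            try:
                row = await asyncio.wait_for(query(key), timeout=timeout_s)
                return {"key": key, **row}
            except asyncio.TimeoutError:
                # A timed-out query becomes an explicit per-row error.
                return {"key": key, "error": "timeout"}

    # gather() returns results in the order of `keys`, not completion order.
    rows = await asyncio.gather(*(one(k) for k in keys))

    out = io.StringIO()
    # restval fills optional fields that a row does not provide.
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because rows with missing optional fields are padded via `restval`, the header and column order stay the same regardless of timing. Buffering also means errors are known before the response header is written, at the cost of giving up streaming, so this is a trade-off rather than a drop-in fix.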

myrho self-assigned this Jul 4, 2022

soad003 changed the title ~~Allow paging of bulk results~~ Overhaul bulk api and allow paging of bulk results Jun 13, 2023

soad003 self-assigned this Jun 13, 2023
