feat(python): support DataFrame init from generators
#5424
Merged
Expands on #5411, adding support for initialising `DataFrames` from generators (in either orientation). Can even init from a generator of generators now, if that floats your boat ;)

It's a bit more complex than the previous `Series` update, as you don't necessarily know in advance how many columns you have; if that can't be inferred (e.g. no `columns` param) then the initial chunk is limited to 1,000 rows. All subsequent updates can then target chunks of 1,000,000 elements, which initial testing suggests is reasonable, but we can probably optimise further based on the underlying/inferred schema.

Also:
- `PyDataFrame.read_rows`: make better use of the relatively new schema override param, to avoid post-init casts.
- `Series`: init from generators that return no data.
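
For readers who want a feel for the chunking strategy described above, here is a minimal pure-Python sketch. It is not the code from this PR: `iter_row_chunks`, `INITIAL_ROWS` and `TARGET_ELEMENTS` are hypothetical names chosen for illustration, and the only details taken from the description are the 1,000-row first chunk when the column count is unknown and the 1,000,000-element target afterwards.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Optional, Tuple

# Values taken from the PR description; the names are hypothetical.
INITIAL_ROWS = 1_000          # first chunk when the column count is unknown
TARGET_ELEMENTS = 1_000_000   # element budget for every later chunk


def iter_row_chunks(
    rows: Iterable[Iterable[Any]],
    n_columns: Optional[int] = None,
) -> Iterator[List[Tuple[Any, ...]]]:
    """Yield lists of rows from a (possibly lazy) iterable of rows.

    Illustrative only; an assumption about the general approach rather
    than the actual polars implementation.
    """
    it = iter(rows)
    if n_columns is None:
        # Column count unknown: read a small first chunk and infer the width.
        # tuple(r) also materialises rows that are themselves generators.
        first = [tuple(r) for r in islice(it, INITIAL_ROWS)]
        if not first:
            return  # the generator produced no data
        yield first
        n_columns = len(first[0])
    # Width known: size the remaining chunks by total elements, not rows.
    chunk_rows = max(1, TARGET_ELEMENTS // max(1, n_columns))
    while True:
        chunk = [tuple(r) for r in islice(it, chunk_rows)]
        if not chunk:
            return
        yield chunk


# Usage: a three-column generator with no column hint.
rows = ((i, i * 2, str(i)) for i in range(2_500))
sizes = [len(chunk) for chunk in iter_row_chunks(rows)]
```

Sizing later chunks by element count rather than row count keeps the memory used per chunk roughly constant whether the frame is narrow or wide, which is presumably why the description talks about elements rather than rows.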