Read the partitions in a guaranteed deterministic order #23

Merged
xhochy merged 2 commits into main from deterministic_partition_order
Mar 15, 2022

Conversation

DamianBarabonkovQC
Contributor

@DamianBarabonkovQC DamianBarabonkovQC commented Mar 15, 2022

Description:

Occasionally, when a dataset is read, its partition data comes back in a different order between Python runs. This happens because the partition information in DatasetMetadata is stored in a set, and Python does not guarantee the iteration order of a set; that is part of the language specification.

Sorting the partitions after they have been read from the set yields a deterministic order, which fixes the downstream issue where partition data is read non-deterministically.
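A minimal sketch of the idea, not the actual change in this PR: the partition labels and the helper name `partitions_in_stable_order` are hypothetical and only illustrate the technique. Because string hashing is randomized per interpreter process by default, iterating a set of strings can yield a different order on each run, whereas `sorted()` always returns the same list for the same elements.

```python
# Hypothetical partition labels held in a set, standing in for the
# partition information described above (names are illustrative only).
partition_labels = {"part_2", "part_0", "part_1"}


def partitions_in_stable_order(labels):
    """Return the labels as a list sorted in ascending order.

    Iterating the set directly gives no ordering guarantee, so the
    result could differ between interpreter runs. Sorting removes
    that dependence on set iteration order.
    """
    return sorted(labels)


ordered = partitions_in_stable_order(partition_labels)
print(ordered)
```

Sorting costs O(n log n) in the number of partitions, which is usually negligible next to the cost of reading the partition data itself.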

  • Closes #xxxx
  • Changelog entry

@DamianBarabonkovQC DamianBarabonkovQC marked this pull request as ready for review March 15, 2022 15:33
@xhochy xhochy merged commit 25d4390 into main Mar 15, 2022
@xhochy xhochy deleted the deterministic_partition_order branch March 15, 2022 16:14