Support partitioning and group size control in coco dataset generation. #175

Merged
eddyxu merged 2 commits into main from lei/coco_partition
Sep 19, 2022

Conversation

Contributor

@eddyxu eddyxu commented Sep 19, 2022

Allow parse_coco.py to take a row group size and a maximum number of rows per file. Together these control the on-disk layout of the COCO dataset, so the reader has more independent units (files and row groups) to process in parallel.
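To illustrate how the two settings shape the layout, here is a minimal sketch in plain Python. It is not the code from this PR: the function `plan_layout` and the parameter names `max_rows_per_file` and `row_group_size` are hypothetical, and the sketch assumes rows are packed into files in order, with each file split into fixed-size row groups.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileLayout:
    """Row counts of the row groups stored in one output file."""
    row_groups: List[int]


def plan_layout(num_rows: int, max_rows_per_file: int, row_group_size: int) -> List[FileLayout]:
    """Plan how num_rows would be split into files and row groups.

    Hypothetical helper: names and packing order are assumptions,
    not the actual parse_coco.py options.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")
    if max_rows_per_file < row_group_size:
        raise ValueError("max_rows_per_file must be >= row_group_size")

    files: List[FileLayout] = []
    remaining = num_rows
    while remaining > 0:
        # Each file holds at most max_rows_per_file rows.
        rows_in_file = min(remaining, max_rows_per_file)
        groups: List[int] = []
        left = rows_in_file
        while left > 0:
            # Each row group holds at most row_group_size rows.
            size = min(left, row_group_size)
            groups.append(size)
            left -= size
        files.append(FileLayout(groups))
        remaining -= rows_in_file
    return files


if __name__ == "__main__":
    layout = plan_layout(num_rows=10, max_rows_per_file=4, row_group_size=2)
    for i, f in enumerate(layout):
        print(f"file {i}: row groups {f.row_groups}")
```

Smaller values produce more files and row groups, which gives a reader more units to schedule concurrently, at the cost of more per-file and per-group overhead.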

Contributor Author

eddyxu commented Sep 19, 2022

This is part of #163

Contributor

@changhiskhan changhiskhan left a comment

1 small nit (non-blocking)

@@ -39,7 +39,7 @@

def read_file(uri) -> bytes:
    if not urlparse(uri).scheme:
Contributor

Are you allowed to use ~ substitution in file:// URIs?

Contributor Author

Yeah, this way we can run "train.py ~/dataset".
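A minimal sketch of the behavior discussed in this thread, assuming the scheme-less branch expands `~` with `os.path.expanduser`. The diff only shows the `urlparse(uri).scheme` check, so the helper name `resolve_local_path` and its body are hypothetical. Note that in this sketch only scheme-less paths are expanded; a `file://` URI has a scheme and is returned unchanged.

```python
import os
from urllib.parse import urlparse


def resolve_local_path(uri: str) -> str:
    """Expand a leading ~ for scheme-less paths; leave real URIs untouched.

    Hypothetical helper: the actual read_file body is not shown in the diff.
    """
    if not urlparse(uri).scheme:
        # No scheme (e.g. "~/dataset" or "data/train"): treat as a local path.
        return os.path.expanduser(uri)
    # Has a scheme (e.g. "s3://..." or "file://..."): pass through as-is.
    return uri


if __name__ == "__main__":
    print(resolve_local_path("~/dataset"))
    print(resolve_local_path("s3://bucket/dataset"))
```

With this shape, a command such as `train.py ~/dataset` works even when the shell does not expand the tilde itself, for example when the path is quoted.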

@eddyxu eddyxu merged commit 72c7443 into main Sep 19, 2022
@eddyxu eddyxu deleted the lei/coco_partition branch September 19, 2022 21:21

2 participants