followup changes to allow unsupported datasets #261
ghstack-source-id: ce288a19c67fccd0751c6fd92ae14a161da8bfa3 Pull Request resolved: #261
raise ValueError(
    f"Dataset {dataset_name} is not supported. "
    f"Supported datasets are: {list(_supported_datasets.keys())}."
)
Should we also error out if users pass both a supported dataset_name and dataset_path?
Yeah, I think this was a decision discussed long ago. The point is that for any dataset, we'd like to offer two ways to use data: one is to download from the HF hub, the other is to use local files if a path is provided. In the latter case, the user still needs to be clear about the correspondence between dataset name and dataset path. Let's still keep this to make it less error-prone.
1. Remove `build_dataloader_fn`, as we only use HF data loading for now. This also helps allow an unsupported dataset if the user specifies a `dataset_path`.
2. If the user uses an unsupported `dataset` without specifying `dataset_path`, we should still throw.
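For illustration, the two rules above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `resolve_dataset_source` is a hypothetical helper name, and the contents of `_supported_datasets` here are placeholder assumptions. Only the error message is taken from the diff.

```python
from typing import Optional

# Assumed registry for illustration only: dataset name -> HF hub identifier.
_supported_datasets = {
    "c4": "allenai/c4",
}


def resolve_dataset_source(dataset_name: str, dataset_path: Optional[str] = None) -> str:
    """Hypothetical helper: return where the data should be loaded from."""
    if dataset_path is not None:
        # A local path is accepted even if the name is not in the registry.
        return dataset_path
    if dataset_name not in _supported_datasets:
        # Unsupported name and no local path: still throw.
        raise ValueError(
            f"Dataset {dataset_name} is not supported. "
            f"Supported datasets are: {list(_supported_datasets.keys())}."
        )
    # Supported name without a path: fall back to the hub identifier.
    return _supported_datasets[dataset_name]
```

With this sketch, `resolve_dataset_source("my_data", "/tmp/my_data")` returns the local path, while `resolve_dataset_source("my_data")` raises `ValueError`.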
ghstack-source-id: 34b380d251e0a80ac5328fdaeb33a1e488f9c735 Pull Request resolved: #261
Stack from ghstack (oldest at bottom):
1. Remove `build_dataloader_fn`, as we only use HF data loading for now. This also helps allow an unsupported dataset if the user specifies a `dataset_path`.
2. If the user uses an unsupported `dataset` without specifying `dataset_path`, we should still throw.