-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Column Selection: Improvements #366
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally well-structured
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just left two very minor suggestions! Feel free to merge after making these changes.
if "train" not in dataset: | ||
raise ValueError("The dataset must contain a `train` split.") | ||
|
||
columns_mapping = { | ||
col: col.replace(".", "_") for col in dataset["train"].column_names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if there is a column conflict here? E.g. we have two columns, foo.baz
and foo_baz
? Then I think these will conflict and one of the columns will be erased.
I know this is unlikely but might be good to check and handle this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ritugala LGTM. Please test this in the CLI demo before merging.
Description