-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Allow hf dataset id to be passed via training_data_path #431
feat: Allow hf dataset id to be passed via training_data_path #431
Conversation
Thanks for making a pull request! 😃 |
1 similar comment
Thanks for making a pull request! 😃 |
62df264
to
49c389e
Compare
Signed-off-by: Dushyant Behl <[email protected]>
a480bca
to
5fc2ec9
Compare
@@ -130,7 +130,7 @@ def replace_text(match_obj): | |||
if index_object not in element: | |||
raise KeyError("Requested template string is not a valid key in dict") | |||
|
|||
return element[index_object] | |||
return str(element[index_object]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dataset which I have tested in my unit test is lhoestq/demo1
which is used with the formatting template "### Text:{{review}} \n\n### Stars: {{star}}"
Now if you see the dataset the field star
is an int
which would cause an error if this element was not a string.
so this change supports any dataset formatting with dataset row types other than string as well.
Signed-off-by: Dushyant Behl <[email protected]>
Signed-off-by: Abhishek <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dushyantbehl for the code refactor to _try_load_dataset
and adding support for hf dataset id via training_data_path
. Refactored changes look good to me. Just small nit which I have pushed here.
changes look good to me @Abhishek-TAMU. Can you please also check why the image build failed post the change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as well
@dushyantbehl a new release of fms-acceleration just came out which may be causing this problem with the image build. |
I also verified that this works with a different dataset to load from HF, thanks Dushyant! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after the image build passes successfully!
Description of the change
Recent commit d7f06f5 introduced support for hf dataset id inside
data_config
but the code had a minor fix needed to support hf dataset id being passed viatraining_data_path
.With this patch code now supports hf dataset id via
training_data_path
argument.Related issue number
How to verify the PR
Tested with an hf dataset and added a unit test for the same.
Was the PR tested