Add mistral fine-tuning and examples #395
Conversation
Looks good but I had a few suggestions, mostly for clarity!
Have resolved all the issues you mentioned. There was a linting issue that only showed up in GitHub Actions, not locally; locally, all pre-commit hooks pass. Have configured flake8 to ignore the two error codes BLK100 (Black would have made changes) and W503 (line break occurred before binary operator). The BLK100 one is definitely fine to ignore, since we are running black separately anyway. W503 raised errors on a lot of files that have been unmodified.
.flake8
@@ -1,3 +1,5 @@
 [flake8]
 max-line-length = 88
 extend-ignore = E203,FI10,FI11,FI12,FI13,FI14,FI15,FI16,FI17,FI18
+per-file-ignores = prompt2model/dataset_transformer/prompt_template.py:E501
+ignore = BLK100, W503
I think these two can actually be ignored globally, not just for this file.
BLK100 can be ignored because we are already running black in a separate action, and W503 seems to be commonly ignored by developers, since it conflicts with another warning in flake8, W504.
Yeah, makes sense. "ignore" would have ignored them globally, but I could just add them to extend-ignore directly instead of creating a new line for it.
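For reference, the resulting merged config would then look roughly like this (a sketch assuming the two codes from the diff above are simply appended to extend-ignore):

    [flake8]
    max-line-length = 88
    extend-ignore = E203,FI10,FI11,FI12,FI13,FI14,FI15,FI16,FI17,FI18,BLK100,W503
    per-file-ignores = prompt2model/dataset_transformer/prompt_template.py:E501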
At a high-level, this looks great! The core of this PR (the QLoRA trainer) looks awesome and I have no technical notes about that.
Stylistically, I think this PR needs some revision in terms of comment formatting and variable naming.
The formatting of comments in this PR doesn't match typical recommended Python style (which we should adhere to). Python's style guidelines suggest using inline comments sparingly; put comments on their own line whenever possible (which probably applies to all of these comments). Also, comments should almost always be full, grammatical English sentences with proper capitalization and punctuation. Please ensure that comments follow these two guidelines before we can merge this.
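For example, a hypothetical before/after illustrating the two guidelines (the function and variable names here are made up, not taken from this PR):

    def tokenize(batch):
        """Stand-in for a real preprocessing step (hypothetical)."""
        return [text.split() for text in batch]

    # Discouraged: a terse inline comment.
    tokens = tokenize(["some input text"])  # tokenize

    # Preferred: a comment on its own line, written as a full,
    # grammatical English sentence with capitalization and punctuation.
    # Tokenize the raw inputs before building the training dataset.
    tokens = tokenize(["some input text"])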
 plan_prompt_fn: Callable[
-    [str, list[dict], str], str
+    [str, list[dict], str, int], str
With a complicated function interface like this, it would be good to explain what this function is (and why this particular interface is required) in the "Args:" section of the docstring of this __init__ function.
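Something along these lines could address this (a sketch; the class name and the argument semantics described below are assumptions, not taken from the actual prompt2model code):

    from collections.abc import Callable

    class DatasetTransformer:  # Placeholder name for the class being discussed.
        def __init__(
            self,
            plan_prompt_fn: Callable[[str, list[dict], str, int], str],
        ) -> None:
            """Initialize the transformer.

            Args:
                plan_prompt_fn: Builds the planning prompt. It takes the task
                    description (str), the demonstration examples (list[dict]),
                    an earlier plan to refine (str), and the number of examples
                    to include (int), and returns the formatted prompt string.
                    This interface is required so that callers can swap in
                    alternative prompt templates.
            """
            # The semantics above are illustrative; the module's actual
            # docstring is the authoritative description.
            self.plan_prompt_fn = plan_prompt_fn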
Sure, did this. Also rearranged the variables to make more logical sense.
self.model, os.path.join(output_dir, "qlora_model")
)
self.model = self.model.merge_and_unload()
self.model.save_pretrained(os.path.join(output_dir, "final_model"))
I'm wondering if the hard-coding of the model name here would cause problems, e.g. overwriting models.
It makes sense to at least store this as a variable (e.g. PEFT_MODEL_DIRECTORY = os.path.join(output_dir, "final_model")) for better visibility.
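Concretely, the suggestion might look something like this (a sketch built around the real peft APIs shown in the diff; the surrounding method name is illustrative):

    import os

    from peft import PeftModel

    def _merge_and_save(self, output_dir: str) -> None:
        # Name the save locations up front for better visibility, and to make
        # accidental overwrites easier to spot.
        QLORA_MODEL_DIRECTORY = os.path.join(output_dir, "qlora_model")
        PEFT_MODEL_DIRECTORY = os.path.join(output_dir, "final_model")

        # Load the trained adapter, merge it into the base weights, and save.
        self.model = PeftModel.from_pretrained(self.model, QLORA_MODEL_DIRECTORY)
        self.model = self.model.merge_and_unload()
        self.model.save_pretrained(PEFT_MODEL_DIRECTORY)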
The idea with passing save_folder_path to the function was to provide a path so that nothing in the current directory gets overwritten. But I can also add the variable you mentioned for better visibility, plus a comment in the demo.
All the proposed changes make sense, and have been made. Please re-review when you get a chance, thanks!
LGTM now, thanks for doing the revisions!
Tiny nit: we are variously using "QLora" and "QLoRA" in these changes; might be good to stick to the latter for consistency. (The other places where you refer to this in lowercase, "qlora", are a different story, and I think those cases are totally fine.)
Otherwise, LGTM!
Made it QLoRA in all comments. Thanks!
Description
This PR primarily contains 3 changes:
An examples directory with 3 smaller examples for: (a) creating transformed data, (b) creating synthetic data, and (c) QLoRA fine-tuning a Mistral model from an existing dataset. This sets us up well to add examples in the future relating to model selection, hyperparameter optimization, full fine-tuning, etc.
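As a rough sketch of what the QLoRA fine-tuning setup in such an example involves (the model name and hyperparameter values below are illustrative, not necessarily the PR's exact choices):

    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Load the base model with 4-bit quantization (the "Q" in QLoRA).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1",
        quantization_config=bnb_config,
        device_map="auto",
    )

    # Attach low-rank adapters; only these small matrices are trained,
    # which keeps memory usage low enough for a single GPU.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()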