In orpo_finetuning_example.ipynb, the dataset currently used is "trl-lib/ultrafeedback_binarized", loaded with split="train". Previously it was "mlabonne/orpo-dpo-mix-40k" with split="all". The current dataset causes a failure during trainer creation:
Shouldn't it be split=None when loading the dataset?
Also the title in the Exercise section of the notebook is:
Exercise: Aligning SmolLM2 with DPOTrainer
Did you mean ORPOTrainer?
Finally, I noticed that you changed the optimizer to optim="paged_adamw_8bit". This optimizer requires the bitsandbytes library, which is not listed in requirements.txt.
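As a rough sketch of how the notebook could catch the missing dependency early, a check like the one below could run before the trainer is created. The `OPTIONAL_DEPS` mapping and both helper names are hypothetical and not part of the notebook or of any library; only the link between `paged_adamw_8bit` and bitsandbytes comes from the report above.

```python
import importlib.util

# Hypothetical mapping for illustration: optimizer name -> extra packages it needs.
# Only the paged_adamw_8bit -> bitsandbytes entry is taken from the issue text.
OPTIONAL_DEPS = {"paged_adamw_8bit": ["bitsandbytes"]}


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_optimizer_deps(optim):
    """Raise ImportError with an install hint if `optim` needs a missing package."""
    missing = missing_packages(OPTIONAL_DEPS.get(optim, []))
    if missing:
        raise ImportError(
            f"optim={optim!r} needs: {', '.join(missing)}. "
            "Install them or add them to requirements.txt."
        )
```

Calling `check_optimizer_deps("paged_adamw_8bit")` at the top of the notebook would then fail with a readable message instead of an error deep inside trainer setup.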
After fixing these problems and finishing the training run I examined the metrics:
Accuracy stays at 0.5 the whole time, and the margins barely budged. Does this indicate that the training run failed (at least for the preference alignment part)?
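To make that observation easier to reproduce, here is a minimal sketch that summarizes the two metrics from a list of logged dictionaries, for example something shaped like `trainer.state.log_history`. The key names `rewards/accuracies` and `rewards/margins` and the 0.02 tolerance are assumptions, so adjust them to whatever the run actually logs. The function only describes the logs; it does not decide whether the run failed.

```python
def summarize_preference_metrics(
    log_history,
    acc_key="rewards/accuracies",   # assumed key name, check your own logs
    margin_key="rewards/margins",   # assumed key name, check your own logs
    tol=0.02,                       # arbitrary tolerance around chance level
):
    """Summarize preference accuracy and margin movement across logged steps.

    Returns None if either metric is absent from the logs.
    """
    accs = [entry[acc_key] for entry in log_history if acc_key in entry]
    margins = [entry[margin_key] for entry in log_history if margin_key in entry]
    if not accs or not margins:
        return None
    return {
        # True when every logged accuracy is within `tol` of 0.5 (chance level)
        "accuracy_stuck_at_chance": all(abs(a - 0.5) <= tol for a in accs),
        # Difference between the last and the first logged margin
        "margin_change": margins[-1] - margins[0],
    }
```

A result with `accuracy_stuck_at_chance` set to `True` and a `margin_change` near zero would match what I saw.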
@mmeendez8 Did you manage to fix the orpo_finetuning_example.ipynb notebook so that the preference alignment training actually works, judging by the reported metrics? I see that you fixed most of the issues I reported here. Did you get the same metric values as I did?