The README describes the LightGPT training procedure starting at Step 1: Imitation Fine-tuning. How do we perform the filtering shown in the picture?
You can refer to our paper. We selectively retain only the reasoning trajectories in which the selected actions are associated with the highest estimated future reward.
Sorry for not including the code for this process. We will add the script soon.
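Until the official script is released, the filtering rule described above might be sketched roughly as follows. This is not the authors' code. The `Step` record, the `estimated_rewards` mapping (for example, Q-values from a separately trained critic), the example action names, and the choice to keep a trajectory only if *every* step picks a highest-reward action are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    # Hypothetical record: the generated reasoning, the action the model chose,
    # and an estimated future reward (e.g. a Q-value) for each candidate action.
    reasoning: str
    chosen_action: str
    estimated_rewards: Dict[str, float]


def is_greedy_step(step: Step) -> bool:
    """True if the chosen action has (or ties for) the highest estimated future reward."""
    if not step.estimated_rewards:
        return False  # no estimates available, so the step cannot be verified
    best = max(step.estimated_rewards.values())
    return step.estimated_rewards.get(step.chosen_action, float("-inf")) >= best


def filter_trajectories(trajectories: List[List[Step]]) -> List[List[Step]]:
    """Assumed rule: keep a trajectory only if every chosen action is reward-maximizing."""
    return [t for t in trajectories if t and all(is_greedy_step(s) for s in t)]


# Usage with made-up action names and reward estimates.
good = [Step("North queue is longest", "phase_0", {"phase_0": 3.2, "phase_1": 1.1})]
bad = [Step("Serve east-west first", "phase_1", {"phase_0": 3.2, "phase_1": 1.1})]
kept = filter_trajectories([good, bad])
print(len(kept))  # 1
```

Requiring every step to be greedy is the strictest reading of the rule; a per-step variant that keeps individual reasoning/action pairs instead of whole trajectories would be an equally plausible interpretation.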