How to do filtering in the step 1 "Trajectory Collection and Filtering" #14

Open
DA21S321D opened this issue Apr 15, 2024 · 4 comments

Comments

@DA21S321D

[Image]
The README describes the LightGPT training procedure starting at Step 1: Imitation Fine-tuning. How do I perform the filtering shown in the picture?

@DA21S321D
Author

I mean, how do I use that Critic to do the filtering?

@SQLai2099
Collaborator

Please refer to our paper. We retain only those reasoning trajectories in which the selected action has the highest estimated future reward.
Sorry for leaving the code for this step out of the repository. We will add the script soon.
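
Until the official script is available, the rule described above could be sketched roughly as follows. This is not the authors' released code: the `Trajectory` schema, the `Critic` callable interface, the `phase_N` action names, and the `toy_critic` stand-in are all hypothetical and would need to be replaced with the project's real data format and pre-trained critic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Trajectory:
    """One collected reasoning trajectory (hypothetical schema)."""
    observation: Sequence[float]  # state that the critic scores
    reasoning: str                # reasoning text produced by the LLM
    action: str                   # action the LLM finally selected


# Hypothetical critic interface: maps an observation to an estimated
# future reward for every candidate action.
Critic = Callable[[Sequence[float]], Dict[str, float]]


def filter_trajectories(
    trajectories: List[Trajectory], critic: Critic
) -> List[Trajectory]:
    """Keep trajectories whose selected action has the highest critic estimate."""
    kept = []
    for traj in trajectories:
        estimates = critic(traj.observation)
        if traj.action not in estimates:
            continue  # action could not be matched to a candidate: drop it
        best = max(estimates.values())
        # Ties count as "highest", so an action that shares the top score is kept.
        if estimates[traj.action] >= best:
            kept.append(traj)
    return kept


def toy_critic(observation: Sequence[float]) -> Dict[str, float]:
    """Stand-in critic for illustration only: score of action i is observation[i]."""
    return {f"phase_{i}": float(v) for i, v in enumerate(observation)}


if __name__ == "__main__":
    collected = [
        Trajectory([3, 1, 2], "reasoning A", "phase_0"),  # picks the best action
        Trajectory([3, 1, 2], "reasoning B", "phase_1"),  # picks a worse action
    ]
    for traj in filter_trajectories(collected, toy_critic):
        print(traj.reasoning)
```

Only the trajectories that survive this filter would then be used as imitation fine-tuning data. How ties and unparseable actions are handled here is a design choice of this sketch, not something stated in the thread.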

@Jarvis-K

I use GPT-4-turbo to collect data, and imitation learning shows good results without filtering.

@SQLai2099
Collaborator

> I use GPT-4-turbo to collect data, and imitation learning shows good results without filtering.

Good to know. In my case, gpt-4-0613 still needs this filtering strategy.
