How to do filtering in the step 1 "Trajectory Collection and Filtering" #14

DA21S321D · 2024-04-15T07:19:31Z

The README describes the LightGPT training procedure starting at Step 1: Imitation Fine-tuning, so how do I perform the filtering shown in the picture?

DA21S321D · 2024-04-15T07:20:25Z

I mean, how do I use the Critic to do the filtering?

SQLai2099 · 2024-04-15T12:57:38Z

You can refer to our paper. We retain only the reasoning trajectories in which the selected actions are the ones with the highest estimated future reward.
Sorry that the code for this step is not included yet. We will add the script soon.
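
Until that script is available, here is a minimal sketch of the rule described above. It is not the repository's code: the `Step` structure, the `critic_values` field (one estimated future reward per candidate action), and the tolerance `tol` are all assumptions made for illustration. It also assumes a trajectory is kept only if every step picked the top-valued action.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Hypothetical record of one decision step in a collected trajectory.
    candidate_actions: List[str]
    critic_values: List[float]  # assumed: critic's estimated future reward per candidate
    chosen_index: int           # index of the action the LLM actually selected


def chose_best_action(step: Step, tol: float = 1e-6) -> bool:
    """True if the selected action has the highest estimated future reward."""
    return step.critic_values[step.chosen_index] >= max(step.critic_values) - tol


def filter_trajectories(trajectories: List[List[Step]]) -> List[List[Step]]:
    """Keep only trajectories whose every step selected a top-valued action."""
    return [t for t in trajectories if all(chose_best_action(s) for s in t)]
```

If discarding a whole trajectory turns out to be too strict, the same predicate can be applied per step instead, keeping individual (state, reasoning, action) samples that pass the check.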

Jarvis-K · 2024-04-16T07:45:46Z

I use GPT-4-turbo to collect data and the imitation learning does show a good result without filtering.

SQLai2099 · 2024-04-16T08:08:43Z

> I use GPT-4-turbo to collect data and the imitation learning does show a good result without filtering.

Good to know. In my case, gpt-4-0613 still needs this filtering strategy.
