
Training details #9

Closed

paolotron opened this issue Oct 11, 2023 · 7 comments
@paolotron

Hi, thank you for your work! I would like to reproduce the experiments reported in the paper; could you provide the training details for the experiments reported in Table 2?

Also, could you provide a link to the supplementary material? It is referenced in the paper, but I'm having trouble finding it online.

@Kami-code
Owner

Hi @paolotron. Thanks for using our benchmark. You can find the supplementary material here, and the video mentioned in the supplementary material can be accessed on YouTube. The training example in the README uses the same hyper-parameters that we used in our training. To reproduce the reported results, you only need to change the pre-trained model path, seed, and task name in the command and train for the same number of timesteps as we reported. It is worth mentioning that all reported experiments were evaluated with 3 different seeds under the same configuration.
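
For instance, a rough sketch of sweeping over tasks and seeds could look like the following. The flag names, task names, and seed values here are placeholders; the README's training command is the authoritative reference.

```python
import subprocess

# Placeholders -- substitute the task names from Table 2, the seeds you want,
# and the actual pre-trained model path used in the README command.
TASKS = ["bucket", "..."]          # task names as used by the benchmark
SEEDS = [0, 1, 2]                  # 3 seeds per reported number
PRETRAIN_PATH = "assets/..."       # path to the chosen pre-trained extractor

for task in TASKS:
    for seed in SEEDS:
        subprocess.run(
            [
                "python", "train.py",          # hypothetical entry point; use the README's script
                "--task_name", task,
                "--seed", str(seed),
                "--pretrain_path", PRETRAIN_PATH,
                # keep every other hyper-parameter exactly as in the README example
            ],
            check=True,
        )
```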

@Kami-code Kami-code pinned this issue Oct 13, 2023
@paolotron
Author

paolotron commented Oct 16, 2023

Thank you very much. Is the fact that the experiments train for a different number of iterations due to the early-stopping mechanism implemented in the code?
Also, what do the checkpoints provided in assets/rl_checkpoints/ correspond to in the paper? Which pre-training do they use?

@Kami-code
Owner

Hi @paolotron. You can find the details of the early-stopping mechanism here; it determines whether we need to keep updating the policy within a single iteration. The number of iterations differs between tasks because the tasks are not equally difficult (e.g., the bucket task requires far more iterations to converge). The provided RL checkpoints are policies pre-trained in the Segmentation on DAM setting and fine-tuned with RL, and they correspond to the final environment step of each environment. Note that we picked the checkpoint from one of the three seeds used in the experiments.
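
For reference, the usual shape of such a KL-based early-stopping check in a PPO-style update loop is sketched below. The linked code is authoritative; the threshold and the KL estimate here are generic placeholders.

```python
import numpy as np

def run_policy_epochs(update_step, get_old_log_probs, get_new_log_probs,
                      n_epochs=10, target_kl=0.02):
    """Run up to n_epochs of policy updates, stopping early once the new policy
    drifts too far (in approximate KL) from the policy that collected the data."""
    old_log_probs = get_old_log_probs()
    for epoch in range(n_epochs):
        update_step()                                  # one pass of gradient updates on the PPO loss
        new_log_probs = get_new_log_probs()
        # simple sample-based approximation of KL(old || new)
        approx_kl = float(np.mean(old_log_probs - new_log_probs))
        if target_kl is not None and approx_kl > 1.5 * target_kl:
            print(f"Early stopping at epoch {epoch}: approx_kl={approx_kl:.4f}")
            break
```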

@paolotron
Author

So where can I find the number of environment steps used for each task to produce the results in Table 2?

@Kami-code
Owner

@paolotron It is the final x-axis value for each environment in Fig. 6.

@paolotron
Author

Thank you for your answers, but what about the success rate reported in Figure 6? How can we compute the same graph? I tried adding an evaluation callback with the environment defined by create_eval_env_fn, but it doesn't seem to work.

@Kami-code
Owner

Kami-code commented Oct 26, 2023

To evaluate, each instance needs to be evaluated 25 times for each seed, which will hurt training speed if you use a callback function during training, though I think it should work if implemented correctly. I recommend saving models locally at your evaluation frequency and using eval_policy.py to evaluate them.
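
For example, with stable-baselines3 this can be done with a CheckpointCallback. The env id, frequencies, and paths below are placeholders (in practice, build the model exactly as the training script does); the saved models would then be passed to eval_policy.py.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Placeholder setup: substitute the benchmark env, the pre-trained extractor,
# and the README hyper-parameters used by the training script.
model = PPO("MlpPolicy", "Pendulum-v1", verbose=0)

checkpoint_cb = CheckpointCallback(
    save_freq=50_000,            # steps between saved models; match your evaluation frequency
    save_path="./checkpoints/",
    name_prefix="policy",
)

model.learn(total_timesteps=1_000_000, callback=checkpoint_cb)

# Afterwards, run eval_policy.py on each saved checkpoint (25 episodes per
# instance, per seed) to compute the success-rate curve offline, so evaluation
# never slows down training.
```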
