Training details #9
Hi, thank you for your work. I would like to reproduce the experiments reported in the paper. Could you provide your training details for the experiments reported in Table 2?
Also, could you provide a link to the supplementary material? I'm having trouble finding it online, but it is referenced in the paper.

Comments
Hi @paolotron. Thanks for using our benchmark. You can find the supplementary material here. The video mentioned in the supplementary material can be accessed on YouTube. The training example in the README gives the same hyper-parameters that we used in our training. To reproduce the experiment results, you only need to change the pretrained model path, seed, and task name in the command and train for the same number of timesteps as we reported. It is worth mentioning that all the reported experiments were evaluated with 3 different seeds with the same configuration.
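For later readers, a minimal sketch of such a reproduction sweep is below. The script name (train.py), the flag names, and the non-bucket task name are placeholders I am assuming for illustration, not the repository's actual interface; copy the exact command from the README and vary only the pretrained-model path, seed, and task name.

```python
# Hypothetical reproduction sweep: "train.py", the flag names, and the task
# names below are placeholders -- substitute the real command from the README.
import itertools
import subprocess

TASKS = ["door", "bucket"]             # placeholder task names
SEEDS = [0, 1, 2]                      # three seeds per configuration, as reported
PRETRAIN = "path/to/pretrained_model"  # placeholder pretrained-model path
TIMESTEPS = 1_000_000                  # use the per-task step count from Fig. 6

for task, seed in itertools.product(TASKS, SEEDS):
    subprocess.run(
        [
            "python", "train.py",
            "--task", task,
            "--seed", str(seed),
            "--pretrain-path", PRETRAIN,
            "--total-timesteps", str(TIMESTEPS),
        ],
        check=True,
    )
```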
Thank you very much. Is the fact that the experiments train for a different number of iterations due to the early stopping mechanism implemented in the code?
Hi @paolotron. You can find the details of the early stopping mechanism here; it determines whether we need to update the policy in a single iteration. The number of iterations is chosen differently for each task because the difficulty of each task is not the same (e.g., the bucket task requires far more iterations to converge). The provided RL checkpoints are policies pre-trained in the Segmentation-on-DAM setting and fine-tuned with RL, corresponding to the final environment steps of each environment. Note that we picked the checkpoint from one of the three seeds we used in the experiments.
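For context only: a common form of such per-iteration early stopping (e.g., the target-KL check used in many PPO implementations) skips the remaining policy-update epochs once the new policy has drifted too far from the old one. The sketch below illustrates that general pattern; the loss_and_kl helper and the threshold value are assumptions for illustration, not this repository's actual code.

```python
def update_policy(policy, optimizer, batches, n_epochs=10, target_kl=0.03):
    """Illustrative PPO-style update with KL-based early stopping.

    `policy.loss_and_kl(batch)` is a hypothetical helper returning the
    surrogate loss and the approximate KL between old and new policies.
    """
    for _ in range(n_epochs):
        kls = []
        for batch in batches:
            loss, approx_kl = policy.loss_and_kl(batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            kls.append(approx_kl)
        # Early stop: once the policy has moved far enough within this
        # iteration, skip the remaining update epochs.
        if sum(kls) / len(kls) > target_kl:
            break
```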
So where can I find the number of environment steps for each task for the results in Table 2? |
@paolotron The final x-axis value for each environment in Fig. 6.
Thank you for your answers, but what about the success rate reported in Figure 6? How can we compute the same graph? I tried adding an evaluation callback with the environment defined by create_eval_env_fn, but it doesn't seem to work.
To evaluate, each instance needs to be evaluated 25 times for each seed, which will hurt training speed if you use a callback function during training, but it should work if implemented correctly. I recommend saving models locally at the evaluation frequency and using eval_policy.py to evaluate.
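As recommended above, the repository's eval_policy.py is the intended route. Purely to illustrate the "checkpoint during training, evaluate offline" pattern and the 25-episode protocol, here is a sketch assuming a Stable-Baselines3-style model, the classic Gym step API, and an info dict exposing a "success" flag (all assumptions about this codebase).

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# During training: periodically save checkpoints instead of evaluating inline.
checkpoint_cb = CheckpointCallback(
    save_freq=50_000,          # placeholder evaluation frequency (in steps)
    save_path="./checkpoints",
    name_prefix="policy",
)
# model.learn(total_timesteps=TOTAL_STEPS, callback=checkpoint_cb)

def success_rate(checkpoint_path, make_env, n_episodes=25):
    """Estimate success rate of a saved checkpoint over 25 episodes,
    mirroring the "25 evaluations per instance per seed" protocol above.
    Assumes the classic Gym API and an info["success"] flag."""
    env = make_env()
    model = PPO.load(checkpoint_path)
    successes = 0
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        info = {}
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
        successes += int(info.get("success", False))
    return successes / n_episodes
```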