
Training details #9

Closed

paolotron opened this issue Oct 11, 2023 · 7 comments
@paolotron

Hi, thank you for your work! I would like to reproduce the experiments reported in the paper; could you provide the training details for the experiments reported in Table 2?

Also, could you provide a link to the supplementary material? It is referenced in the paper, but I'm having trouble finding it online.

@Kami-code
Owner

Hi @paolotron. Thanks for using our benchmark. You can find the supplementary material here, and the video mentioned in the supplementary material can be accessed on YouTube. The training example in the README uses the same hyper-parameters that we used in our training. To reproduce the reported results, you only need to change the pre-trained model path, seed, and task name in the command and train for the same number of timesteps as we reported. It is worth mentioning that all reported experiments were evaluated with 3 different seeds under the same configuration.
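
For instance, a rough sketch of sweeping over tasks and seeds could look like the following. The flag names, task names, and seed values here are placeholders; the README's training command is the authoritative reference.

```python
import subprocess

# Placeholders -- substitute the task names from Table 2, the seeds you want,
# and the actual pre-trained model path used in the README command.
TASKS = ["bucket", "..."]          # task names as used by the benchmark
SEEDS = [0, 1, 2]                  # 3 seeds per reported number
PRETRAIN_PATH = "assets/..."       # path to the chosen pre-trained extractor

for task in TASKS:
    for seed in SEEDS:
        subprocess.run(
            [
                "python", "train.py",          # hypothetical entry point; use the README's script
                "--task_name", task,
                "--seed", str(seed),
                "--pretrain_path", PRETRAIN_PATH,
                # keep every other hyper-parameter exactly as in the README example
            ],
            check=True,
        )
```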

@Kami-code Kami-code pinned this issue Oct 13, 2023
@paolotron
Author

paolotron commented Oct 16, 2023

Thank you very much. Is the fact that the experiments train for a different number of iterations due to the early-stopping mechanism implemented in the code?
Also, what do the checkpoints provided in assets/rl_checkpoints/ correspond to in the paper? Which pre-training do they use?

@Kami-code
Owner

Hi @paolotron. You can find the details of the early-stopping mechanism here; it determines whether we need to keep updating the policy within a single iteration. The number of iterations differs between tasks because the tasks are not equally difficult (e.g., the bucket task requires far more iterations to converge). The provided RL checkpoints are policies pre-trained in the Segmentation on DAM setting and fine-tuned with RL, and they correspond to the final environment step of each environment. Note that we picked the checkpoint from one of the three seeds used in the experiments.
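
For reference, the usual shape of such a KL-based early-stopping check in a PPO-style update loop is sketched below. The linked code is authoritative; the threshold and the KL estimate here are generic placeholders.

```python
import numpy as np

def run_policy_epochs(update_step, get_old_log_probs, get_new_log_probs,
                      n_epochs=10, target_kl=0.02):
    """Run up to n_epochs of policy updates, stopping early once the new policy
    drifts too far (in approximate KL) from the policy that collected the data."""
    old_log_probs = get_old_log_probs()
    for epoch in range(n_epochs):
        update_step()                                  # one pass of gradient updates on the PPO loss
        new_log_probs = get_new_log_probs()
        # simple sample-based approximation of KL(old || new)
        approx_kl = float(np.mean(old_log_probs - new_log_probs))
        if target_kl is not None and approx_kl > 1.5 * target_kl:
            print(f"Early stopping at epoch {epoch}: approx_kl={approx_kl:.4f}")
            break
```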

@paolotron
Author

So where can I find the number of environment steps used for each task to produce the results in Table 2?

@Kami-code
Owner

@paolotron It is the final x-axis value for each environment in Fig. 6.

@paolotron
Author

Thank you for your answers, but what about the success rate reported in Figure 6? How can we compute the same graph? I tried adding an evaluation callback with the environment defined by create_eval_env_fn, but it doesn't seem to work.

@Kami-code
Owner

Kami-code commented Oct 26, 2023

To evaluate, each instance needs to be evaluated 25 times for each seed, which will hurt training speed if you use a callback function during training, though I think it should work if implemented correctly. I recommend saving models locally at your evaluation frequency and using eval_policy.py to evaluate them.
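
For example, with stable-baselines3 this can be done with a CheckpointCallback. The env id, frequencies, and paths below are placeholders (in practice, build the model exactly as the training script does); the saved models would then be passed to eval_policy.py.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Placeholder setup: substitute the benchmark env, the pre-trained extractor,
# and the README hyper-parameters used by the training script.
model = PPO("MlpPolicy", "Pendulum-v1", verbose=0)

checkpoint_cb = CheckpointCallback(
    save_freq=50_000,            # steps between saved models; match your evaluation frequency
    save_path="./checkpoints/",
    name_prefix="policy",
)

model.learn(total_timesteps=1_000_000, callback=checkpoint_cb)

# Afterwards, run eval_policy.py on each saved checkpoint (25 episodes per
# instance, per seed) to compute the success-rate curve offline, so evaluation
# never slows down training.
```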
