run_baseline_parallel.py does not seem to restart from checkpoint? #56
Comments
Ah, this was left over from some old experiments. If the folder isn't found, it will start a new training session. If you point it to an existing one, it will restart training from that checkpoint. This could definitely be made clearer in the code 🤔
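The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `checkpoint_to_resume` is hypothetical, and it assumes checkpoints are stored as `<file_name>.zip`, which is how Stable-Baselines3 saves models by default.

```python
from pathlib import Path


def checkpoint_to_resume(file_name):
    """Return the checkpoint archive to resume from, or None to start fresh.

    Hypothetical helper. Assumes the checkpoint lives at `<file_name>.zip`.
    """
    zip_path = Path(f"{file_name}.zip")
    return zip_path if zip_path.is_file() else None


if __name__ == "__main__":
    file_name = "session_e41c9eff/poke_38207488_steps"  # value quoted in this issue
    found = checkpoint_to_resume(file_name)
    if found is None:
        # No such folder or file: a new training session would start here.
        print("checkpoint not found, starting a new session")
    else:
        # An existing checkpoint: training would resume from it here,
        # e.g. via something like PPO.load(file_name, env=env).
        print(f"resuming from {found}")
```

With this shape, a stale default such as `session_e41c9eff/...` simply falls through to a fresh run, which matches what the maintainer describes.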
I was able to get it to restore the latest session, but I'm getting an error. Maybe someone sees what I'm doing wrong. I added the code needed at the bottom.
session_4da05e87_main_good/poke_439746560_steps
loading checkpoint
import glob
def find_latest_session_and_poke():
if __name__ == '__main__':
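The body of the commenter's function did not survive in this thread, so the following is only a guess at the general idea, not their code. It assumes the `session_*/poke_<steps>_steps.zip` layout implied by the paths quoted in this issue, treats the most recently modified session folder as the "latest" one, and uses a hypothetical name, `find_latest_checkpoint`.

```python
import glob
import os
import re

# Matches names like "poke_439746560_steps.zip" and captures the step count.
STEP_PATTERN = re.compile(r"poke_(\d+)_steps\.zip$")


def find_latest_checkpoint(root="."):
    """Return the newest checkpoint path without the '.zip' suffix, or None.

    Hypothetical helper: picks the most recently modified `session_*` folder
    that contains checkpoints, then the checkpoint with the highest step count.
    """
    sessions = [
        d for d in glob.glob(os.path.join(root, "session_*")) if os.path.isdir(d)
    ]
    # Walk sessions from newest to oldest so empty folders are skipped.
    for session in sorted(sessions, key=os.path.getmtime, reverse=True):
        best = None
        for path in glob.glob(os.path.join(session, "poke_*_steps.zip")):
            match = STEP_PATTERN.search(os.path.basename(path))
            if match is None:
                continue
            steps = int(match.group(1))  # compare numerically, not as text
            if best is None or steps > best[0]:
                best = (steps, path)
        if best is not None:
            return best[1][: -len(".zip")]
    return None


if __name__ == "__main__":
    print(find_latest_checkpoint("."))
```

Comparing step counts as integers matters here, because a plain string sort would rank `poke_999_steps` above `poke_2000_steps`.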
I just noticed you can read "saves_to_record.txt" and get the last value instead.
might be line 50?
in run_baseline_parallel.py it is defined differently as
file_name = 'session_e41c9eff/poke_38207488_steps' #'session_e41c9eff/poke_250871808_steps'
in run_baseline.py it is defined differently as
file_name = 'poke_' #'best_12-7/poke_12_b'
I have no folder 'session_e41c9eff', so this seems to be misconfigured.
Can this be fixed?