Syncing doesn't work with wandb #30
Comments
Just double checking: You do run the The hook (whose output you see above) creates a file in |
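The mechanism referred to above (a hook on the compute node leaves a file that a process on the head node later picks up) can be illustrated with a small, self-contained sketch. This is not the package's actual implementation: the directory, the `.command` suffix, and the function names `request_sync` and `collect_requests` are all hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path


def request_sync(command_dir: Path, run_dir: Path) -> Path:
    """Compute-node side: leave a small file naming the run directory to sync."""
    command_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one trigger file per run directory.
    trigger = command_dir / f"{run_dir.name}.command"
    trigger.write_text(str(run_dir.resolve()))
    return trigger


def collect_requests(command_dir: Path) -> list[Path]:
    """Head-node side: read and remove all pending trigger files."""
    requested = []
    for trigger in sorted(command_dir.glob("*.command")):
        requested.append(Path(trigger.read_text().strip()))
        trigger.unlink()  # consume the request so it is handled only once
    return requested


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "commands"  # stands in for a shared file system
        run = Path(tmp) / "runs" / "example-run"
        run.mkdir(parents=True)
        request_sync(shared, run)
        for path in collect_requests(shared):
            # A real head-node loop would run `wandb sync <path>` here.
            print("would sync:", path)
```

The point of the design is that the compute node only needs write access to a shared directory, while everything that requires internet access happens on the head node.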
Thank you for your reply. |
Could you describe your setup? Since you're using this package, I assume you are running your ML on a batch system where the compute nodes don't have internet. |
Yes, I have a head node from which I launch jobs on the compute nodes with sbatch. And yes, the head node has internet access and the others don't. Tell me if this is right: myscript.sh looks like this:
|
If I try what I just proposed, I get this in the stderr of my script:
and in the tmux session where wandb-osh is running I have:
|
Yes, that's the correct procedure. The first The real question is why I actually think this is a bug in I've always tested with I will fix this in the next two hours and then let you know. I'd be super happy if you could test again then. |
Thanks :) I'll do that. |
I managed to make it work with |
Yes, that fixes the bug with the wrong run directories that were assumed by Could you test my fix by updating the package ( |
@all-contributors please add @barthelemymp for bug |
I've put up a pull request to add @barthelemymp! 🎉 |
Nope, I still get
When installing, I had to add the path by hand. Do you have a command to check that the wandb-osh I call is the updated one? |
Can you check |
Alternatively, you can do
import wandb_osh
print(wandb_osh.__version__)
|
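To answer the question of which installation is actually being picked up, a slightly more general check is sketched below using only the standard library. It assumes that both the installed distribution and the command-line entry point are named `wandb-osh`; the helper `describe_install` is a made-up name for this example.

```python
import shutil
from importlib import metadata


def describe_install(dist_name: str, exe_name: str) -> dict:
    """Report the installed version of a distribution and where its command lives."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # not installed in *this* Python environment
    # shutil.which returns None if the command is not on PATH.
    return {"version": version, "executable": shutil.which(exe_name)}


if __name__ == "__main__":
    # Assumption: distribution and command are both called "wandb-osh".
    print(describe_install("wandb-osh", "wandb-osh"))
```

Running this with the same Python interpreter that the batch script uses shows whether that environment has the updated package, and whether the command found on `PATH` belongs to the same environment.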
Tell me if I can do some more tests on my side. Best, Barthelemy |
Just double checking: It's also updated in the python you use in the batch scripts, right? (just in case you use some conda env there, etc.). The fix was related to the hook that is included in the python package, not the Because I cannot believe that it still points to the paths that end in You could also do
and then try again, as the newest version now prints out the version number at the beginning |
If running your toy analysis is too much work, you can also try this simple snippet here:
#!/usr/bin/env python3
import wandb
import os
from wandb_osh.hooks import TriggerWandbSyncHook
sync_hook = TriggerWandbSyncHook()
os.environ['WANDB_SILENT'] = 'true'
os.environ["WANDB_MODE"] = "offline"
wandb.init()
wandb.log({"loss": 123})
sync_hook()
Run it and it should print something like
and if you do
(note how it doesn't end in |
So it is printing the right version:
Thank you for your commitment :) |
Yes, now it points to the correct paths; that should work. Are you running any training in parallel? Because if you synced manually or before, maybe there really is nothing to be synced. Also, can you check in your script's output what wandb tells you to do for syncing: I usually see something like
and the path after |
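One way to check whether anything is left to sync is to list the offline run directories yourself. The sketch below assumes the usual wandb layout, in which offline runs are stored as directories named `offline-run-*` inside a `wandb` directory; the helper `find_offline_runs` is hypothetical and not part of wandb or wandb-osh.

```python
from pathlib import Path


def find_offline_runs(wandb_dir):
    """Return all offline run directories directly below `wandb_dir`, sorted by name."""
    root = Path(wandb_dir)
    # Assumption: offline runs live in directories called "offline-run-<...>".
    return sorted(p for p in root.glob("offline-run-*") if p.is_dir())


if __name__ == "__main__":
    # "wandb" is the assumed location relative to where the training was started.
    for run_dir in find_offline_runs("wandb"):
        print(f"wandb sync {run_dir}")
```

If this prints nothing, either the runs were written somewhere else or there really is nothing to sync from that directory.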
Here it is:
|
The link shown above has exactly the same structure of the links as shown in the output of If you had
right? |
|
OK, I found one more thing: On my laptop But let me change that in the package real quick. |
OK. Could you try
and try one last time? The version should then be I'm very sorry to use you as a beta tester here ;) But I'm absolutely confident that it will work now :) |
Clap Clap!! It works! |
Awesome! Thank you so much again :) |
Hello,
First, thank you for creating this tool!
Unfortunately, I cannot get it to work.
I get this error each time I use trigger_sync:
I am not sure where it comes from... Any idea?
best
b