-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix decentralized learning fail #708
Conversation
JulienVig
commented
Jul 16, 2024
•
edited
Loading
edited
- Improve the aggregator's logic
- Fix the displayed number of participants in federated and decentralized learning. Closes The number of participants is wrong for both federated and decentralized learning #707
- Closes Decentralized training on MNIST fails with error #694 and Closes Decentralized learning fails with error #667
- Add CLI support for decentralized tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work on making decentralized work in browser! 🎉
a few logic/safety errors to fix and it can be merged 👍
this.aggregationResult, | ||
timeout(undefined, "Timeout waiting on the aggregation result promise to resolve") | ||
]) | ||
} catch (e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hum, so if this timeouts, we will still procede to the next communication round, but the payload will be invalid (already used by current round and very probably changed by it). I think breaking would be safer.
also, by creating the timeout in there, it is a timeout per communication round, not for the whole onRoundEnd, which is a bit unexcepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I agree we should break/return here. However I don't see why a timeout per communication round is unexpected. Shouldn't we timeout whenever we may wait on a promise forever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However I don't see why a timeout per communication round is unexpected. Shouldn't we timeout whenever we may wait on a promise forever?
yes, we should timeout for sure, but as each aggregator has a various number of communication round, the "onRoundEnd" timeout varies here which I find a bit weird. you can create the timeout promise at the beginning of the function and reference the same timeout in every communication round.
but yeah, maybe I'm overthinking this (won't be the first time), if you don't find it strange, let's keep it as it is (we don't really have a policy on timeouts anyway).