[tests/models] refactor with BoringModel #5507
Conversation
Need help with the 3 failing GPU tests.
I can check and help you on that, as I worked on
that would be great. Thanks @gianscarpe
@gianscarpe most of the changes here are independent, so I guess the same amount of work will be required even if I split it into 2 or more PRs. But yeah, if others say to split this, it won't be a problem for me :) Also, did you find anything regarding the remaining 2 failing GPU tests?
well yes, a smaller PR would be easier to read, but since it changes just a few lines it should be fine once all tests pass :]
Co-authored-by: Kaushik B <[email protected]>
@rohitgr7 @kaushikb11 mind checking the last two failing tests? :]
def test_step_end(self, outputs):
    self.log('test_acc', self.test_acc(outputs['logits'], outputs['y']))
What's the meaning of this?
If we implement step_end we need to reduce the outputs, right?
We get the outputs from all GPUs.
For test accuracy, I recommend computing it over the entire dataset.
I think this is fine: this is on step end, not on epoch end, so the underlying metric will just accumulate via the update method. Only at epoch end will compute be called.
I think this is to get around the issue that the training step runs within the DP scatter op, where the outputs end up on multiple devices.
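The accumulate-then-compute behaviour described above can be sketched without Lightning: a stateful metric object (here a hypothetical plain-Python stand-in for a torchmetrics-style Accuracy, not the real class) accumulates state via update() on every step_end call, and compute() reduces that state only once at epoch end.

```python
class Accuracy:
    """Minimal stand-in for a stateful torchmetrics-style metric."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, targets):
        # Called on every step_end: accumulate state, no reduction yet.
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self):
        # Called once at epoch end: reduce the accumulated state.
        return self.correct / self.total


# Outputs as gathered from (simulated) DP replicas across two steps.
metric = Accuracy()
metric.update([1, 0], [1, 1])   # step_end, batch 1
metric.update([1, 1], [1, 1])   # step_end, batch 2
print(metric.compute())         # accuracy over the whole epoch: 0.75
```

This mirrors why logging the metric object in step_end is safe even though step_end itself returns None: the metric carries the state, not the return value.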
Yes, I didn't see that it's using a metric.
My concern was about the None return value here, but if that is handled internally then I guess there's no issue.
Why did you merge it? The tests were not passing.
Ok, that was an issue with switching branches; for a while, branch protection was off... :/
@Borda what do you suggest? Can we undo the commit and then resubmit the PR?
@kaushikb11 is already working on that fix... I guess #5924
Rohit and I were also working on the fix before it was merged.
What does this PR do?
Part of #4501
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃