-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add tensorboard logger test and fix callback typo #560
Conversation
Everything looks good. Will approve once the tests pass. |
I think there is still something hapenning with the tensorboard logger and the trainer object, I have a meeting until 1pm, but i'll have look this afternoon. |
Sounds good. I'll leave this for now. Ping me when you're ready for a review. |
This turned out to be a large issue and definitely warranting of 1.3.0. Our tests did not cover the logger, we had discontinued it because of the annoyances of mocking the comet_ml experiment. This turns out to be tangential to the goal, but hid the actual problem, which was quite huge. The problem is upstream of any given logger, and with pytorch lightning trainer.predict methods. In f71eef4, I changed the predict_file methods to user trainer.predict. The virtue of this was that it made prediction much faster. We used to have to do alot of annoying workarounds, like not run evaluation that often because that function was so slow. What we didn't see was that when using a logger, like comet_ml or tensorboard, you cannot mix trainer.predict inside of an evaluation_hook. I have documented this on pytorch lightning repo. See lots of relevant conversation. Lightning-AI/pytorch-lightning#19101 The bad news means that training with a logger has been dead for three weeks and we didn't know. |
…l for users, but adds to our testing environment
i'm tempted to force push this, but I suppose it can wait until Monday. Once @ethanwhite merges I will bumpversion minor and push to pypi. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. My one meaningful thing is that I found the fact that predict.predict_file()
doesn't work directly on a file confusing. It sent me on a couple of tangents while reviewing which might suggest it could lead to confusing maintenance. Other than potentially renaming that there are a few docstring issues to cleanup and I think this should be ready to go it.
…ey are private methods. I improved the seperation of predict._predict_image_ and main.predict_image to follow the logic of other methods
@ethanwhite I pushed the docstring changes, renamed the predict.predict_file to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more small cleanups and then I'll get this merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. I like the solution.
add tensorboard logger test and fix callback typo
Closes the #559 and #558 , there was a typo in callbacks only for the create_trainer(logger=), function. I added a test, but it really should have been caught somewhere else. This is high priority.