update and fix bugs #51
Sorry, in my case the loss explosion issue during evaluation reported in #35 and #45 still exists. I am using the PyTorch 0.4.0-compatible code. |
Hi @d-li14, the code works fine with the PyTorch master branch. |
Thanks for your timely reply. I have tried that module in some semantic segmentation code. Taking DRN training on the Cityscapes dataset as a simple example, with just this line of the original syncbn replaced, the validation log looks like this:
[2018-05-16 03:24:16,377 segment.py:232 validate] Test: [0/31] Time 16.048 (16.048) Loss 26639412.0000 (26639412.0000) Score 0.593 (0.593)
[2018-05-16 03:24:20,453 segment.py:232 validate] Test: [10/31] Time 0.405 (1.829) Loss 33462836.0000 (29009771.8182) Score 0.880 (0.561)
[2018-05-16 03:24:28,128 segment.py:232 validate] Test: [20/31] Time 0.412 (1.324) Loss 20237872.0000 (27516075.8095) Score 2.562 (0.626)
[2018-05-16 03:24:32,216 segment.py:232 validate] Test: [30/31] Time 0.423 (1.029) Loss 15786181.0000 (22639639.2903) Score 2.251 (0.671) |
Typically, it is not necessary to calculate loss in evaluation mode. You can try to add the following code https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/nn/syncbn.py#L41
|
Well, there is indeed no need to calculate the loss in evaluation mode, but as the log above shows, the score is also abnormally low compared with training mode, which suggests that the model makes totally unreasonable predictions during evaluation. |
I am not a user of this repo, but I am experiencing a similar issue with my own code for eval/train. Basically, if I train a one-layer neural net in train mode on a single batch, then switch to eval mode and evaluate on the same batch that I trained on, the loss is much worse. I'm not sure if I'm doing something stupid or if there's a bug in PyTorch. |
That is because the BatchNorm layer behaves differently in training and eval mode. |
Yes, but this problem still happens with just a single batch. Surely after training for 100 steps on a single batch in training mode, the running mean/std would no longer change, right? |
train mode: using mean(x) and var(x)
eval mode: using accumulated_mean and accumulated_var, which are averaged over the dataset |
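The difference between the two modes can be sketched in plain Python for a single feature. This is a minimal illustration, not the repository's SyncBN code: the class name `TinyBatchNorm`, the `momentum` of 0.1, and the `eps` of 1e-5 are assumptions chosen for the example, and the learnable scale and shift are omitted.

```python
from statistics import fmean, pvariance


class TinyBatchNorm:
    """Hypothetical single-feature batch norm, for illustration only."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum      # assumed moving-average factor
        self.eps = eps                # assumed numerical-stability term
        self.running_mean = 0.0       # plays the role of accumulated_mean
        self.running_var = 1.0        # plays the role of accumulated_var
        self.training = True

    def __call__(self, x):
        if self.training:
            # Train mode: normalize with the statistics of this batch.
            mean, var = fmean(x), pvariance(x)
            # Update the accumulated statistics with a moving average.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Eval mode: normalize with the accumulated statistics instead.
            mean, var = self.running_mean, self.running_var
        return [(v - mean) / (var + self.eps) ** 0.5 for v in x]


bn = TinyBatchNorm()
batch = [10.0, 12.0, 14.0, 16.0]

train_out = bn(batch)    # normalized with mean(x) = 13 and var(x) = 5
bn.training = False
eval_out = bn(batch)     # normalized with running_mean ~ 1.3, running_var ~ 1.4
```

After a single update the accumulated statistics have barely moved from their initial values, so in this sketch the eval-mode output for the very same batch is far from the train-mode output.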
Yes, I am aware, but that does not solve the problem. The dataset is a single batch: I repeatedly feed that one batch through the batch norm layer, so its running mean and std stay the same after a few iterations. I even compared the running mean and average in train and eval mode, and they are the same, which is to be expected since we're only looking at one batch of data. Despite all of this, eval mode gives different results from train mode.
Mean(x) should be the same as the accumulated mean(x) if you're dealing with a single batch, right?
On Oct 16, 2018, at 7:33 PM, Hang Zhang ***@***.***> wrote:
train mode: using mean(x) and var(x)
eval mode: using accumulated_mean and accumulated_var, which are averaged over the dataset
|
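The single-batch question above can be checked numerically with a small sketch. It assumes the input to the norm layer really is fixed, a moving-average factor of 0.1, and the convention described in PyTorch's BatchNorm documentation, where the running variance is updated with the unbiased (n - 1) estimate while train mode normalizes with the biased one. Whether this accounts for the discrepancy reported here is not established in the thread.

```python
from statistics import fmean, pvariance, variance

batch = [10.0, 12.0, 14.0, 16.0]    # the single batch, fed repeatedly
momentum = 0.1                       # assumed moving-average factor
running_mean, running_var = 0.0, 1.0

for _ in range(100):
    # Assumption: the running variance uses the unbiased (n - 1) estimate.
    running_mean = (1 - momentum) * running_mean + momentum * fmean(batch)
    running_var = (1 - momentum) * running_var + momentum * variance(batch)

batch_var = pvariance(batch)         # biased estimate used in train mode

mean_gap = abs(running_mean - fmean(batch))   # shrinks toward zero
var_ratio = running_var / batch_var           # approaches n / (n - 1)
```

Under these assumptions the accumulated mean does converge to mean(x), but for a batch of 4 values the variance ratio settles near 4/3 rather than 1, so eval-mode outputs are scaled differently from train-mode outputs. The gap shrinks as the number of values per feature grows. If the layer's input comes from weights that are still being trained, the batch statistics also drift between steps, which this sketch does not model.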
This PR should have addressed most of the issues:
fixes #48
fixes #47
fixes #46
fixes #45
fixes #35