model.eval() seems not work well #35

mapleneverfade · 2018-04-09T05:16:10Z

Thank you for your work! There is still something I need help with.
My code for parallelizing the model and the criterion looks like this:
model = encoding.parallel.ModelDataParallel(model,device_ids=[0,1,2])
criterion = encoding.parallel.CriterionDataParallel(criterion,device_ids=[0,1,2])
Training goes well, but when I switch to model.eval(), the loss explodes.

Is there something wrong with model.eval()?

zhanghang1989 · 2018-04-10T16:15:52Z

Are you training in evaluation mode?

mapleneverfade · 2018-04-11T01:35:57Z

I call model.train() before every epoch over the loader.

zhanghang1989 · 2018-04-11T01:38:41Z

Thanks for clarifying! What mIoU were you expecting? For example, what is the mIoU after the 1st epoch when using standard BatchNorm?
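For reference, a minimal sketch of how mIoU is commonly computed, assuming the usual definition (per-class intersection over union, averaged over the classes that appear in either the prediction or the ground truth). The function name `mean_iou` and the flat label lists are illustrative assumptions, not the evaluation code used in this repository.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flat sequences of integer class labels of equal length.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Partly correct prediction: class 0 IoU = 1/2, class 1 IoU = 2/3.
partial = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)

# Collapsed prediction that never matches the ground truth gives mIoU 0.
collapsed = mean_iou([0, 0, 0, 0], [1, 1, 1, 1], num_classes=2)
```

Comparing this number for the standard BatchNorm model and the synchronized one after the same epoch makes the gap concrete without relying on the eval loss.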

mapleneverfade · 2018-04-11T02:35:33Z

zhanghang1989 · 2018-04-11T16:47:21Z

Am I missing something? You said the loss exploded when you switched to model.eval().
What does that mean? We typically do not calculate the loss in evaluation mode.

mapleneverfade · 2018-04-12T02:51:24Z

Maybe I wasn't clear. With standard BatchNorm I get a normal loss in both train mode and eval mode.
With syn-bn I get a normal loss during training, but it explodes in eval mode, which suggests there may be
something wrong with syn-bn when it is switched to eval mode.
I tested the syn-bn model in model.eval() and got this

Calculating the eval loss is not my goal; I just want to figure out what causes the test IoU to be zero.
Or is there something I misused?

zhanghang1989 · 2018-04-13T06:28:37Z

Please check out the PyTorch-compatible Synchronized Cross-GPU encoding.nn.BatchNorm2d and the example.

zhanghang1989 · 2018-04-15T16:43:33Z

@mapleneverfade SyncBN works the same as standard BN in eval mode. Please try the new PyTorch DataParallel-compatible version.
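To illustrate the train/eval distinction being discussed, here is a minimal pure-Python sketch, assuming the textbook formulation of batch normalization: batch statistics in train mode, running statistics in eval mode. `ToyBatchNorm`, its `update_running` flag, and the defaults are hypothetical and are not the encoding.nn.BatchNorm2d implementation; the sketch also uses the biased variance everywhere for simplicity. It shows how a layer whose running statistics were never updated can behave normally in training yet produce badly scaled outputs in eval mode.

```python
import math


class ToyBatchNorm:
    """Hypothetical single-feature batch norm, for illustration only."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0  # initial running statistics
        self.running_var = 1.0
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, batch, update_running=True):
        if self.training:
            # Train mode: normalize with the statistics of the current batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            if update_running:
                m = self.momentum
                self.running_mean = (1 - m) * self.running_mean + m * mean
                self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Eval mode: normalize with the accumulated running statistics.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


def max_abs_eval_output(update_running):
    """Train on a fixed batch, then return the largest |output| in eval mode."""
    bn = ToyBatchNorm()
    batch = [100.0, 102.0, 98.0, 101.0, 99.0]
    bn.train()
    for _ in range(200):
        bn(batch, update_running=update_running)
    bn.eval()
    return max(abs(y) for y in bn(batch))


healthy = max_abs_eval_output(update_running=True)   # running stats track the data
stale = max_abs_eval_output(update_running=False)    # running stats stay at 0 and 1
```

In this toy setup `healthy` stays on the order of 1 while `stale` is on the order of 100, which is the kind of symptom to look for when eval mode misbehaves: inspecting the running mean and variance of the normalization layers after training is a quick check.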

mapleneverfade · 2018-04-16T09:21:04Z

I really appreciate your reply.
But I still can't figure out what's going wrong.

The combination of "ModelDataParallel() & CriterionDataParallel() & Standard BN" works well,
but when I replace "Standard BN" with "encoding.nn.BatchNorm2d", eval mode still doesn't work.

zhanghang1989 · 2018-04-16T13:53:07Z

Hi, have you checked out the example? https://github.com/zhanghang1989/PyTorch-SyncBatchNorm

zhanghang1989 · 2018-04-22T19:01:06Z

Hi @mapleneverfade, do you still have the problem? I still don't understand why you calculate the loss in eval mode.

zhanghang1989 added the help wanted label Apr 15, 2018

d-li14 mentioned this issue May 9, 2018

syn-batchnorm : error during validation, train seems good. #45

Closed

zhanghang1989 mentioned this issue May 15, 2018

update and fix bugs #51

Merged

zhanghang1989 closed this as completed in #51 May 15, 2018

roseif mentioned this issue Aug 14, 2020

What is the reason for this problem during training？ #312

Open
