Why are the first and last layers of ResNet not fed to FPN? #15

Closed
IlyaOvodov opened this issue Oct 7, 2018 · 8 comments

Comments

@IlyaOvodov
Contributor

I've noticed that here:
https://github.com/qubvel/segmentation_models/blob/master/segmentation_models/fpn/model.py#L10
you extract outputs from the very beginning of each ResNet layer to feed the decoder. This results in ignoring the whole last ResNet layer (only BN and activation are taken from it).
Is there a reason for this?
Also, data from the high-resolution layer of ResNet (before the first MaxPool) is not used, which makes it necessary to upsample the FPN output by 4.
Is it like this in the original paper?

@qubvel
Owner

qubvel commented Oct 7, 2018

This is a kind of misunderstanding of the naming convention. Actually, the first block of each stage has a strided convolution, so all subsequent blocks run at a lower spatial resolution. Taking the skip connection at this block therefore captures all the features from the previous stage. For a better understanding, you can visualize the network graph:

from segmentation_models import FPN
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

# create a model with a fixed input shape
model = FPN('resnet34', input_shape=(224, 224, 3))

# plot the model graph with layer output shapes
SVG(model_to_dot(model, show_shapes=True).create(prog='dot', format='svg'))

High-resolution layers are not used in the pyramid, following the original paper/presentation (http://presentations.cocodataset.org/COCO17-Stuff-FAIR.pdf). The FPN model, like the PSP model, was designed for multiclass segmentation of high-resolution images, and this trick reduces the number of training parameters while still taking all the image information into account.

You can look at model.py and pass your own layer names to build a model; maybe it will work better for your task.
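For example, something along these lines (a minimal sketch; the 'layers' keyword here is an assumption, check fpn/model.py and DEFAULT_FEATURE_PYRAMID_LAYERS for the actual argument name):

from segmentation_models import FPN

# Hypothetical override of the feature pyramid layers for ResNet34;
# 'relu0' adds the high-resolution features before the first MaxPool.
# The 'layers' keyword is an assumption; see fpn/model.py for the
# real argument name and the default layer list.
custom_layers = ('stage4_unit1_relu1', 'stage3_unit1_relu1',
                 'stage2_unit1_relu1', 'relu0')
model = FPN('resnet34', input_shape=(224, 224, 3), layers=custom_layers)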

@IlyaOvodov
Contributor Author

Hello! If you mean that 'stage2_unit1_relu1' is actually the output of stage1 + BN + relu, etc., that is no surprise to me. What I had actually missed is that extract_outputs(..., include_top=True) also adds the output of the whole network to the outputs listed in DEFAULT_FEATURE_PYRAMID_LAYERS. Drawing the structure helped me, so I now see that the last ResNet layer is not ignored.
But it still seems strange that the first output taken by the decoder is only the stage-1 output, i.e. after 2 strides.
Adding relu0 (before the first MaxPool) to the decoder inputs really improves results (but makes it 2+ times slower :-( )

@qubvel
Owner

qubvel commented Oct 7, 2018

Thanks for your comment. I was also thinking about including the 'relu0' layer in the pyramid, but left it as described in the presentation, with x4 final upsampling. Would it be useful to make this an optional argument?

P.S. Are you using this network for the salt segmentation challenge?

@IlyaOvodov IlyaOvodov changed the title Why are the first and last layers of ResNet not fed to FNN? Why are the first and last layers of ResNet not fed to FPN? Oct 7, 2018
@IlyaOvodov
Contributor Author

Yes. Feeding relu0 to the decoder improves the metric from 0.809 (some baseline configuration) to 0.824, but also "improves" the LR time from 17 to 38 s/iter.

@IlyaOvodov
Contributor Author

IlyaOvodov commented Oct 7, 2018

By the way, I changed the interpolation to 'bilinear', and it substantially improved the result. Then I noticed that this parameter only affects the final upsampling before the concat, not the upsampling inside pyramid_block. I fixed that and passed the interpolation into pyramid_block, and was going to make a PR, but in tests it produced only a small negative effect. Do you have any ideas about this?
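The change I made was along these lines (a minimal sketch, not the library's exact pyramid_block; the structure and filter count are illustrative):

from keras.layers import Add, Conv2D, UpSampling2D

# sketch of a pyramid block with a configurable interpolation mode;
# illustrative only, not the library's actual implementation
def pyramid_block(pyramid, skip, pyramid_filters=256, interpolation='bilinear'):
    # upsample the coarser pyramid feature map x2 with the chosen mode
    x = UpSampling2D((2, 2), interpolation=interpolation)(pyramid)
    # 1x1 conv on the skip connection to match the pyramid channel count
    skip = Conv2D(pyramid_filters, (1, 1), padding='same')(skip)
    return Add()([x, skip])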

@qubvel
Owner

qubvel commented Oct 7, 2018

A few ideas:

  1. There is a theory that tf.image.resize_images with any interpolation other than 'nearest' is broken and shifts the image by 1 px (https://hackernoon.com/how-tensorflows-tf-image-resize-stole-60-days-of-my-life-aba5eb093f35). So using this layer with 'bilinear' interpolation at low resolutions may hurt too much; on the other hand, we have skip connections, and 'nearest' interpolation is enough to reconstruct fine features.
  2. Sometimes interpolation ('bilinear', 'bicubic') produces strange artefacts at corners, so you could group the images by mask size and check where the metric decreases significantly.
  3. It's just random error 😄

P.S. It would be nice if you added align_corners=True here and ran the same experiment again; see the sketch below.
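A minimal sketch of what I mean, assuming TF 1.x and a Keras Lambda wrapper (tf.image.resize_images and its align_corners argument exist; where exactly to plug this layer into the model is up to you):

import tensorflow as tf
from keras.layers import Lambda

# wrap TF's resize with align_corners=True in a Keras layer;
# (h, w) is the static target size of the upsampled feature map
def resize_bilinear_aligned(x, h, w):
    return Lambda(lambda t: tf.image.resize_images(
        t, (h, w),
        method=tf.image.ResizeMethod.BILINEAR,
        align_corners=True))(x)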

@IlyaOvodov
Contributor Author

I'll try, but later. Currently my computer is busy with other tasks.

@qubvel qubvel closed this as completed Oct 20, 2018
@IlyaOvodov
Contributor Author

Hi! I've found some time to test the influence of align_corners=True and bilinear interpolation in pyramid_block. The results seem slightly confusing. See the attached file.

align_corners.xlsx

I tested the current configuration (rev 6827a82, 21.12.2018) on my pipeline from the Kaggle TGS Salt competition (https://www.kaggle.com/c/tgs-salt-identification-challenge). Tests were done at 128x128 resolution using ResNet34 :) and FPN. There were 5 configurations:

  1. interpolation = nearest
  2. interpolation = bilinear
  3. interpolation = bilinear + interpolation = bilinear passed into pyramid_block here
  4. interpolation = bilinear + align_corners=True added here
  5. Sum of both the (3) and (4) modifications.

Tests were run 5 times in each configuration using 5-fold cross-validation. The graph in the file shows the median and average metric over the 5 folds for 2 local validations and for the public and private LB on Kaggle.

It seems that:

  1. interpolation = bilinear is definitely better than nearest (bilinear is now the default, good);
  2. adding align_corners=True really improves results, both in the current version and in the version with the updated pyramid_block, so it is worth changing;
  3. it still does not make pyramid_block with interpolation = bilinear better than with interpolation = nearest.

But the same tests on the old version dcd715 (v0.1.1, 12.09.2018) confuse me: they show that option 3 (modified interpolation in pyramid_block) improves results, while adding align_corners=True spoils them.
As far as I can see, the current version differs from the old one only by using BN by default...

Of course, the dataset from the Kaggle Salt competition is also rather strange and specific...
