Why are the first and last layers of ResNet not fed to FPN? #15

Closed
IlyaOvodov opened this issue Oct 7, 2018 · 8 comments

Comments

@IlyaOvodov
Contributor

I've noticed that here:
https://github.com/qubvel/segmentation_models/blob/master/segmentation_models/fpn/model.py#L10
you extract outputs from the very beginning of each ResNet layer to feed the decoder. This results in ignoring the whole last ResNet layer (only BN and activation are taken from it).
Is there a reason for this?
Also, data from the high-resolution layer of ResNet (before the first MaxPool) is not used, which makes it necessary to upsample the FPN output by 4.
Is it like this in the original paper?

@qubvel
Owner

qubvel commented Oct 7, 2018

This is a kind of misunderstanding of the naming convention. Actually, the first block of each stage has a strided convolution, so all subsequent blocks run at a lower spatial resolution. Taking the skip connection at this block therefore captures all the features from the previous stage. For a better understanding, you can visualize the network graph:

from segmentation_models import FPN
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

# create a model with a fixed input shape
model = FPN('resnet34', input_shape=(224, 224, 3))

# plot the model graph with layer output shapes
SVG(model_to_dot(model, show_shapes=True).create(prog='dot', format='svg'))

High-resolution layers are not used in the pyramid, following the original paper/presentation (http://presentations.cocodataset.org/COCO17-Stuff-FAIR.pdf). The FPN model, like the PSP model, was designed for multiclass segmentation of high-resolution images, and this trick reduces the number of training parameters while still taking all the image information into account.

You can look at model.py and pass your own layer names to build a model; maybe it will work better for your task.
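For example, something along these lines (a minimal sketch; the 'layers' keyword here is an assumption, check fpn/model.py and DEFAULT_FEATURE_PYRAMID_LAYERS for the actual argument name):

from segmentation_models import FPN

# Hypothetical override of the feature pyramid layers for ResNet34;
# 'relu0' adds the high-resolution features before the first MaxPool.
# The 'layers' keyword is an assumption; see fpn/model.py for the
# real argument name and the default layer list.
custom_layers = ('stage4_unit1_relu1', 'stage3_unit1_relu1',
                 'stage2_unit1_relu1', 'relu0')
model = FPN('resnet34', input_shape=(224, 224, 3), layers=custom_layers)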

@IlyaOvodov
Contributor Author

Hello! If you mean that 'stage2_unit1_relu1' is actually the output of stage1 + BN + relu, etc., that is no surprise to me. What I had actually missed is that extract_outputs(..., include_top=True) also adds the output of the whole network to the outputs listed in DEFAULT_FEATURE_PYRAMID_LAYERS. Drawing the structure helped me, so I now see that the last ResNet layer is not ignored.
But it still seems strange that the first output taken by the decoder is only the stage-1 output, i.e. after 2 strides.
Adding relu0 (before the first MaxPool) to the decoder inputs really improves results (but makes it 2+ times slower :-( )

@qubvel
Owner

qubvel commented Oct 7, 2018

Thanks for your comment. I was also thinking about including the 'relu0' layer in the pyramid, but left it as described in the presentation, with x4 final upsampling. Would it be useful to make this an optional argument?

P.S. Are you using this network for the salt segmentation challenge?

@IlyaOvodov IlyaOvodov changed the title Why are the first and last layers of ResNet not fed to FNN? Why are the first and last layers of ResNet not fed to FPN? Oct 7, 2018
@IlyaOvodov
Contributor Author

Yes. Feeding relu0 to the decoder improves the metric from 0.809 (some baseline configuration) to 0.824, but also "improves" the LR time from 17 to 38 s/iter.

@IlyaOvodov
Contributor Author

IlyaOvodov commented Oct 7, 2018

By the way, I changed the interpolation to 'bilinear', and it substantially improved the result. Then I noticed that this parameter only affects the final upsampling before the concat, not the upsampling inside pyramid_block. I fixed that and passed the interpolation into pyramid_block, and was going to make a PR, but in tests it produced only a small negative effect. Do you have any ideas about this?
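The change I made was along these lines (a minimal sketch, not the library's exact pyramid_block; the structure and filter count are illustrative):

from keras.layers import Add, Conv2D, UpSampling2D

# sketch of a pyramid block with a configurable interpolation mode;
# illustrative only, not the library's actual implementation
def pyramid_block(pyramid, skip, pyramid_filters=256, interpolation='bilinear'):
    # upsample the coarser pyramid feature map x2 with the chosen mode
    x = UpSampling2D((2, 2), interpolation=interpolation)(pyramid)
    # 1x1 conv on the skip connection to match the pyramid channel count
    skip = Conv2D(pyramid_filters, (1, 1), padding='same')(skip)
    return Add()([x, skip])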

@qubvel
Owner

qubvel commented Oct 7, 2018

A few ideas:

  1. There is a theory that tf.image.resize_images with any interpolation other than 'nearest' is broken and shifts the image by 1 px (https://hackernoon.com/how-tensorflows-tf-image-resize-stole-60-days-of-my-life-aba5eb093f35). So using this layer with 'bilinear' interpolation at low resolutions may hurt too much; on the other hand, we have skip connections, and 'nearest' interpolation is enough to reconstruct fine features.
  2. Sometimes interpolation ('bilinear', 'bicubic') produces strange artefacts at corners, so you could group the images by mask size and check where the metric decreases significantly.
  3. It's just random error 😄

P.S. It would be nice if you added align_corners=True here and ran the same experiment again; see the sketch below.
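A minimal sketch of what I mean, assuming TF 1.x and a Keras Lambda wrapper (tf.image.resize_images and its align_corners argument exist; where exactly to plug this layer into the model is up to you):

import tensorflow as tf
from keras.layers import Lambda

# wrap TF's resize with align_corners=True in a Keras layer;
# (h, w) is the static target size of the upsampled feature map
def resize_bilinear_aligned(x, h, w):
    return Lambda(lambda t: tf.image.resize_images(
        t, (h, w),
        method=tf.image.ResizeMethod.BILINEAR,
        align_corners=True))(x)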

@IlyaOvodov
Contributor Author

I'll try, but later. Currently my computer is busy with other tasks.

@qubvel qubvel closed this as completed Oct 20, 2018
@IlyaOvodov
Contributor Author

Hi! I've found some time to test the influence of align_corners=True and bilinear interpolation in pyramid_block. The results seem slightly confusing. See the attached file.

align_corners.xlsx

I tested the current configuration (rev 6827a82, 21.12.2018) on my pipeline from the Kaggle TGS Salt competition (https://www.kaggle.com/c/tgs-salt-identification-challenge). Tests were done at 128x128 resolution using ResNet34 :) and FPN. There were 5 configurations:

  1. interpolation = nearest
  2. interpolation = bilinear
  3. interpolation = bilinear + interpolation = bilinear passed into pyramid_block here
  4. interpolation = bilinear + align_corners=True added here
  5. Sum of both the (3) and (4) modifications.

Tests were run 5 times in each configuration using 5-fold cross-validation. The graph in the file shows the median and average metric over the 5 folds for 2 local validations and for the public and private LB on Kaggle.

It seems that:

  1. interpolation = bilinear is definitely better than nearest (bilinear is now the default, good);
  2. adding align_corners=True really improves results, both in the current version and in the version with the updated pyramid_block, so it is worth changing;
  3. it still does not make pyramid_block with interpolation = bilinear better than with interpolation = nearest.

But the same tests on the old version dcd715 (v0.1.1, 12.09.2018) confuse me: they show that option 3 (modified interpolation in pyramid_block) improves results, while adding align_corners=True spoils them.
As far as I can see, the current version differs from the old one only by using BN by default...

Of course, the dataset from the Kaggle Salt competition is also rather strange and specific...
