
finetune problem #36

Open
nywang2019 opened this issue Oct 11, 2019 · 9 comments

nywang2019 commented Oct 11, 2019

I pretrained the model with my own data for about 10 epochs, but the result did not converge, so I wanted to try the finetune step. It fails with the error below. Can anyone help? Thanks. @shepnerd (my image size is 512x512, about 1000 pictures in the training set)
RuntimeError: size mismatch, m1: [4 x 4096], m2: [16384 x 1] at C:/w/1/s/tmp_conda_3.7_055457/conda/conda-bld/pytorch_1565416617654/work/aten/src\THC/generic/THCTensorMathBlas.cu:273

@nywang2019 (Author)

Here is the full output:
(base) D:\PycharmProjects2\inpainting_gmcnn\pytorch>python train.py --dataset Mydata --data_file ./training_data_list/data_list.txt --gpu_ids 0 --pretrain_network 0 --load_model_dir ./checkpoints/20191010-124841_GMCNN_Mydata_b4_s256x256_gc32_dc64_randmask-rect_pretrain --batch_size 4
------------ Options -------------
D_max_iters: 5
batch_size: 4
checkpoint_dir: ./checkpoints
d_cnum: 64
data_file: ./training_data_list/data_list.txt
dataset: Mydata
dataset_path: ./training_data_list/data_list.txt
date_str: 20191010-193601
epochs: 10
g_cnum: 32
gpu_ids: ['0']
img_shapes: [256, 256, 3]
lambda_adv: 0.001
lambda_ae: 1.2
lambda_gp: 10
lambda_mrf: 0.05
lambda_rec: 1.4
load_model_dir: ./checkpoints/20191010-124841_GMCNN_Mydata_b4_s256x256_gc32_dc64_randmask-rect_pretrain
lr: 1e-05
margins: [0, 0]
mask_shapes: [64, 64]
mask_type: rect
max_delta_shapes: [32, 32]
model_folder: ./checkpoints\20191010-193601_GMCNN_Mydata_b4_s256x256_gc32_dc64_randmask-rect
model_name: GMCNN
padding: SAME
phase: train
pretrain_network: False
random_crop: True
random_mask: True
random_seed: False
spectral_norm: True
train_spe: 100
vgg19_path: vgg19_weights/imagenet-vgg-verydeep-19.mat
viz_steps: 5
-------------- End ----------------
loading data..
data loaded..
configuring model..
initialize network with normal
initialize network with normal
---------- Networks initialized -------------
GMCNN(
(EB1): ModuleList(
(0): Conv2d(4, 32, kernel_size=(7, 7), stride=(1, 1))
(1): Conv2d(32, 64, kernel_size=(7, 7), stride=(2, 2))
(2): Conv2d(64, 64, kernel_size=(7, 7), stride=(1, 1))
(3): Conv2d(64, 128, kernel_size=(7, 7), stride=(2, 2))
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1))
(5): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1))
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), dilation=(2, 2))
(7): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), dilation=(4, 4))
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), dilation=(8, 8))
(9): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), dilation=(16, 16))
(10): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1))
(11): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1))
(12): PureUpsampling()
)
(EB2): ModuleList(
(0): Conv2d(4, 32, kernel_size=(5, 5), stride=(1, 1))
(1): Conv2d(32, 64, kernel_size=(5, 5), stride=(2, 2))
(2): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1))
(3): Conv2d(64, 128, kernel_size=(5, 5), stride=(2, 2))
(4): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1))
(5): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1))
(6): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1), dilation=(2, 2))
(7): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1), dilation=(4, 4))
(8): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1), dilation=(8, 8))
(9): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1), dilation=(16, 16))
(10): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1))
(11): Conv2d(128, 128, kernel_size=(5, 5), stride=(1, 1))
(12): PureUpsampling()
(13): Conv2d(128, 64, kernel_size=(5, 5), stride=(1, 1))
(14): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1))
(15): PureUpsampling()
)
(EB3): ModuleList(
(0): Conv2d(4, 32, kernel_size=(3, 3), stride=(1, 1))
(1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2))
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(3): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2))
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
(5): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
(6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), dilation=(2, 2))
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), dilation=(4, 4))
(8): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), dilation=(8, 8))
(9): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), dilation=(16, 16))
(10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
(11): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
(12): PureUpsampling()
(13): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1))
(14): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(15): PureUpsampling()
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1))
(17): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
)
(decoding_layers): ModuleList(
(0): Conv2d(224, 16, kernel_size=(3, 3), stride=(1, 1))
(1): Conv2d(16, 3, kernel_size=(3, 3), stride=(1, 1))
)
(pads): ModuleList(
(0): ReflectionPad2d((0, 0, 0, 0))
(1): ReflectionPad2d((1, 1, 1, 1))
(2): ReflectionPad2d((2, 2, 2, 2))
(3): ReflectionPad2d((3, 3, 3, 3))
(4): ReflectionPad2d((4, 4, 4, 4))
(5): ReflectionPad2d((5, 5, 5, 5))
(6): ReflectionPad2d((6, 6, 6, 6))
(7): ReflectionPad2d((7, 7, 7, 7))
(8): ReflectionPad2d((8, 8, 8, 8))
(9): ReflectionPad2d((9, 9, 9, 9))
(10): ReflectionPad2d((10, 10, 10, 10))
(11): ReflectionPad2d((11, 11, 11, 11))
(12): ReflectionPad2d((12, 12, 12, 12))
(13): ReflectionPad2d((13, 13, 13, 13))
(14): ReflectionPad2d((14, 14, 14, 14))
(15): ReflectionPad2d((15, 15, 15, 15))
(16): ReflectionPad2d((16, 16, 16, 16))
(17): ReflectionPad2d((17, 17, 17, 17))
(18): ReflectionPad2d((18, 18, 18, 18))
(19): ReflectionPad2d((19, 19, 19, 19))
(20): ReflectionPad2d((20, 20, 20, 20))
(21): ReflectionPad2d((21, 21, 21, 21))
(22): ReflectionPad2d((22, 22, 22, 22))
(23): ReflectionPad2d((23, 23, 23, 23))
(24): ReflectionPad2d((24, 24, 24, 24))
(25): ReflectionPad2d((25, 25, 25, 25))
(26): ReflectionPad2d((26, 26, 26, 26))
(27): ReflectionPad2d((27, 27, 27, 27))
(28): ReflectionPad2d((28, 28, 28, 28))
(29): ReflectionPad2d((29, 29, 29, 29))
(30): ReflectionPad2d((30, 30, 30, 30))
(31): ReflectionPad2d((31, 31, 31, 31))
(32): ReflectionPad2d((32, 32, 32, 32))
(33): ReflectionPad2d((33, 33, 33, 33))
(34): ReflectionPad2d((34, 34, 34, 34))
(35): ReflectionPad2d((35, 35, 35, 35))
(36): ReflectionPad2d((36, 36, 36, 36))
(37): ReflectionPad2d((37, 37, 37, 37))
(38): ReflectionPad2d((38, 38, 38, 38))
(39): ReflectionPad2d((39, 39, 39, 39))
(40): ReflectionPad2d((40, 40, 40, 40))
(41): ReflectionPad2d((41, 41, 41, 41))
(42): ReflectionPad2d((42, 42, 42, 42))
(43): ReflectionPad2d((43, 43, 43, 43))
(44): ReflectionPad2d((44, 44, 44, 44))
(45): ReflectionPad2d((45, 45, 45, 45))
(46): ReflectionPad2d((46, 46, 46, 46))
(47): ReflectionPad2d((47, 47, 47, 47))
(48): ReflectionPad2d((48, 48, 48, 48))
)
)
[Network GM] Total number of parameters : 12.562 M
Loading pretrained model from ./checkpoints/20191010-124841_GMCNN_Mydata_b4_s256x256_gc32_dc64_randmask-rect_pretrain
loading the model from ./checkpoints/20191010-124841_GMCNN_Mydata_b4_s256x256_gc32_dc64_randmask-rect_pretrain\2_net_GM.pth
Loading done.
model setting up..
training initializing..
Traceback (most recent call last):
File "train.py", line 41, in
ourModel.optimize_parameters()
File "D:\PycharmProjects2\inpainting_gmcnn\pytorch\model\net.py", line 332, in optimize_parameters
self.forward_D()
File "D:\PycharmProjects2\inpainting_gmcnn\pytorch\model\net.py", line 304, in forward_D
self.completed_logit, self.completed_local_logit = self.netD(self.completed.detach(), self.completed_local.detach())
File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "D:\PycharmProjects2\inpainting_gmcnn\pytorch\model\net.py", line 202, in forward
x_local = self.local_discriminator(x_l)
File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "D:\PycharmProjects2\inpainting_gmcnn\pytorch\model\net.py", line 184, in forward
self.logit = self.layers-1
File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "D:\PycharmProjects2\inpainting_gmcnn\pytorch\model\layer.py", line 267, in forward
return self.module.forward(*input)
File "D:\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "D:\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1369, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [4 x 4096], m2: [16384 x 1] at C:/w/1/s/tmp_conda_3.7_055457/conda/conda-bld/pytorch_1565416617654/work/aten/src\THC/generic/THCTensorMathBlas.cu:273

@shepnerd (Owner)

The input channel number of the fc layer in the PyTorch version needs to be modified if the mask shape changes. Since the input image size for the local discriminator here changes from 128x128 to 64x64, L260 in model/net.py should be updated to l_fc_channels = 4 * 4 * opt.d_cnum * 4.
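
For reference, here is a minimal sketch of where those numbers come from (an illustration based on the fix above, not code from the repository; it assumes the local discriminator reduces the patch spatially by a factor of 16, e.g. via four stride-2 convolutions, before flattening):

# Hypothetical helper illustrating the fc input size of the local discriminator.
# Assumption: the patch is downsampled spatially by a factor of 16, and the last
# conv layer has d_cnum * 4 feature maps (256 with the default d_cnum = 64).
def local_fc_channels(mask_size, d_cnum):
    spatial = mask_size // 16            # 64 -> 4, 128 -> 8
    channels = d_cnum * 4                # 64 * 4 = 256
    return spatial * spatial * channels

print(local_fc_channels(64, 64))    # 4096  -> m1 per sample in the RuntimeError above
print(local_fc_channels(128, 64))   # 16384 -> m2, the hard-coded value for the default 128x128 mask

With the 64x64 mask used in this thread, that gives the 4 * 4 * opt.d_cnum * 4 value suggested above.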

@nywang2019 (Author)

I updated net.py and it works. Thanks a lot.

@xyzdcgan

Hi, can you describe the steps for training with a custom dataset in TensorFlow? @nywang2019
Thanks

@nywang2019 (Author)

@xyzdcgan Hi, just follow what @shepnerd suggested in this post. See below:
The input channel number of the fc layer in pytorch version needs to be modified if the mask shape changes. Since the current input image size changes from 128x128 to 64x64 for the local discriminator, L260 in model/net.py should be updated to l_fc_channels=4 * 4 * opt.d_cnum * 4.

@xyzdcgan

Hi @nywang2019, can you please explain how a GAN generates an image?

What I know about GANs: there is a discriminator and a generator; the discriminator distinguishes real images from fake ones, and the generator generates new images until the discriminator can no longer tell the difference between real and fake.

Both of them use CNNs in their models.

My question is: how does a GAN perform object removal?

Let's say, for example, I have added a mask near the eye region where there is a wrinkle, and clicked the complete button.

When the image is generated, how is it generated without the wrinkle?

One GAN process I do know is AttGAN, where, to generate a smiling image from a non-smiling one, it gathers smiling facial expressions from the trained dataset/model and then applies those expressions to the input image.

In the case of GMCNN, how does the model select the mask area and then make it wrinkle-free? What actually tells the model which changes need to be applied to the mask region?

Let's assume I have a face image with some pimples; I applied a mask and it removed the pimples from the image. How was it able to generate an image without pimples?

Can you please tell me what the internal processes of a GAN are and how all this works?
@nywang2019 @shepnerd

Thanks

shepnerd (Owner) commented Jan 8, 2020

In short, GAN-based inpainting methods predict the user-marked regions based on image context.

In your given example (removing pimples from face images), given the input image and the annotated region, the model infers the semantics (e.g. skin, eye, etc.) and the low-level details (e.g. color, texture, etc.) of the marked region from the known regions (the context), and composes the final result. The prediction of the semantics (whether it contains pimples or not, in your example) comes from the given image context and the priors learned from the training set. Thus, the final predicted region is quite likely to differ from the corresponding region of the ground truth, since the model's optimization goal is to encourage the prediction to be as realistic as the training data.

The AttGAN you mentioned has extra conditions in the form of attribute vectors. With those, the GAN model can interactively control its prediction. For similar works or ideas, see pix2pix and its follow-ups.

For a quick understanding of GAN-based inpainting methods, you can treat the GAN loss as a learnable metric. It measures how real the prediction looks (compared with real data) from a high-level perspective (similar to a human observer), instead of requiring pixel-wise similarity like an L1 or L2 loss.
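
As a rough illustration of that last point, here is a minimal sketch of a GAN-based inpainting objective (the function, the dummy discriminator, and the WGAN-style adversarial term are illustrative assumptions, not the repository's actual net.py code; the weights simply mirror the lambda_rec and lambda_adv options printed in the log above):

# Sketch: the reconstruction loss pulls the prediction toward the ground truth
# pixel-wise, while the adversarial term only asks the completed image to look
# realistic to a learned discriminator. `netD` is assumed to return one realism
# score per sample.
import torch
import torch.nn.functional as F

def generator_loss(pred, target, mask, netD, lambda_rec=1.4, lambda_adv=0.001):
    # Paste the prediction into the masked hole; keep the known pixels from the input.
    completed = pred * mask + target * (1.0 - mask)
    rec = F.l1_loss(pred, target)       # pixel-wise similarity (L1)
    adv = -netD(completed).mean()       # learned "how real does it look" score
    return lambda_rec * rec + lambda_adv * adv

# Toy usage with a dummy "discriminator" that just averages over pixels.
pred = torch.rand(2, 3, 64, 64)
target = torch.rand(2, 3, 64, 64)
mask = torch.zeros(2, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0
dummy_netD = lambda x: x.mean(dim=(1, 2, 3))
print(generator_loss(pred, target, mask, dummy_netD))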

@qwerdbeta

I have a similar problem, but instead of making the images and masks smaller, I made them 512x512 (images) and 256x256 (masks), and I get a similar error with different numbers reported. See #61

@qwerdbeta

Above, @nywang2019 says he is using 512x512 training images, but the trace shows img_shapes of 256x256. @shepnerd, what would be needed to get the pretrain step to work with 512x512 training images?

I get similar errors.
