
Trained model does not work #73
Open
urtepuod opened this issue Jan 22, 2024 · 2 comments
Comments

@urtepuod

Hello, I've tried to train a new model from scratch using these settings:

```
omnipose --train --use_gpu --dir "/home/urte/3D modeller/3d_cell_detector/trainingdata/Omni_5" \
    --img_filter '' --mask_filter _cp_masks \
    --pretrained_model None \
    --diameter 0 --nclasses 3 --nchan 3 --tyx 512,512 \
    --learning_rate 0.1 --RAdam --batch_size 5 --n_epochs 900 --save_every 300 --verbose
```
Training completes successfully; however, when I try to import the model into the GUI, I get this error:
```
2024-01-22 11:45:23,186 [INFO] ** TORCH GPU version installed and working. **
2024-01-22 11:45:23,188 [INFO] >>>> using GPU
ERROR: Error(s) in loading state_dict for CPnet:
size mismatch for downsample.down.res_down_0.conv.conv_0.0.weight: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.conv.conv_0.0.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.conv.conv_0.0.running_mean: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.conv.conv_0.0.running_var: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.conv.conv_0.2.weight: copying a param with shape torch.Size([32, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 1, 3, 3]).
size mismatch for downsample.down.res_down_0.proj.0.weight: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.proj.0.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.proj.0.running_mean: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.proj.0.running_var: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for downsample.down.res_down_0.proj.1.weight: copying a param with shape torch.Size([32, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 1, 1, 1]).
size mismatch for output.2.weight: copying a param with shape torch.Size([4, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 32, 1, 1]).
size mismatch for output.2.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([3]).
```
I had also tried training with --nchan 2 and --nclasses 3, but during training it automatically reset to nchan 3. My training data consists of grayscale images with masks produced by Cellpose. I apologise if this is trivial; this is all very new to me.
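For readers new to this traceback: it means the checkpoint was saved from a network built for 3 input channels, while the network being rebuilt at load time expects 1 channel. A minimal PyTorch sketch reproducing the same class of error (the layer stack here is illustrative, not Omnipose's actual CPnet):

```python
import torch
import torch.nn as nn

# A stack trained with 3 input channels (analogous to nchan=3):
# BatchNorm2d(3) stores weight/bias/running_mean/running_var of shape [3],
# and Conv2d(3, 32, 3) stores a weight of shape [32, 3, 3, 3].
trained = nn.Sequential(nn.BatchNorm2d(3), nn.ReLU(), nn.Conv2d(3, 32, 3))
torch.save(trained.state_dict(), "checkpoint.pth")

# The same stack rebuilt for 1 input channel (analogous to nchan=1).
rebuilt = nn.Sequential(nn.BatchNorm2d(1), nn.ReLU(), nn.Conv2d(1, 32, 3))

# Raises RuntimeError with "size mismatch" messages for the norm
# parameters and the conv weight -- the same pattern as the traceback above.
rebuilt.load_state_dict(torch.load("checkpoint.pth"))
```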

@kevinjohncutler (Owner)

@urtepuod sorry for the delay! You may want to email me at [email protected] to debug further. I'd like to get your model and an example image to debug, and I also need your `pip list`. I usually see this issue when cellpose has not been fully uninstalled, or when working with an older version of cellpose_omni and omnipose. In the most recent version of the GUI, you can choose nchan and select "boundary field output" if you trained with nclasses 3. If your images are grayscale, however, I think the model should have been trained with no channels (though RGB grayscale could have messed that up).
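Outside the GUI, the same fix applies programmatically: build the inference model with the settings used for training. A minimal sketch, assuming the `cellpose_omni` Python API accepts `nchan`/`nclasses` keyword arguments as in recent Omnipose releases (the model path is a placeholder):

```python
from cellpose_omni import models

# Build the model with the SAME channel/class settings used in training,
# so the checkpoint tensor shapes match the rebuilt network.
model = models.CellposeModel(
    gpu=True,
    pretrained_model="/path/to/models/your_trained_model",
    nchan=3,      # must match the --nchan used in training
    nclasses=3,   # must match the --nclasses used in training
)
```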

@marieanselmet

Hello, I have the same problem here. If I train an Omnipose model with nclasses = 4, I need to specify nclasses = 4 at inference when using this model; otherwise I get the same error as above, since the default is now nclasses = 2. Why was this value chosen as the default for nclasses? And how much would it impair performance to lose two output branches when training an Omnipose model?
Thanks a lot!
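On the command line, the matching works the same way; a sketch reusing the flags already shown in the training command above (paths are placeholders, and flag support may vary by Omnipose version):

```
omnipose --dir /path/to/eval_images --use_gpu \
    --pretrained_model /path/to/your_trained_model \
    --nchan 2 --nclasses 4 --verbose
```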
