Souping on regression model leading to a drastic drop in accuracy #13
Comments
We have not tried regression models, but I don't really see why that wouldn't work. I'm confused by this:
Does this mean that souping two models works but more does not?
No, sorry, I meant to say that the models individually have good performance: the starter model by itself reaches 91% accuracy on a test set, and two of my ingredient models reach 93% and 94% on their own. However, when I soup them, accuracy falls to 2.xx%. No souping works in my experiments, whether with two models or more (most of my experiments souped two ingredient models). Another aspect is that MobileNet is a much smaller network than the ones used in the paper. The paper did say to expect marginal performance increases with smaller ImageNet-based models, but a drop this drastic tells me there may be something fundamentally wrong with my design?
Hmmm. Are you introducing new params when fine-tuning? What LR?
No new parameters. The starter model was trained at an LR of 0.005 with RMSProp. The 7 ingredient models for souping were trained at LRs of {0.001+Adam, 0.005+Adam, 0.001+AdamW, 1e-05+AdamW, 0.0005+RMSProp, 2e-05+RMSProp, 0.001+AdamW}.
Can you try just souping the small-LR models, e.g., 1e-05+AdamW and 2e-05+RMSProp? I think the LR may just be too high for the other models.
Thanks for the suggestion. I took the two models you mentioned (call them m1, m2) and I also took their starter model s0. Souping s0, m2 -> 3.26%. The values look better than what I've seen before (~2%) but are still pretty bad overall. I also see that the range of predictions has shrunk: the individual models can predict values ranging from 0 to 180, whereas the souped models' outputs span a much smaller range, for instance 30 to 90. I wonder if this is due to reduced representational ability?
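For reference, the spread can be checked with something like this (a sketch assuming Keras models m1 and m2, a hypothetical souped model, and a test array x_test):

```python
def pred_range(model, x_test):
    # Min/max of the model's scalar predictions over the test inputs.
    preds = model.predict(x_test).ravel()
    return preds.min(), preds.max()

# souped_model and x_test are placeholders, not names from the thread.
for name, m in [("m1", m1), ("m2", m2), ("soup", souped_model)]:
    lo, hi = pred_range(m, x_test)
    print(f"{name}: predictions span [{lo:.1f}, {hi:.1f}]")
```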
Hmm. I really don't know. I guess souping + regression may be an open problem. Sorry about that.
No worries, thank you.
Hello,
I have a regression model that I composed by taking a MobileNet classifier (pre-trained with ImageNet weights), removing its classification head, and adding a flatten+dense layer that outputs a single scalar. I define an accuracy metric based on whether the absolute error is below a threshold. A sketch of this setup is shown below.
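A minimal sketch in Keras (the input size, head, and threshold value are assumptions; they aren't specified in this thread):

```python
import tensorflow as tf

THRESHOLD = 5.0  # assumed tolerance; the actual value isn't given here

def threshold_accuracy(y_true, y_pred):
    # Fraction of predictions whose absolute error is below the threshold.
    err = tf.abs(tf.reshape(y_true, [-1]) - tf.reshape(y_pred, [-1]))
    return tf.reduce_mean(tf.cast(err < THRESHOLD, tf.float32))

def build_model():
    backbone = tf.keras.applications.MobileNet(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    x = tf.keras.layers.Flatten()(backbone.output)
    out = tf.keras.layers.Dense(1)(x)  # single scalar regression output
    return tf.keras.Model(backbone.input, out)
```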
I take the above model and train it first using linear probing (LP) for 15 iterations, then full fine-tuning (FT) for 2 iterations. This is my starter model, which was trained using RMSprop.
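As a sketch of that schedule (treating "iterations" as epochs and using the 0.005 RMSprop LR mentioned above; train_ds is an assumed tf.data pipeline):

```python
model = build_model()  # from the sketch above

# LP: freeze the MobileNet backbone, train only the new head.
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=5e-3),
              loss="mse", metrics=[threshold_accuracy])
model.fit(train_ds, epochs=15)

# FT: unfreeze everything and briefly train end-to-end.
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=5e-3),
              loss="mse", metrics=[threshold_accuracy])
model.fit(train_ds, epochs=2)

starter = model
```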
I then take this starter model and train it (using LP) for a variable number of iterations, with varying learning rates, optimizer types (RMSprop, Adam, AdamW), and random seeds, to get my soup ingredient models.
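The ingredient sweep might look like the following (the grid and epoch count here are illustrative; the actual values are listed in the comments above):

```python
# Illustrative subset of the (optimizer, LR) grid; seeds vary per run.
configs = [
    (tf.keras.optimizers.Adam,    1e-3),
    (tf.keras.optimizers.AdamW,   1e-5),
    (tf.keras.optimizers.RMSprop, 2e-5),
]

ingredients = []
for seed, (opt_cls, lr) in enumerate(configs):
    tf.keras.utils.set_random_seed(seed)
    m = tf.keras.models.clone_model(starter)  # same architecture
    m.set_weights(starter.get_weights())      # initialize from the starter
    for layer in m.layers[:-2]:               # LP: train only the head
        layer.trainable = False
    m.compile(optimizer=opt_cls(learning_rate=lr),
              loss="mse", metrics=[threshold_accuracy])
    m.fit(train_ds, epochs=5)                 # epoch count assumed
    ingredients.append(m)
```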
I get approximately 91% accuracy on a held-out test set using the starter model, and 93% and 94% using two of my ingredient models.
Issue: I take a random pair of well-performing models (>90%) from among my starter and ingredient models and average their weights. However, the souped models almost always drop to about 2% accuracy on the test set.
Illustrative code I use to average the weights:
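Roughly, assuming Keras models with identical architectures (a uniform soup; m1 and m2 stand in for any pair):

```python
import numpy as np

def soup(models):
    # Uniform soup: element-wise average of corresponding weight tensors.
    # Note: this also averages BatchNorm moving statistics.
    weights = [m.get_weights() for m in models]
    avg = [np.mean(ws, axis=0) for ws in zip(*weights)]
    souped = tf.keras.models.clone_model(models[0])
    souped.set_weights(avg)
    return souped

souped_model = soup([m1, m2])
souped_model.compile(loss="mse", metrics=[threshold_accuracy])
souped_model.evaluate(test_ds)  # test_ds is an assumed evaluation pipeline
```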
I did read through #10 and followed your advice in that thread when designing my souping.
Thanks in advance.