Souping on regression model leading to a drastic drop in accuracy #13

Open · akhilperincherry opened this issue Aug 4, 2023 · 8 comments

akhilperincherry commented Aug 4, 2023

Hello,

I have a regression model that I composed by taking a MobileNet classifier (pre-trained with ImageNet weights), removing its classification head, and adding a flatten + dense layer that outputs a scalar. I define an accuracy metric based on whether the absolute error is below a threshold.
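
For concreteness, a minimal Keras sketch of such a setup might look like the following; the input shape, threshold value, and helper names are illustrative assumptions, not details taken from this issue:

```python
import tensorflow as tf

THRESHOLD = 5.0  # hypothetical tolerance; the actual threshold is not stated in this issue

def thresholded_accuracy(y_true, y_pred):
    # Fraction of predictions whose absolute error falls below the threshold.
    return tf.reduce_mean(tf.cast(tf.abs(y_true - y_pred) < THRESHOLD, tf.float32))

def build_regression_model(input_shape=(224, 224, 3)):
    # MobileNet backbone pre-trained on ImageNet, with the classification head removed.
    backbone = tf.keras.applications.MobileNet(
        include_top=False, weights="imagenet", input_shape=input_shape
    )
    x = tf.keras.layers.Flatten()(backbone.output)
    output = tf.keras.layers.Dense(1)(x)  # scalar regression output
    return tf.keras.Model(backbone.input, output)
```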

I take the above model and train it first using LP (linear probing) for 15 iterations, then using FT (fine-tuning) for 2 iterations. This is my starter model, which was trained using RMSprop.
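
A rough sketch of the LP stage (backbone frozen, only the new head trained) followed by the FT stage (all layers trained), under the same illustrative assumptions as the sketch above; the MAE loss, the data pipeline, and the reading of "iterations" as epochs are placeholders rather than details from this issue:

```python
def train_lp_then_ft(model, train_ds, val_ds):
    # LP stage: freeze the MobileNet backbone so only the flatten + dense head learns.
    for layer in model.layers[:-2]:  # assumes the head is the last two layers
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=5e-3),
                  loss="mae", metrics=[thresholded_accuracy])
    model.fit(train_ds, validation_data=val_ds, epochs=15)

    # FT stage: unfreeze everything and fine-tune the whole network briefly.
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=5e-3),
                  loss="mae", metrics=[thresholded_accuracy])
    model.fit(train_ds, validation_data=val_ds, epochs=2)
    return model
```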

I then take this starter model and train it (using LP) for a variable number of iterations, with varying learning rates, optimizer types (RMSprop, Adam, AdamW), and random seeds, to get my soup ingredient models.

I get approximately 91% accuracy on a held-out test set using the starter model, and 93% and 94% using two of my ingredient models.

Issue: I take a random pair of well-performing models (>90%) from among my starter and ingredient models and average their weights. However, the souped models almost always end up with about 2% accuracy on the test set.

Illustrative code I use to average the weights:

```python
import numpy as np
import tensorflow as tf

def uniform_soup(model_list):
    """Return a model whose weights are the uniform average of the models in model_list."""
    soups = []

    tf.keras.backend.clear_session()
    # Any model from my starter or ingredients, used only for its architecture.
    model_init = create_skeleton_model()

    # Collect each model's weights as a list of numpy arrays.
    for model_individual in model_list:
        soup = [np.array(weights) for weights in model_individual.weights]
        soups.append(soup)

    # Average corresponding weight tensors across the models.
    mean_soup = [np.mean(weight_group, axis=0) for weight_group in zip(*soups)]

    # Replace the skeleton model's weights with the uniform-soup weights.
    for w1, w2 in zip(model_init.weights, mean_soup):
        tf.keras.backend.set_value(w1, w2)

    return model_init
```
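
A hypothetical usage sketch; `m1`, `m2`, `test_ds`, and the loss/metric choices are placeholders, and note that the returned skeleton model needs to be compiled before evaluation:

```python
souped = uniform_soup([m1, m2])  # m1, m2: previously trained ingredient models
souped.compile(loss="mae", metrics=[thresholded_accuracy])
print(souped.evaluate(test_ds, return_dict=True))
```
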
• Is there anything wrong in my design, or anything that stands out to you?
• Is it okay to use a regression model? Does anything in the loss landscape change owing to it being a regression model?

I did read through #10 and followed your advice on that thread when designing my souping.

Thanks in advance.

mitchellnw (Contributor) commented

We have not tried regression models, but I don't really see why that wouldn't work. I'm confused by this:

> I get approximately 91% accuracy on a held-out test set using the starter model, and 93% and 94% using two of my ingredient models.

Does this mean that souping two models works but more does not?

akhilperincherry commented Aug 4, 2023

No, sorry, I meant that the models individually have good performance: the starter model by itself gives 91% accuracy on the test set, and two of my ingredient models give 93% and 94% on their own. However, when I soup them, accuracy falls to about 2%. No souping works in my experiments, whether with two models or more (most of my experiments souped two ingredient models).

Another aspect is that MobileNet is a much smaller network than the ones used in the paper. The paper does say to expect only marginal gains with smaller ImageNet models, but seeing this drastic a drop suggests there may be something fundamentally wrong, perhaps in my design?

mitchellnw (Contributor) commented

Hmmm. Are you introducing new params when fine-tuning? What LR?

akhilperincherry (Author) commented

No new parameters.

The starter model was trained at an LR of 0.005 with RMSprop. The 7 ingredient models for souping were trained with {0.001 + Adam, 0.005 + Adam, 0.001 + AdamW, 1e-05 + AdamW, 0.0005 + RMSprop, 2e-05 + RMSprop, 0.001 + AdamW}.

mitchellnw (Contributor) commented

Can you try souping just the small-LR models, e.g., 1e-05 + AdamW and 2e-05 + RMSprop? I think the LR may just be too high for the other models.

akhilperincherry commented Aug 21, 2023

Thanks for the suggestion. I took the two models you mentioned (call them m1, m2) and I also took their starter model s0.

Souping s0, m2 -> 3.26%
Souping s0, m1 -> 9.54%
Souping m1, m2 -> 5.70%
Souping s0, m1, m2 -> 4.50%

The values look better than what I've seen before (~2%) but are still pretty bad overall. I also see that the range of predictions has shrunk: the individual models predict values ranging from 0 to 180, whereas the souped models' outputs span a much smaller range, for instance 30 to 90. I wonder if this is due to reduced representational ability?

mitchellnw commented Aug 21, 2023

Hmm. I really don't know. I guess souping + regression may be an open problem. Sorry about that.

akhilperincherry (Author) commented

No worries, thank you.
