Souping on regression model leading to a drastic drop in accuracy #13

Open · akhilperincherry opened this issue Aug 4, 2023 · 8 comments

akhilperincherry commented Aug 4, 2023

Hello,

I have a regression model that I composed by taking a MobileNet classifier (pre-trained with ImageNet weights), removing its classification head, and adding a flatten + dense layer that outputs a scalar. I define an accuracy metric based on whether the absolute error is below a threshold.
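
For concreteness, a minimal Keras sketch of such a setup might look like the following; the input shape, threshold value, and helper names are illustrative assumptions, not details taken from this issue:

```python
import tensorflow as tf

THRESHOLD = 5.0  # hypothetical tolerance; the actual threshold is not stated in this issue

def thresholded_accuracy(y_true, y_pred):
    # Fraction of predictions whose absolute error falls below the threshold.
    return tf.reduce_mean(tf.cast(tf.abs(y_true - y_pred) < THRESHOLD, tf.float32))

def build_regression_model(input_shape=(224, 224, 3)):
    # MobileNet backbone pre-trained on ImageNet, with the classification head removed.
    backbone = tf.keras.applications.MobileNet(
        include_top=False, weights="imagenet", input_shape=input_shape
    )
    x = tf.keras.layers.Flatten()(backbone.output)
    output = tf.keras.layers.Dense(1)(x)  # scalar regression output
    return tf.keras.Model(backbone.input, output)
```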

I take the above model and train it first using LP (linear probing) for 15 iterations, then using FT (fine-tuning) for 2 iterations. This is my starter model, which was trained using RMSprop.
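
A rough sketch of the LP stage (backbone frozen, only the new head trained) followed by the FT stage (all layers trained), under the same illustrative assumptions as the sketch above; the MAE loss, the data pipeline, and the reading of "iterations" as epochs are placeholders rather than details from this issue:

```python
def train_lp_then_ft(model, train_ds, val_ds):
    # LP stage: freeze the MobileNet backbone so only the flatten + dense head learns.
    for layer in model.layers[:-2]:  # assumes the head is the last two layers
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=5e-3),
                  loss="mae", metrics=[thresholded_accuracy])
    model.fit(train_ds, validation_data=val_ds, epochs=15)

    # FT stage: unfreeze everything and fine-tune the whole network briefly.
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=5e-3),
                  loss="mae", metrics=[thresholded_accuracy])
    model.fit(train_ds, validation_data=val_ds, epochs=2)
    return model
```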

I then take this starter model and train it (using LP) for a variable number of iterations, with varying learning rates, optimizer types (RMSprop, Adam, AdamW), and random seeds, to get my soup ingredient models.

I get approximately 91% accuracy on a held-out test set using the starter model, and 93% and 94% using two of my ingredient models.

Issue: I take a random pair of well-performing models (>90%) from among my starter and ingredient models and average their weights. However, the souped models almost always end up with about 2% accuracy on the test set.

Illustrative code I use to average the weights:

```python
import numpy as np
import tensorflow as tf

def uniform_soup(model_list):
    """Return a model whose weights are the uniform average of the models in model_list."""
    soups = []

    tf.keras.backend.clear_session()
    # Any model from my starter or ingredients, used only for its architecture.
    model_init = create_skeleton_model()

    # Collect each model's weights as a list of numpy arrays.
    for model_individual in model_list:
        soup = [np.array(weights) for weights in model_individual.weights]
        soups.append(soup)

    # Average corresponding weight tensors across the models.
    mean_soup = [np.mean(weight_group, axis=0) for weight_group in zip(*soups)]

    # Replace the skeleton model's weights with the uniform-soup weights.
    for w1, w2 in zip(model_init.weights, mean_soup):
        tf.keras.backend.set_value(w1, w2)

    return model_init
```
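
A hypothetical usage sketch; `m1`, `m2`, `test_ds`, and the loss/metric choices are placeholders, and note that the returned skeleton model needs to be compiled before evaluation:

```python
souped = uniform_soup([m1, m2])  # m1, m2: previously trained ingredient models
souped.compile(loss="mae", metrics=[thresholded_accuracy])
print(souped.evaluate(test_ds, return_dict=True))
```
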
• Is there anything wrong in my design, or anything that stands out to you?
• Is it okay to use a regression model? Does anything in the loss landscape change owing to it being a regression model?

I did read through #10 and followed your advice on that thread when designing my souping.

Thanks in advance.

mitchellnw (Contributor) commented

We have not tried regression models, but I don't really see why that wouldn't work. I'm confused by this:

> I get approximately 91% accuracy on a held-out test set using the starter model, and 93% and 94% using two of my ingredient models.

Does this mean that souping two models works but more does not?

akhilperincherry commented Aug 4, 2023

No, sorry, I meant that the models individually have good performance: the starter model by itself gives 91% accuracy on the test set, and two of my ingredient models give 93% and 94% on their own. However, when I soup them, accuracy falls to about 2%. No souping works in my experiments, whether with two models or more (most of my experiments souped two ingredient models).

Another aspect is that MobileNet is a much smaller network than the ones used in the paper. The paper does say to expect only marginal gains with smaller ImageNet models, but seeing this drastic a drop suggests there may be something fundamentally wrong, perhaps in my design?

mitchellnw (Contributor) commented

Hmmm. Are you introducing new params when fine-tuning? What LR?

akhilperincherry (Author) commented

No new parameters.

The starter model was trained at an LR of 0.005 with RMSprop. The 7 ingredient models for souping were trained with {0.001 + Adam, 0.005 + Adam, 0.001 + AdamW, 1e-05 + AdamW, 0.0005 + RMSprop, 2e-05 + RMSprop, 0.001 + AdamW}.

mitchellnw (Contributor) commented

Can you try souping just the small-LR models, e.g., 1e-05 + AdamW and 2e-05 + RMSprop? I think the LR may just be too high for the other models.

akhilperincherry commented Aug 21, 2023

Thanks for the suggestion. I took the two models you mentioned (call them m1, m2) and I also took their starter model s0.

Souping s0, m2 -> 3.26%
Souping s0, m1 -> 9.54%
Souping m1, m2 -> 5.70%
Souping s0, m1, m2 -> 4.50%

The values look better than what I've seen before (~2%) but are still pretty bad overall. I also see that the range of predictions has shrunk: the individual models predict values ranging from 0 to 180, whereas the souped models' outputs span a much smaller range, for instance 30 to 90. I wonder if this is due to reduced representational ability?

mitchellnw commented Aug 21, 2023

Hmm. I really don't know. I guess souping + regression may be an open problem. Sorry about that.

akhilperincherry (Author) commented

No worries, thank you.
