Question about batch and subdivisions #1736

Open
ydixon opened this issue Oct 8, 2018 · 10 comments

@ydixon

ydixon commented Oct 8, 2018

In this cfg, batch=64 and subdivisions=16, so the real (mini-)batch size should be 64 / 16 = 4.

My question: are the gradients of all 64 images accumulated before the model is updated, or are the weights updated after every iteration of 4 images?

@AlexeyAB
Owner

AlexeyAB commented Oct 8, 2018

The weights are updated once per batch_cfg = 64 images.


mini_batch = net.batch = batch_cfg / subdivisions_cfg

darknet/src/network.c

Lines 315 to 326 in 24b6045

    int batch = net.batch;
    int n = d.X.rows / batch;
    float *X = calloc(batch*d.X.cols, sizeof(float));
    float *y = calloc(batch*d.y.cols, sizeof(float));
    int i;
    float sum = 0;
    for(i = 0; i < n; ++i){
        get_next_batch(d, batch, i*batch, X, y);
        float err = train_network_datum(net, X, y);
        sum += err;
    }


    float train_network_datum(network net, float *x, float *y) {
        ...
        if(((*net.seen)/net.batch)%net.subdivisions == 0) update_network(net);
    }
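The update schedule implied by that condition can be sketched as a small counter. This is an illustration only, not darknet code: count_updates is a hypothetical helper, and it assumes that *net.seen is advanced by one mini-batch before the check runs and that gradients simply accumulate between updates.

```c
/* Hypothetical helper, not part of darknet: counts how many times the
 * condition quoted above would call update_network() while
 * total_images images are processed. */
int count_updates(int total_images, int batch_cfg, int subdivisions_cfg)
{
    int mini_batch;
    int seen = 0;      /* stands in for *net.seen */
    int updates = 0;

    if (subdivisions_cfg <= 0) return 0;
    mini_batch = batch_cfg / subdivisions_cfg;   /* net.batch */
    if (mini_batch <= 0) return 0;

    while (seen + mini_batch <= total_images) {
        /* Assumption: seen grows by one mini-batch before the check. */
        seen += mini_batch;

        /* Forward and backward passes would run here; gradients are
         * assumed to accumulate until the next update. */
        if ((seen / mini_batch) % subdivisions_cfg == 0) {
            ++updates;   /* stands in for update_network(net) */
        }
    }
    return updates;
}
```

Under these assumptions, count_updates(128, 64, 16) returns 2: the network runs 32 mini-batches of 4 images, but the weights change only after images 64 and 128.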

@ydixon
Author

ydixon commented Oct 9, 2018

Thanks

@ydixon
Author

ydixon commented Oct 15, 2018

@AlexeyAB Sorry for resurrecting this. If the condition that decides whether to update the network is if(((*net.seen)/net.batch)%net.subdivisions == 0) update_network(net);, doesn't that mean the weights are updated once per batch_cfg rather than once per mini-batch?

For example:

batch_cfg=64, subdivisions_cfg=16
net.batch = batch_cfg / subdivisions_cfg = 4
net.subdivisions = 16

Case 1: net.seen = 60

(net.seen/net.batch)%net.subdivisions = (60/4) % 16 = 15
Do not update

Case 2: net.seen = 128

(net.seen/net.batch)%net.subdivisions = (128/4) % 16 = 0
Update
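The arithmetic in these two cases can be checked with a short sketch. The helpers update_check_value and should_update are hypothetical names introduced for illustration; they only restate the quoted condition with integer division, using net.subdivisions = 16 as the modulus.

```c
/* Hypothetical helpers, not part of darknet. Both assume
 * subdivisions_cfg > 0 and batch_cfg >= subdivisions_cfg, so that
 * no division by zero occurs. */

/* Value of ((*net.seen)/net.batch) % net.subdivisions. */
int update_check_value(int seen, int batch_cfg, int subdivisions_cfg)
{
    int net_batch = batch_cfg / subdivisions_cfg;   /* 64 / 16 = 4 */
    return (seen / net_batch) % subdivisions_cfg;
}

/* Nonzero when the quoted condition would call update_network(). */
int should_update(int seen, int batch_cfg, int subdivisions_cfg)
{
    return update_check_value(seen, batch_cfg, subdivisions_cfg) == 0;
}
```

With batch_cfg=64 and subdivisions_cfg=16, update_check_value gives 15 for net.seen = 60 (no update) and 0 for net.seen = 128 (update), so an update happens only at multiples of 64 images.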

@AlexeyAB
Owner

@ydixon Yes, weights will be updated for each batch_cfg.

@ydixon
Author

ydixon commented Oct 15, 2018

@AlexeyAB Thanks for the quick response!

@kmsravindra

kmsravindra commented Nov 19, 2018

@AlexeyAB, can I train with batch=128 so that the trained model generalizes better than with batch=64? In that case, would I have to train for almost double the number of iterations compared with batch=64? So the batch size could be a hyperparameter that affects mAP. Is my understanding correct?

@sctrueew

@AlexeyAB Hi,

I have 200k images, about 200 classes, and two RTX 2080 Ti GPUs. My model is Gaussian.cfg. What batch and subdivisions values should I set?

Thanks in advance.

@AlexeyAB
Owner

batch=64 subdivisions=16

The lower the subdivisions value, the better.

@sctrueew

@AlexeyAB Hi,

Thanks. Can I stop the training, change subdivisions, and then continue the training?

@AlexeyAB
Owner

Yes.
