Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

split_and_load can now handle num_ctx > num_data. Github Issue #13909 #14607

Merged
merged 1 commit into from
Apr 8, 2019
Merged

split_and_load can now handle num_ctx > num_data. Github Issue #13909 #14607

merged 1 commit into from
Apr 8, 2019

Conversation

mightydeveloper
Copy link
Contributor

@mightydeveloper mightydeveloper commented Apr 3, 2019

Description

Handles Issue #13909
when last batch size is smaller than the number of contexts,
the previous utility function gluon.utils.split_and_load throws an exception ValueError: Too many slices for data with shape ....
However, we can just put one data per context and ignore the remaining contexts.
This integrates nicely with the given reproducing example code. (User does not need to modify the code)

for d, l in zip(data, label):
    with autograd.record():
        out = net(d)
        losses.append(loss_fn(out, l))
for loss in losses:
    loss.backward()

where data and label will only output data that exists in the first few of the given context.
the forward / backward pass is only done in the contexts where needed. (remaining contexts does not need to do a fake forward/backward pass)
My concern is people can make mistakes when calculating mean of the losses.

So it would be nice if we add this behavior to the documentation. (split_and_load can output a list that has a size less than number of contexts when even_split=False)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.
-> I believe this PR is just tiny change

@mightydeveloper mightydeveloper requested a review from szha as a code owner April 3, 2019 07:34
@mightydeveloper
Copy link
Contributor Author

Just an FYI, I made a misleading comment on the code previously, so I fixed it.

@wkcn
Copy link
Member

wkcn commented Apr 3, 2019

I am not sure how the weights update for the context without inpus when backward.

@mightydeveloper
Copy link
Contributor Author

mightydeveloper commented Apr 3, 2019

@wkcn
In the example above, losses only contains loss terms that are forwarded by real data samples. (meaning that unnecessary(possibly fake) context loss are not appended from the previous for loop.)
So I believe when we do loss.backward(), only the contexts that are appended from the previous for loop will run and calculate gradients for previously marked variables from previous for loop.

For example, suppose we have 3 examples left from the dataset and have 5 GPU contexts.
We would only call losses.append(loss_fn(out, l)) 3 times, marking variables in 0, 1, 2 GPU contexts.
So when we call

for loss in losses:
    loss.backward()

only gradients for GPU 0, 1, 2 will be calculated and when we eventually call trainer.step(), the weights will be updated.
Does this answer help you? (I might have misunderstood your concern)

(+ Also, I noticed that I have an indentation error in my original PR request description so I just edited it.)

@wkcn
Copy link
Member

wkcn commented Apr 3, 2019

@mightydeveloper
Thank you for the detailed explanation!
Yes, only gradients for GPU 0, 1, 2 will be calculated.

However, it seems that trainer.step() will update the weights by using all gradients on 5 GPU contexts.
The gradients on GPU 3, 4 may be not zero-tensors.

The code of trainer.step()

Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM : )
Thanks for your contribution!

@mightydeveloper
Copy link
Contributor Author

However, it seems that trainer.step() will update the weights by using all gradients on 5 GPU contexts.
The gradients on GPU 3, 4 may be not zero-tensors.

Thanks for checking and providing relevant code link!

So... If I want to zero out gradients, I guess I should either

  1. call trainer._params.zero_grad()
  2. or just net.collect_params().zero_grad()
    (Assuming that I initialized trainer with Trainer(net.collect_params(), ...)
  3. or maybe call trainer.step(ignore_stale_grad=True) instead?

Would all three of them work? which one would be better?

@wkcn
Copy link
Member

wkcn commented Apr 3, 2019

calling trainer.step(ignore_stale_grad=True) is okay : )

@piyushghai
Copy link
Contributor

@mightydeveloper Thanks for your contributions.

@mxnet-label-bot Add [Gluon]

@wkcn wkcn added the pr-awaiting-review PR is waiting for code review label Apr 7, 2019
@wkcn wkcn merged commit daabe5c into apache:master Apr 8, 2019
@wkcn
Copy link
Member

wkcn commented Apr 8, 2019

Merged. Thanks for your contribution!

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Gluon pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants