-
Notifications
You must be signed in to change notification settings - Fork 20
[bc breaking] change x, w, dL_dY variable names to input, weight, grad_output #323
Conversation
…d_output Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…d_output Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 5d1c7063d98ef2adebdb14b1757cc349d41e3020 Pull Request resolved: #323
…weight, grad_output" Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…d_output Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 52683b06c816b8ab1cdef5142d5a7eaea1a9e0f2 Pull Request resolved: #323
@@ -23,4 +23,7 @@ jobs: | |||
pip install black==23.3.0 usort==1.0.6 ufmt==2.1.0 libcst==1.0.1 | |||
- name: Analyzing the code with ufmt | |||
run: | | |||
ufmt format . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L26:28 is for easier debugging of differences between local machine and CI ufmt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the ufmt config different than CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed this on a couple of PRs in the past. A better fix would be to align ufmt versions + env, but regardless this is useful for debugging.
…weight, grad_output" Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…d_output Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: d1d6ffbcc37bcf1b39709e50156d6018120a2261 Pull Request resolved: #323
…weight, grad_output" Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…d_output Summary: The following naming scheme matches the rest of PyTorch better: ``` // forward output = input @ weight_t // backward grad_input = grad_output @ weight grad_weight = input_t @ grad_output ``` This PR changes all the previous references to `x`, `w`, `dL_dY` to match the naming scheme above. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 37a6a51987b55cd894b70fa54e6b8669f48ccb47 Pull Request resolved: #323
ctx.save_for_backward(fp8_amax_dL_dY, fp8_amax_history_dL_dY, fp8_scale_dL_dY) | ||
ctx.save_for_backward( | ||
fp8_amax_grad_output, fp8_amax_history_grad_output, fp8_scale_grad_output | ||
) | ||
ctx.scale_fn_name = scale_fn_name | ||
ctx.is_amax_initialized = is_amax_initialized | ||
ctx.linear_mm_config = linear_mm_config | ||
return tensor | ||
|
||
@staticmethod | ||
def backward(ctx, go): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we update go? This one has confused me in the past
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR only contains the user-facing part since that's the most crucial. LOC is already high, so IMO better to rename the non-user facing things in subsequent PRs, to keep things smaller and more reviewable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
This pull request has been merged in 603efc2. |
Summary: In #323 we changed the user facing variable notation from `x/w/dL_dY` to `input/weight/grad_output`. This PR follows up by changing most of the internal variables to also match the new notation, to reduce confusion. Test Plan: ``` ./test/test_everything.sh ``` Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…rad_output notation" Summary: In #323 we changed the user facing variable notation from `x/w/dL_dY` to `input/weight/grad_output`. This PR follows up by changing most of the internal variables to also match the new notation, to reduce confusion. Test Plan: ``` ./test/test_everything.sh ``` Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…ion" Summary: In #323 we changed the user facing variable notation from `x/w/dL_dY` to `input/weight/grad_output`. This PR follows up by changing most of the internal variables to also match the new notation, to reduce confusion. Test Plan: ``` ./test/test_everything.sh ``` Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: Pull Request resolved: #335 In #323 we changed the user facing variable notation from `x/w/dL_dY` to `input/weight/grad_output`. This PR follows up by changing most of the internal variables to also match the new notation, to reduce confusion. Reviewed By: weifengpy Differential Revision: D60252071 fbshipit-source-id: b91ec5b975df550962418eafc93f1904d64a3dd8
Stack from ghstack (oldest at bottom):
Summary:
The following naming scheme matches the rest of PyTorch better:
This PR changes all the previous references to
x
,w
,dL_dY
tomatch the naming scheme above.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D60072596