This repository has been archived by the owner on Aug 7, 2024. It is now read-only.
Use mm in subclass #128
I'm reviewing the subclass changes, but I'm probably not the right person to review this one.
This was added in a previous PR; I just moved it to the Mixin so that it can be added to the TP stuff. A rough illustration of that shape is sketched below.
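For illustration only, here is a minimal sketch of the refactor's shape; the class and method names below are hypothetical, not this repo's actual ones. The shared helper lives on a mixin, and both the plain variant and the TP (tensor-parallel) variant inherit it.

```python
import torch

# Hypothetical names for illustration; the real classes in this repo differ.
class Float8CastMixin:
    def cast_weight_to_float8(self, w: torch.Tensor) -> torch.Tensor:
        # Shared casting logic both variants need (details elided).
        scale = w.abs().max().clamp(min=1e-12) / 448.0  # E4M3 max value
        return (w / scale).to(torch.float8_e4m3fn)

class Float8Linear(Float8CastMixin, torch.nn.Linear):
    pass

class Float8TPLinear(Float8CastMixin, torch.nn.Linear):
    # The tensor-parallel variant picks up the same helper via the mixin.
    pass
```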
Mentioned offline, but food for thought that I'll mention here: it would be interesting to think about what it would take to have `aten.matmul(Float8Tensor, Float8Tensor)` actually return another `Float8Tensor`, and then leave it to the subclass to know to upcast on future ops that don't want to handle Float8 directly. My understanding was:
(1) This is a pain mostly because the extra buffers for float8 live directly on the `Float8Linear` nn module today and not on the subclass (probably for good reason).
(2) Doing this would provide a benefit if we want to start increasing the number of ops that directly handle float8; but if all we care about is linear, then this generality is probably not very useful.
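To make the idea concrete, here is a minimal sketch of such a subclass. This is not this PR's implementation; names like `Float8TensorSketch`, `_data`, and `_scale` are made up, and the "matmul" is emulated via upcast rather than a real float8 kernel. Note that for 2-D inputs `torch.matmul` decomposes to `aten.mm` before reaching `__torch_dispatch__`, so that's the op handled below.

```python
import torch
from torch.utils._pytree import tree_map

class Float8TensorSketch(torch.Tensor):
    @staticmethod
    def __new__(cls, data: torch.Tensor, scale: torch.Tensor):
        # Wrapper subclass: advertises float32 to the outside world while
        # holding the raw float8 payload and its dequantization scale.
        self = torch.Tensor._make_wrapper_subclass(
            cls, data.shape, dtype=torch.float32, device=data.device
        )
        self._data = data    # e.g. a torch.float8_e4m3fn tensor
        self._scale = scale  # scalar scale used to dequantize
        return self

    def to_float32(self) -> torch.Tensor:
        return self._data.to(torch.float32) * self._scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.ops.aten.mm.default and all(
            isinstance(a, Float8TensorSketch) for a in args
        ):
            a, b = args
            # Emulated float8 matmul; a real kernel would go here. The key
            # point is that the result is re-wrapped as float8 so later
            # float8-aware ops can stay in the format.
            out = torch.mm(a.to_float32(), b.to_float32())
            scale = out.abs().max().clamp(min=1e-12) / 448.0  # E4M3 max
            return Float8TensorSketch(
                (out / scale).to(torch.float8_e4m3fn), scale
            )
        # Ops we don't handle: upcast to plain float32 and fall back.
        unwrap = lambda t: (
            t.to_float32() if isinstance(t, Float8TensorSketch) else t
        )
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))
```

The fallback branch is where "leave it to the subclass to know to upcast" lives: any op without a float8 handler silently dequantizes its inputs and runs in float32.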
Oh, also just thinking: should `emulate` just be a global config somewhere, instead of a flag that you have to plumb around?
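A hedged sketch of what that could look like (the names `Float8GlobalConfig`, `config`, and `emulate_enabled` are hypothetical): a process-wide config object plus a context manager for temporary overrides, instead of an `emulate` argument threaded through every constructor.

```python
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass
class Float8GlobalConfig:
    # When True, run float8 matmuls in emulation (upcast to fp32)
    # instead of using real float8 kernels.
    emulate: bool = False

config = Float8GlobalConfig()

@contextmanager
def emulate_enabled(value: bool = True):
    # Temporarily flip the global flag, e.g. for tests on hardware
    # without native float8 support.
    prev = config.emulate
    config.emulate = value
    try:
        yield
    finally:
        config.emulate = prev
```

Call sites would then read `config.emulate` instead of receiving the flag as a constructor parameter.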
Talked about this with Brian offline. This is probably right, but I am going to do it in a follow-up. I also want to see if we can use plain `torch.nn.functional.linear` in `LinearFloat8`, and will do some matmul changes.
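A rough sketch of that follow-up direction (the `to_float8_sketch` helper and `LinearFloat8Sketch` class below are hypothetical; the helper emulates the cast by round-tripping through float8): the module's `forward` calls plain `F.linear`, and the float8-specific behavior would live in the tensor subclass's `aten.mm`/`aten.addmm` overrides rather than in the module.

```python
import torch
import torch.nn.functional as F

def to_float8_sketch(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: in real code this would return the float8 tensor
    # subclass; here we just emulate the precision loss by round-tripping.
    scale = t.abs().max().clamp(min=1e-12) / 448.0  # E4M3 max value
    return (t / scale).to(torch.float8_e4m3fn).to(torch.float32) * scale

class LinearFloat8Sketch(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Plain F.linear on (emulated) float8 operands; with a real subclass,
        # its matmul override would intercept the computation here.
        return F.linear(
            to_float8_sketch(x), to_float8_sketch(self.weight), self.bias
        )
```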