[core] Convert limit NF4 conversion to FP16 -> NF4 #23806
Conversation
NF4 <-> FP
Reviewers: @AlexKoff88, @itikhono, @vurusovs
struct Evaluate : public element::NoAction<bool> {
    using element::NoAction<bool>::visit;

    // convert from FP (except NF4) to any other.
    template <element::Type_t ET_IN, class TI = fundamental_type_for<ET_IN>>
I've put a comment in the relevant ticket. In a nutshell, it would be better to support FP16->NF4.
Changes applied to support FP16 -> NF4. Some tests had to be removed because they use conversions that Convert does not support.
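To illustrate what FP16 -> NF4 conversion involves, here is a minimal NumPy sketch of nearest-level quantization. The 16-entry table below is the commonly cited NormalFloat4 (NF4) level table from the QLoRA work; OpenVINO's actual implementation and table may differ, so treat this as an assumption, not the reference code.

```python
import numpy as np

# Assumed NF4 level table (commonly cited NormalFloat4 values); the real
# implementation's table may differ.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)

def quantize_to_nf4(values: np.ndarray) -> np.ndarray:
    """Return the 4-bit code (0..15) of the nearest NF4 level for each value."""
    x = values.astype(np.float32).reshape(-1, 1)   # column of inputs
    codes = np.abs(x - NF4_LEVELS).argmin(axis=1)  # index of nearest level
    return codes.astype(np.uint8)

# Example: quantize a few FP16 values (inputs are expected pre-scaled to [-1, 1]).
codes = quantize_to_nf4(np.array([-1.0, 0.0, 1.0, 0.25], dtype=np.float16))
```

Two codes per byte packing is omitted here; the point is only that quantization is a nearest-neighbor lookup against a fixed 16-entry table.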
But can we update the test instead of removing it, to make sure that it works at the Python API level, which is important for external users such as NNCF?
Restored tests for the Python API and CPU. It looks like conversion from NF4 to F16 is required to make decompression of constants possible (other types can be handled by adding an additional Convert).
But since the CPU plugin on non-ARM devices always converts F16 to F32, I think it would be better to add support in Convert to decompress NF4 to F32 directly, without the additional conversion.
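The direct NF4 -> F32 path suggested above is essentially a table lookup into an FP32 table, so no intermediate F16 tensor (and no extra Convert) is needed. A minimal sketch, again assuming the commonly cited NF4 level table rather than OpenVINO's actual one:

```python
import numpy as np

# Assumed NF4 level table, stored directly in FP32 so decompression needs
# no intermediate F16 step (this table is an assumption, not OpenVINO's).
NF4_LEVELS_F32 = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)

def decompress_nf4_to_f32(codes: np.ndarray) -> np.ndarray:
    """Map 4-bit NF4 codes (0..15) straight to FP32 values via table lookup."""
    return NF4_LEVELS_F32[np.asarray(codes, dtype=np.uint8)]

# Example: decode three packed codes back to FP32.
decoded = decompress_nf4_to_f32(np.array([0, 7, 15], dtype=np.uint8))
```

As the comment below notes, an efficient plugin would do this unpacking at the micro-kernel level rather than through a reference implementation, but the lookup itself is this simple.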
This is just a temporary situation. CPU will use BF16, INT8, or FP32 to unpack this type, and it should be done at the micro-kernel level (not in the reference implementation). The reference implementation can use any type. My only concern is to provide an efficient way to compress from FP16/FP32 to NF4. Ideally, we should support both.
@KodiaqQ mentioned in the ticket that these changes introduce no regression.
I think the PR can be merged, @AlexKoff88 can you confirm?
supports conversions not handled by core op
- convert requires support NF4 <-> F16
…t-to-f32-nf4
Improve test accuracy
Details:
- Limit NF4 conversion to FP16 -> NF4 in the convert operator, as values are correctly quantized. NF4 conversion to/from other types is not supported.

Tickets:
- [CVS-135304](https://jira.devtools.intel.com/browse/CVS-135304)