Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Convert limit NF4 conversion to FP16 -> NF4 #23806

Merged
merged 8 commits into from
Apr 17, 2024

Conversation

praasz
Copy link
Contributor

@praasz praasz commented Apr 2, 2024

Details:

  • Limit NF4 conversion to FP16 -> NF4 in convert operator as values are correctly quantized, Not support NF4 conversion to/from other types.

Tickets:

NF4 <-> FP
@praasz praasz requested a review from a team as a code owner April 2, 2024 05:18
@github-actions github-actions bot added the category: Core OpenVINO Core (aka ngraph) label Apr 2, 2024
@praasz praasz requested a review from a team as a code owner April 2, 2024 05:51
@praasz praasz requested review from itikhono and removed request for a team April 2, 2024 05:51
@github-actions github-actions bot added the category: transformations OpenVINO Runtime library - Transformations label Apr 2, 2024
@mlukasze
Copy link
Contributor

mlukasze commented Apr 3, 2024

@AlexKoff88 && @itikhono && @vurusovs
could you please review it, as we have green CI? Looks like useful change would be nice to have in a release

struct Evaluate : public element::NoAction<bool> {
using element::NoAction<bool>::visit;

template <element::Type_t ET_IN, class TI = fundamental_type_for<ET_IN>>
// convert from FP (except NF4) to any other.
template <element::Type_t ET_IN,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put a comment in the relevant ticket. In a nutshell, it would be better to support FP16->NF4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes applied to support FP16 -> NF4. Required to remove some test as they use not supported conversion by convert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can we update the test instead of removing it to make sure that it works on the Python API level which is important for external users such as NNCF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored tests for python API and CPU. It looks like conversion from NF4 to F16 is required to make possible of decompression of constants (other types by adding additional convert).

But as CPU plugin on not ARM device always converts F16 to F32, I think would be better to add support in Convert to decompress NF4 to F32 directly without adding additional conversion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a temporary situation. CPU will use BF16, INT8, of FP32 to unpack this type and it should be done on a micro-kernel level (not reference implementation). The reference implementation can have any type. My only concern is to provide an efficient way for compression from FP16/FP32 to NF4. Ideally, we should support both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KodiaqQ mention in ticket that these changes makes no regression.
I think PR can be merged, @AlexKoff88 can you confirm?

@praasz praasz added this to the 2024.2 milestone Apr 3, 2024
@praasz praasz requested review from a team as code owners April 3, 2024 14:44
@github-actions github-actions bot added category: CPU OpenVINO CPU plugin category: Python API OpenVINO Python bindings category: TEMPLATE OpenVINO Template plugin labels Apr 3, 2024
@praasz praasz changed the title [core] Convert limit NF4 conversion to NF4 <-> FP [core] Convert limit NF4 conversion to FP16 -> NF4 Apr 3, 2024
in convert operator
@praasz praasz force-pushed the convert-op-nf4-limit-to-f32-nf4 branch from 47adcb2 to f8ce4eb Compare April 3, 2024 15:40
@praasz praasz requested a review from AlexKoff88 April 4, 2024 04:38
praasz added 3 commits April 11, 2024 04:46
supports conversions not handled by core op
- convert requires support NF4 <-> F16
…t-to-f32-nf4
@github-actions github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Apr 11, 2024
src/core/src/op/convert.cpp Outdated Show resolved Hide resolved
Improve test accuracy
@praasz praasz added this pull request to the merge queue Apr 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 17, 2024
@praasz praasz added this pull request to the merge queue Apr 17, 2024
Merged via the queue into openvinotoolkit:master with commit d199994 Apr 17, 2024
110 checks passed
@praasz praasz deleted the convert-op-nf4-limit-to-f32-nf4 branch April 17, 2024 13:57
alvoron pushed a commit to alvoron/openvino that referenced this pull request Apr 29, 2024
…3806)

### Details:
- Limit NF4 conversion to FP16 -> NF4 in convert operator as values are
correctly quantized, Not support NF4 conversion to/from other types.

### Tickets:
 - [CVS-135304](https://jira.devtools.intel.com/browse/CVS-135304)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Core OpenVINO Core (aka ngraph) category: CPU OpenVINO CPU plugin category: Python API OpenVINO Python bindings category: TEMPLATE OpenVINO Template plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants