Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[microTVM] Replace arm_nnsupportfunctions.h with arm_acle.h #13363

Merged
merged 6 commits into from
Jan 9, 2023

Conversation

Mousius
Copy link
Member

@Mousius Mousius commented Nov 11, 2022

This attempts to replace the CMSIS-NN header with a more portable alternative and avoid dependence on CMSIS

@tvm-bot
Copy link
Collaborator

tvm-bot commented Nov 11, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@Mousius
Copy link
Member Author

Mousius commented Nov 23, 2022

The include is removed in #13242, so we can wait for that to land 😸

@Mousius Mousius closed this Nov 23, 2022
@Mousius Mousius reopened this Nov 23, 2022
@Mousius Mousius force-pushed the acle-experiment branch 2 times, most recently from 183ee05 to f98eb78 Compare November 25, 2022 21:39
The packing definitions aren't implemented as ACLE intrinsics nor is there a simple way to convince a C compiler to generate them.
@mehrdadh
Copy link
Member

mehrdadh commented Jan 4, 2023

@Mousius any update on this?

@ashutosh-arm
Copy link
Contributor

@Mousius any update on this?

We have started looking at it. I am able to reproduce the hang locally: test_arm_mprofile_dsp.py::test_conv1d[int16-data_shape_nwc2-3-5-1-0] that is failing in the CI.

Introduce `memcpy` to explain to the compiler that we're changing
the alignment of `int16_t` to `int32_t`. What this appears to actually
do is encourage the compiler to use three loads rather than one double
load plus a regular load.

The padded array is aligned as an `int16_t`, it isn't guaranteed to
behave like an `int32_t` aligned array. One of the side effects of the
type punning from `int16_t*` to `int32_t*` is that we're effectively
lying to the compiler that this is correctly aligned and it can use
instructions which load multiple `int32_t`s at the same time - this does
not work 😿

Co-authored-by: Ashutosh Parkhi <[email protected]>
@Mousius Mousius marked this pull request as ready for review January 6, 2023 18:39
@Mousius
Copy link
Member Author

Mousius commented Jan 6, 2023

@mehrdadh I've pushed up a fix for the hanging, it addresses some of the type punning withmemcpy 😸 see 9af76f0 for more details

Copy link
Contributor

@ashutosh-arm ashutosh-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @Mousius! That makes way for executing the generated code independent of CMSIS-NN.

@ashutosh-arm ashutosh-arm merged commit 6bc72bb into apache:main Jan 9, 2023
@Mousius Mousius deleted the acle-experiment branch January 9, 2023 16:30
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
…3363)

* [microTVM] Replace arm_nnsupportfunctions.h with arm_acle.h

This attempts to replace the CMSIS-NN header with a more portable
alternative and avoid dependence on CMSIS

* Remove CMSIS __STATIC_FORCEINLINE macro

* Replace more intrinsics with ACLE variants

* Use builtins for intrinsics missing in older GCC

* Re-use common_includes to propagate shared functions

The packing definitions aren't implemented as ACLE intrinsics nor is there a simple way to convince a C compiler to generate them.

* Properly align memory access for

Introduce `memcpy` to explain to the compiler that we're changing
the alignment of `int16_t` to `int32_t`. What this appears to actually
do is encourage the compiler to use three loads rather than one double
load plus a regular load.

The padded array is aligned as an `int16_t`, it isn't guaranteed to
behave like an `int32_t` aligned array. One of the side effects of the
type punning from `int16_t*` to `int32_t*` is that we're effectively
lying to the compiler that this is correctly aligned and it can use
instructions which load multiple `int32_t`s at the same time - this does
not work 😿

Co-authored-by: Ashutosh Parkhi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants