-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TFLite] Convert TFLite NCHW to NHWC #3141
Conversation
ping @srkreddy1238 |
@FrozenGene overall look good to me. Trigger the CI and I will have another look. |
@srkreddy1238 CI is green now. |
@srkreddy1238 Could you help to have a look now? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @FrozenGene . LGTM.
I tried to compile the model for ARM Mali GPU. The compilation failed -
|
The same issue exists on ARM CPU, e.g.
|
@apivovarov As described this thread, this PR will affect the Auto Tuning. We should add |
Why this PR was merged before adding |
This PR doesn't affect functionality, we could run generic / x86 CPU target well. If you compile Tensorflow model (not TFLite model) on ARM CPU, you will also get the same result. Please see the background discussion link of this PR (#2519), I have listed the advantages / disadvantages. I know your concern and this is why I support TFLite in NCHW layout firstly, but however, make TFLite frontend be NHWC is also discussed before and we know what it will be affected. Support |
@apivovarov I don't think we should add |
@FrozenGene Will data-related transpose op only appear at the beginning and the end of the network? I think if we can just keep the necessary layout transformations, the performance shouldn't be affected. This is probably a more maintainable way than adding another layout optimization in topi and autotvm. |
I noticed that in But in |
@kevinthesun Not. It will appear before and after every convolution op. Because like our Tensorflow's implementation, we just simply insert @apivovarov The reason is NCHW is better for CUDA target. |
@FrozenGene Got it. In the long term we can add a graph level layout transformation pass. For now we can replace conv2d NHWC with conv2d NCHW in conv2d_later_layout. This should achieve better performance than originla implementation. |
However, this will affect auto tuning. And in fact, we have |
* Convert TFLite NCHW to NHWC * Minor comment fix
* Convert TFLite NCHW to NHWC * Minor comment fix
@FrozenGene @apivovarov any update for NHWC support for ARM CPU? And I think we need more examples about tflite for ARM CPU, like some network in arm_cpu_imagenet_bench.py. |
@chenbiaolong I haven't spare time doing this currently. I wish someone could help to it, I could help to review it. I think it is not difficult. Could refer https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py#L212-L318 how to do it. |
CC @ajtulloch if you have any interest and time. |
@FrozenGene has anyone followed up on NHWC AutoTVM templates for x86/ARM? If not I can give that a stab. |
Hi @tmoreau89 , my colleague @jackwish is working on arm and will pr soon. You are welcome to do x86. :-) |
Hi @tmoreau89 and @FrozenGene Drafted the ARM NHWC schedule PR #3859. |
Thank you Thierry, I have seen it :)
Thierry Moreau <[email protected]> 于2019年8月30日周五 下午12:30写道:
… Nice, thanks for sharing your draft @jackwish
<https://github.com/jackwish> . I have given x86 support a try in this
WIP PR: #3858 <#3858>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3141>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABFVHDOYWJYEHC2ORS2PUYTQHCO6PANCNFSM4HLKMLYA>
.
|
Background RFC: #2519
As discussed in above RFC, we agree to make TFLite frontend input layout from NCHW to NHWC. This PR does this.
Affected:
We could not use Auto TVM to tuning on ARM CPU. Because ARM CPU schedule only implement NCHW currently. We should add SpatialPack + NHWC for conv2d / depthwise convolution on ARM CPU.
@srkreddy1238
cc: @songqun @gomida @ariwaranosai @tqchen @yzhliu