This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Generate the Amalgamation for android with NNPACK #4419

Closed
ydmo opened this issue Dec 29, 2016 · 4 comments

Comments

@ydmo

ydmo commented Dec 29, 2016

If I generate the amalgamation for Android without any other libraries, the running time of the WhatsThis app is about 700 ms.

However, when I generate the amalgamation with libnnpack.a or other libraries built with ndk-build, the running time of WhatsThis rises to more than 6000 ms, sometimes almost 7 s.

Here is what I have done:

  1. Add the following to the Makefile in amalgamation:
    export NNPACK_ROOT=${MXNET_ROOT}/../nnpack/NNPACK
    #mxnet itself
    CFLAGS += -I${MXNET_ROOT} #mxnetroot
    CFLAGS += -I${MXNET_ROOT}/dmlc-core/include #mxnetroot/dmlc-core/include
    CFLAGS += -I${MXNET_ROOT}/include #mxnetroot/include
    CFLAGS += -I${MXNET_ROOT}/mshadow #mxnetroot/mshadow
    #nnpack:
    CFLAGS += -DMXNET_USE_NNPACK=1
    CFLAGS += -DMXNET_USE_NNPACK_NUM_THREADS=8
    CFLAGS += -I${NNPACK_ROOT}/include
    LDFLAGS += -L${NNPACK_ROOT}/obj/local/armeabi-v7a
    #LDFLAGS += -lnnpack -lpthreadpool -lnnpack_ukernels -lnnpack_reference -lgtest -lfp16_utils -lbench_utils -lcpufeatures
    LDFLAGS += -lnnpack -lpthreadpool -lnnpack_ukernels -lcpufeatures
    #NNPACK dependency: googletest
    CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0/include
    CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0
    #NNPACK dependency: pthreadpool
    CFLAGS += -I${NNPACK_ROOT}/third-party/pthreadpool/include
    CFLAGS += -I${NNPACK_ROOT}/third-party/FXdiv/include
    #other define used
    CFLAGS += -DMSHADOW_STAND_ALONE=1
    CFLAGS += -DMSHADOW_USE_CUDA=0
    CFLAGS += -DMSHADOW_USE_MKL=0
    CFLAGS += -DMSHADOW_RABIT_PS=0
    CFLAGS += -DMSHADOW_DIST_PS=0
    CFLAGS += -DMSHADOW_USE_SSE=0
    CFLAGS += -DMXNET_USE_OPENCV=0
    CFLAGS += -DMXNET_PREDICT_ONLY=0
    CFLAGS += -DDISABLE_OPENMP=1

  2. Add these lines to amalgamation.py at line 123:
    #if MXNET_USE_NNPACK == 1
    #include "src/operator/nnpack/nnpack_convolution-inl.h"
    #endif // MXNET_USE_NNPACK

  3. Run
    make ANDROID=1
    then take the resulting jni_mxnet_predict.so, rename it, and use it to replace the one in WhatsThis. The running time became almost 7 s.

Can anyone help me solve this?

@ajtulloch

Regarding NNPACK, there are a few useful NNPACK patches (not MXNet-specific) that we'll submit as PRs shortly:

  1. Improve multithreaded performance via a pthreadpool rewrite.
  2. Improve 3x3 convolutions via a dedicated NEON implementation.

These improve performance significantly relative to im2col + SGEMM, even at batch size 1 and with small channel counts.

@tornadomeet
Contributor

@ydmo NNPACK support in MXNet is currently very limited; please try this PR: #4373. I have tested it on a PC, where it gives a 2x to 7x speedup. Thanks.

@xlvector
Contributor

xlvector commented Jan 5, 2017

@ydmo I used the same method as you, modified WhatsThis to use a VGG network, and ran an image segmentation task. With that, I get a 5x speedup.

@yajiedesign
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!


5 participants