Merge branch 'master' into feature/add_colab_link

kendryte · Aug 30, 2023 · 93769df
2 parents 58689cd + 6544a5c
commit 93769df
Showing 22 changed files with 429 additions and 268 deletions.
diff --git a/docs/USAGE_v2.md b/docs/USAGE_v2.md
@@ -46,7 +46,15 @@ Type "help", "copyright", "credits" or "license" for more information.
 
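 For k230 model compilation and inference, see the Jupyter script [User_guide](../examples/user_guide/k230_simulate.ipynb), which includes single-input and multi-input examples.
 
-If you run the Jupyter script in Docker, refer to [Configure Jupyter lab](https://github.com/kunjing96/docker-jupyterlab#32-%E9%85%8D%E7%BD%AEjupyter-lab) to set it up.
+If you run the Jupyter script in Docker, use the following commands and then open Jupyter Lab in a browser window.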
+
+```shell
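+docker run -it --rm --privileged=true -p 8889:8889 --name Kendryte -v `pwd`:/mnt -w /mnt ghcr.io/kendryte/k230_sdk /bin/bash
+# Inside the container shell:
+pip install jupyterlab
+# Listen on 8889 to match the -p 8889:8889 mapping above (Jupyter's default port is 8888).
+jupyter-lab --ip 0.0.0.0 --port 8889 --allow-root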
+```
 
 Before running the script, modify the following to suit your needs:
 
@@ -153,6 +161,8 @@ subgraph A
     end
 
 ```
+##### Dynamic shape parameters
+See [Dynamic shape parameter description](./shape_bucket.md)
 
 #### Code example
 

diff --git a/docs/USAGE_v2_EN.md b/docs/USAGE_v2_EN.md
@@ -46,7 +46,16 @@ Type "help", "copyright", "credits" or "license" for more information.
 
 Model compilation and inference for k230 are demonstrated in the Jupyter script [User_guide](../examples/user_guide/k230_simulate.ipynb), which contains single-input and multi-input examples.
 
-If you run Jupyter scripts in Docker, you can refer to [Configure Jupyter lab](https://github.com/kunjing96/docker-jupyterlab#32-%E9%85%8D%E7%BD%AEjupyter-lab) to configure them.
+If you run the Jupyter script in Docker, use the following commands and then open Jupyter Lab in your browser.
+
+```shell
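+docker run -it --rm --privileged=true -p 8889:8889 --name Kendryte -v `pwd`:/mnt -w /mnt ghcr.io/kendryte/k230_sdk /bin/bash
+# Inside the container shell:
+pip install jupyterlab
+# Listen on 8889 to match the -p 8889:8889 mapping above (Jupyter's default port is 8888).
+jupyter-lab --ip 0.0.0.0 --port 8889 --allow-root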
+```
+
 
 You need to modify the following to suit your needs before executing the script:
 
@@ -154,6 +163,9 @@ subgraph A
 
 ```
 
+##### Dynamic shape args
+Refer to [Dynamic shape args description](./shape_bucket.md)
+
 #### Example
 
 ```python

diff --git a/docs/shape_bucket.md b/docs/shape_bucket.md
@@ -0,0 +1,48 @@
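+# ShapeBucket usage guide
+
+ShapeBucket is a solution for dynamic shapes. It optimizes a dynamic-shape model based on the range of the input lengths and the specified number of segments. The feature is disabled by default (`False`) and takes effect only when the corresponding option is enabled. Apart from setting the fields below, the workflow is the same as compiling a static model.
+
+The corresponding fields in CompileOptions:
+
+| Field name                  | Type                  | Required | Description                                                                                      |
+| --------------------------- | --------------------- | -------- | ------------------------------------------------------------------------------------------------ |
+| shape_bucket_enable         | bool                  | Yes      | Whether to enable ShapeBucket; defaults to False. Takes effect when `dump_ir=True`              |
+| shape_bucket_range_info     | Dict[str, [int, int]] | Yes      | Range of each variable in the input shape dimensions; the minimum must be greater than or equal to 1 |
+| shape_bucket_segments_count | int                   | Yes      | Number of segments the range of the input variables is divided into                             |
+| shape_bucket_fix_var_map    | Dict[str, int]        | No       | Fixes variables in the shape dimensions to specific values                                      |
+
+## onnx
+
+Some dimensions in a model's shape are variable names. Take the inputs of an onnx model as an example: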
+
+> tokens: int64[batch_size, tgt_seq_len]
+>
+> step: float32[seq_len, batch_size]
+
+These inputs correspond to the following configuration:
+
+```python
+shape_bucket_options = nncase.ShapeBucketOptions()
+shape_bucket_options.shape_bucket_enable = True
+shape_bucket_options.shape_bucket_range_info = {"seq_len":[1, 100], "tgt_seq_len":[1, 100]}
+shape_bucket_options.shape_bucket_segments_count = 2
+shape_bucket_options.shape_bucket_fix_var_map = {"batch_size" : 3}
+```
+
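+The shape dimensions contain three variables: seq_len, tgt_seq_len, and batch_size. batch_size is a variable, but it is fixed to 3 in actual use, so batch_size = 3 is added to **fix_var_map** and this dimension is fixed to 3 at runtime.
+
+seq_len and tgt_seq_len are the variables that actually change, so their real ranges must be configured in **range_info**. **segments_count** is the number of segments: the range is divided equally into that many parts, and the compile time increases by a corresponding factor.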
+
+## tflite
+
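+Unlike onnx models, tflite models do not yet label dimension names in their shapes. Currently only inputs with a single dynamic dimension are supported, and its name is always configured as -1, as follows: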
+
+```python
+shape_bucket_options = nncase.ShapeBucketOptions()
+shape_bucket_options.shape_bucket_enable = True
+shape_bucket_options.shape_bucket_range_info = {"-1":[1, 100]}
+shape_bucket_options.shape_bucket_segments_count = 2
+shape_bucket_options.shape_bucket_fix_var_map = {"batch_size" : 3}
+```
+
+After these options are set, the rest of the compilation flow is the same as for a static shape.
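
Review note (a sketch, not part of this commit): the document says the configured range is divided equally into `shape_bucket_segments_count` parts, but the diff does not show how nncase computes the boundaries or selects a segment at run time. The Python below illustrates one plausible reading, assuming each segment is represented by its inclusive upper bound. `bucket_upper_bounds` and `pick_bucket` are hypothetical helper names, not nncase APIs.

```python
from bisect import bisect_left
from typing import List


def bucket_upper_bounds(lo: int, hi: int, segments: int) -> List[int]:
    """Split the inclusive range [lo, hi] into near-equal parts.

    Returns the inclusive upper bound of each part. Hypothetical helper;
    the real splitting rule in nncase may differ.
    """
    if lo < 1:
        # The doc states the range minimum must be >= 1.
        raise ValueError("range minimum must be >= 1")
    if hi < lo:
        raise ValueError("range maximum must be >= range minimum")
    span = hi - lo + 1
    if not 1 <= segments <= span:
        raise ValueError("segments must be between 1 and the range size")
    return [lo + (span * (i + 1)) // segments - 1 for i in range(segments)]


def pick_bucket(length: int, lo: int, bounds: List[int]) -> int:
    """Return the upper bound of the segment that contains `length`."""
    if length < lo or length > bounds[-1]:
        raise ValueError("length is outside the configured range")
    return bounds[bisect_left(bounds, length)]


# Mirrors the doc's example: range [1, 100] split into 2 segments.
bounds = bucket_upper_bounds(1, 100, 2)
print(bounds)                       # [50, 100]
print(pick_bucket(37, 1, bounds))   # 50
print(pick_bucket(51, 1, bounds))   # 100
```

Under this reading, each extra segment adds one more fixed-shape variant to compile, which is consistent with the doc's remark that compile time grows with the segment count.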
diff --git a/python/nncase/__init__.py b/python/nncase/__init__.py
@@ -357,6 +357,10 @@ class CompileOptions:
     dump_asm: bool
     dump_ir: bool
     dump_dir: str
+    shape_bucket_enable: bool
+    shape_bucket_range_info: dict
+    shape_bucket_segments_count: int
+    shape_bucket_fix_var_map: dict
 
     def __init__(self) -> None:
 
@@ -375,6 +379,10 @@ def __init__(self) -> None:
         self.dump_asm = True
         self.dump_ir = False
         self.dump_dir = "tmp"
+        self.shape_bucket_enable = False
+        self.shape_bucket_range_info = {}
+        self.shape_bucket_segments_count = 2
+        self.shape_bucket_fix_var_map = {}
 
 
 class ShapeBucketOptions:

diff --git a/src/Native/include/nncase/kernels/kernel_utils.h b/src/Native/include/nncase/kernels/kernel_utils.h
@@ -17,6 +17,7 @@
 #include <cassert>
 #include <cmath>
 #include <cstddef>
+#include <nncase/kernels/stackvm/resize_image.h>
 #include <nncase/runtime/datatypes.h>
 #include <numeric>
 

diff --git a/src/Native/include/nncase/kernels/stackvm/resize_image.h b/src/Native/include/nncase/kernels/stackvm/resize_image.h
@@ -0,0 +1,94 @@
+/* Copyright 2019-2023 Canaan Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#pragma once
+#include <nncase/runtime/stackvm/opcode.h>
+
+using namespace nncase::runtime::stackvm;
+
+using get_coordinate_func_t = float (*)(float, float, float, float, float,
+                                        float);
+using get_nearest_pixel_func_t = int64_t (*)(float);
+
+get_coordinate_func_t get_coordinate_from_resized(
+    image_resize_transformation_mode_t coordinate_transform_mode);
+
+get_nearest_pixel_func_t
+get_nearest_pixel_from_origin(image_resize_nearest_mode_t nearest_mode);
+
+inline get_coordinate_func_t get_coordinate_from_resized(
+    image_resize_transformation_mode_t coordinate_transform_mode) {
+    switch (coordinate_transform_mode) {
+    case image_resize_transformation_mode_t::asymmetric:
+        return [](float x_resized, float x_scale, float, float, float, float) {
+            return x_resized * x_scale;
+        };
+    case image_resize_transformation_mode_t::pytorch_half_pixel:
+        return [](float x_resized, float x_scale, float length_resized, float,
+                  float, float) {
+            return length_resized > 1 ? (x_resized + 0.5f) * x_scale - 0.5f
+                                      : 0.0f;
+        };
+    case image_resize_transformation_mode_t::align_corners:
+        return [](float x_resized, float, float length_resized,
+                  float length_original, float, float) {
+            return length_resized == 1 ? 0
+                                       : x_resized * (length_original - 1) /
+                                             (length_resized - 1);
+        };
+    case image_resize_transformation_mode_t::tfcrop_and_resize:
+        return [](float x_resized, float, float length_resized,
+                  float length_original, float roi_start, float roi_end) {
+            auto orig =
+                length_resized > 1
+                    ? roi_start * (length_original - 1) +
+                          (x_resized * (roi_end - roi_start) *
+                           (length_original - 1)) /
+                              (length_resized - 1)
+                    : 0.5 * (roi_start + roi_end) * (length_original - 1);
+            return static_cast<float>(orig);
+        };
+    default: // "image_resize_transformation_mode_t::half_pixel"
+        return [](float x_resized, float x_scale, float, float, float, float) {
+            return ((x_resized + 0.5f) * x_scale) - 0.5f;
+        };
+    }
+}
+
+inline get_nearest_pixel_func_t
+get_nearest_pixel_from_origin(image_resize_nearest_mode_t nearest_mode) {
+    switch (nearest_mode) {
+    case image_resize_nearest_mode_t::round_prefer_ceil:
+        return [](float x_original) {
+            return static_cast<int64_t>(std::round(x_original));
+        };
+    case image_resize_nearest_mode_t::floor:
+        return [](float x_original) {
+            return static_cast<int64_t>(std::floor(x_original));
+        };
+    case image_resize_nearest_mode_t::ceil:
+        return [](float x_original) {
+            return static_cast<int64_t>(std::ceil(x_original));
+        };
+    default: // default is round_prefer_floor
+        return [](float x_original) {
+            // for half way cases prefer floor
+            if (x_original == static_cast<int64_t>(x_original) + 0.5f) {
+                return static_cast<int64_t>(std::floor(x_original));
+            }
+            return static_cast<int64_t>(std::round(x_original));
+        };
+    }
+}
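
Review note (a sketch, not part of this commit): the two factories above map a source coordinate and then round it to a pixel index. The Python port below is meant only to make the per-mode arithmetic easy to check by hand. The string mode names stand in for the C++ enum values, `c_round` is a hypothetical helper that imitates `std::round` (halfway cases round away from zero), and Python uses double precision where the header uses `float`, so results can differ in the last bits.

```python
import math


def get_coordinate(mode, x_resized, x_scale, length_resized,
                   length_original, roi_start=0.0, roi_end=0.0):
    """Map an output coordinate back to input space (same argument order as the C++ lambdas)."""
    if mode == "asymmetric":
        return x_resized * x_scale
    if mode == "pytorch_half_pixel":
        return (x_resized + 0.5) * x_scale - 0.5 if length_resized > 1 else 0.0
    if mode == "align_corners":
        if length_resized == 1:
            return 0.0
        return x_resized * (length_original - 1) / (length_resized - 1)
    if mode == "tfcrop_and_resize":
        if length_resized > 1:
            return (roi_start * (length_original - 1)
                    + x_resized * (roi_end - roi_start) * (length_original - 1)
                    / (length_resized - 1))
        return 0.5 * (roi_start + roi_end) * (length_original - 1)
    # Default branch in the header: half_pixel.
    return (x_resized + 0.5) * x_scale - 0.5


def c_round(x):
    """Round to nearest integer, halfway cases away from zero (like std::round)."""
    base = math.floor(x)
    frac = x - base
    if frac < 0.5:
        return base
    if frac > 0.5:
        return base + 1
    return base + 1 if x > 0 else base


def nearest_pixel(mode, x_original):
    """Round an input-space coordinate to a pixel index."""
    if mode == "round_prefer_ceil":
        return c_round(x_original)
    if mode == "floor":
        return math.floor(x_original)
    if mode == "ceil":
        return math.ceil(x_original)
    # Default branch in the header: round_prefer_floor.
    if x_original == math.trunc(x_original) + 0.5:
        return math.floor(x_original)
    return c_round(x_original)


# Upscaling a length-2 axis to length 4 (x_scale = 2 / 4 = 0.5).
coords = [get_coordinate("half_pixel", x, 0.5, 4, 2) for x in range(4)]
print(coords)                                                   # [-0.25, 0.25, 0.75, 1.25]
print([nearest_pixel("round_prefer_floor", c) for c in coords])  # [0, 0, 1, 1]
```

The only difference between the two rounding modes shown here is the halfway case: `2.5` maps to `2` under `round_prefer_floor` and to `3` under `round_prefer_ceil`.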
diff --git a/src/Native/src/kernels/stackvm/optimized/opt_ops.h b/src/Native/src/kernels/stackvm/optimized/opt_ops.h
@@ -17,13 +17,13 @@
  */
 #pragma once
 #include <nncase/kernels/kernel_context.h>
+#include <nncase/kernels/kernel_utils.h>
 #include <nncase/runtime/datatypes.h>
 #include <nncase/runtime/error.h>
 #include <nncase/runtime/result.h>
 #include <nncase/runtime/stackvm/opcode.h>
 #include <nncase/tensor.h>
 #include <nncase/value.h>
-
 BEGIN_NS_NNCASE_KERNELS_MODULE(stackvm)
 namespace optimized {
 
@@ -111,6 +111,8 @@ NNCASE_API result<void> resize_nearest_neighbor(
     gsl::span<const size_t> in_shape, gsl::span<const size_t> in_strides,
     gsl::span<const size_t> out_strides, int32_t out_h, int32_t out_w,
     bool align_corners, bool half_pixel_centers,
+    get_coordinate_func_t get_coordinate_func,
+    get_nearest_pixel_func_t get_nearset_func,
     kernel_context &context) noexcept;
 
 NNCASE_API result<void>

diff --git a/src/Native/src/kernels/stackvm/optimized/resize_image.cpp b/src/Native/src/kernels/stackvm/optimized/resize_image.cpp
@@ -89,7 +89,10 @@ result<void> resize_nearest_neighbor_impl(
     const T *input, T *output, gsl::span<const size_t> in_shape,
     NNCASE_UNUSED gsl::span<const size_t> in_strides,
     NNCASE_UNUSED gsl::span<const size_t> out_strides, int32_t out_h,
-    int32_t out_w, bool align_corners, bool half_pixel_centers,
+    int32_t out_w, NNCASE_UNUSED bool align_corners,
+    NNCASE_UNUSED bool half_pixel_centers,
+    get_coordinate_func_t get_coordinate_func,
+    get_nearest_pixel_func_t get_nearset_func,
     NNCASE_UNUSED kernel_context &context) noexcept {
     auto scales = kernels::detail::get_resize_scales(in_shape, out_h, out_w,
                                                      align_corners);
@@ -110,15 +113,23 @@ result<void> resize_nearest_neighbor_impl(
             auto *output_ptr = begin_output_ptr + oc * out_image_size;
 
             for (int oy = 0; oy < out_h; oy++) {
-                auto in_y = kernels::detail::get_nearest_neighbor(
-                    oy, in_shape[2], height_scale, align_corners,
-                    half_pixel_centers);
+                auto iy = get_coordinate_func(oy, height_scale, out_h,
+                                              in_shape[2], 0, 0);
+                int64_t in_y = get_nearset_func(iy);
+                if (in_y < 0)
+                    in_y = 0;
+                if (in_y >= in_shape[2])
+                    in_y = in_shape[2] - 1;
                 auto *in_row = input_ptr + in_y * in_shape[3];
 
                 for (int ox = 0; ox < out_w; ox++) {
-                    auto in_x = kernels::detail::get_nearest_neighbor(
-                        ox, in_shape[3], width_scale, align_corners,
-                        half_pixel_centers);
+                    auto ix = get_coordinate_func(ox, width_scale, out_w,
+                                                  in_shape[3], 0, 0);
+                    int64_t in_x = get_nearset_func(ix);
+                    if (in_x < 0)
+                        in_x = 0;
+                    if (in_x >= in_shape[3])
+                        in_x = in_shape[3] - 1;
                     *output_ptr++ = in_row[in_x];
                 }
             }
@@ -264,10 +275,11 @@ inline result<void> resize_bilinear_impl(
                          half_pixel_centers, context);
 
 #define RESIZE_NEAREST_NEIGHBOR_IMPL(type)                                     \
-    resize_nearest_neighbor_impl(reinterpret_cast<const type *>(input),        \
-                                 reinterpret_cast<type *>(output), in_shape,   \
-                                 in_strides, out_strides, out_h, out_w,        \
-                                 align_corners, half_pixel_centers, context);
+    resize_nearest_neighbor_impl(                                              \
+        reinterpret_cast<const type *>(input),                                 \
+        reinterpret_cast<type *>(output), in_shape, in_strides, out_strides,   \
+        out_h, out_w, align_corners, half_pixel_centers, get_coordinate_func,  \
+        get_nearset_func, context);
 
 result<void> optimized::resize_bilinear(
     typecode_t type, const gsl::byte *input, gsl::byte *output,
@@ -283,6 +295,8 @@ result<void> optimized::resize_nearest_neighbor(
     gsl::span<const size_t> in_shape, gsl::span<const size_t> in_strides,
     gsl::span<const size_t> out_strides, int32_t out_h, int32_t out_w,
     bool align_corners, bool half_pixel_centers,
+    get_coordinate_func_t get_coordinate_func,
+    get_nearest_pixel_func_t get_nearset_func,
     kernel_context &context) noexcept {
     FP_OR_Q_IMPL(type, RESIZE_NEAREST_NEIGHBOR_IMPL);
 }
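
Review note (a sketch, not part of this commit): the new loop maps each output pixel to an input coordinate, rounds it, and clamps the index to the valid range. The Python below reproduces that flow on a single 2D plane to show why the clamp is needed. It assumes the `half_pixel` mapping with `floor` rounding and a scale of `in / out`; `get_resize_scales` is not shown in the diff, so the scale formula is an assumption, and `resize_nearest_2d` is a hypothetical name.

```python
import math


def resize_nearest_2d(image, out_h, out_w):
    """Nearest-neighbor resize of a 2D list using half_pixel + floor, with clamping."""
    in_h, in_w = len(image), len(image[0])
    # Assumed scale definition (input length / output length).
    h_scale, w_scale = in_h / out_h, in_w / out_w

    def src_index(dst, scale, in_len):
        coord = (dst + 0.5) * scale - 0.5   # half_pixel mapping
        idx = math.floor(coord)             # "floor" nearest mode
        # Same clamp as the in_y / in_x checks in the C++ loop.
        return min(max(idx, 0), in_len - 1)

    return [
        [image[src_index(oy, h_scale, in_h)][src_index(ox, w_scale, in_w)]
         for ox in range(out_w)]
        for oy in range(out_h)
    ]


resized = resize_nearest_2d([[1, 2], [3, 4]], 4, 4)
for row in resized:
    print(row)
```

For output index 0 the mapped coordinate is `-0.25`, which floors to `-1`. Without the clamp that index would read outside the input row.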
diff --git a/src/Native/src/kernels/stackvm/reference/instance_norm.cpp b/src/Native/src/kernels/stackvm/reference/instance_norm.cpp
@@ -35,10 +35,12 @@ result<void> instance_norm_impl(const float *input, const float *scale,
                                 float epsilon) {
     return apply(in_shape, [&](gsl::span<const size_t> index) -> result<void> {
         auto c = index[1];
+        auto offi = index[0] * in_shape[1] + index[1];
         auto off = offset(in_strides, index);
         const auto x = input[off];
         output[offset(out_strides, index)] =
-            scale[c] * (x - input_mean[c]) / std::sqrt(input_var[c] + epsilon) +
+            scale[c] * (x - input_mean[offi]) /
+                std::sqrt(input_var[offi] + epsilon) +
             bias[c];
         return ok();
     });
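
Review note (a sketch, not part of this commit): the fix switches the statistics lookup from the channel index `c` to `index[0] * in_shape[1] + index[1]`, which implies `input_mean` and `input_var` hold one value per (batch, channel) pair in a flat buffer of length N*C. The Python below shows the same indexing on nested lists. The flat layout is inferred from the diff, and `instance_norm` here is an illustrative function, not the nncase kernel.

```python
import math


def instance_norm(x, scale, bias, mean, var, eps):
    """x is [N][C][L]; mean and var are flat lists of length N * C."""
    n_batches, n_channels = len(x), len(x[0])
    out = []
    for n in range(n_batches):
        sample = []
        for c in range(n_channels):
            offi = n * n_channels + c  # per-(batch, channel) statistics
            sample.append([
                scale[c] * (v - mean[offi]) / math.sqrt(var[offi] + eps) + bias[c]
                for v in x[n][c]
            ])
        out.append(sample)
    return out


# Two samples, one channel each, with different statistics per sample.
x = [[[0.0, 2.0]], [[10.0, 14.0]]]
result = instance_norm(x, scale=[1.0], bias=[0.0],
                       mean=[1.0, 12.0], var=[1.0, 4.0], eps=0.0)
print(result)  # [[[-1.0, 1.0]], [[-1.0, 1.0]]]
```

Indexing the statistics with `c` alone would reuse the first sample's mean and variance for the second sample, which is only correct when the batch size is 1.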

diff --git a/src/Native/src/kernels/stackvm/reference/ref_ops.h b/src/Native/src/kernels/stackvm/reference/ref_ops.h
@@ -18,6 +18,7 @@
 #pragma once
 #include <nncase/kernels/apply.h>
 #include <nncase/kernels/kernel_context.h>
+#include <nncase/kernels/kernel_utils.h>
 #include <nncase/runtime/datatypes.h>
 #include <nncase/runtime/error.h>
 #include <nncase/runtime/result.h>
@@ -345,6 +346,8 @@ NNCASE_API result<void> resize_nearest_neighbor(
     gsl::span<const size_t> in_shape, gsl::span<const size_t> in_strides,
     gsl::span<const size_t> out_strides, int32_t out_h, int32_t out_w,
     bool align_corners, bool half_pixel_centers,
+    get_coordinate_func_t get_coordinate_func,
+    get_nearest_pixel_func_t get_nearset_func,
     kernel_context &context) noexcept;
 
 NNCASE_API result<void> reverse_sequence(