Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FRONTEND][TFLITE] Add support for TFLite's regular NMS operator #15117

Merged
merged 11 commits into from
Jul 31, 2023

Conversation

ilyag-grovety
Copy link
Contributor

This PR adds support of regular NMS operator.

Open questions:

  1. How to properly test added functionality?
    Other NMS implementations, e.g., fast NMS, use a TF frozen graph from TF official website to convert a model to TFLite and keep NMS operations only. In order to create a similar test, we need to find an archive on TF official website that contains a frozen graph of a model compiled with --use-regular-nms=True flag. We haven't found it yet, so any help is appreciated.
  2. Regular NMS requires two sort operations:
    • Sorting the scores after selecting scores above nms_score_threshold. This PR implements this with a simple bubble sort in order to prove the algorithm's semantics. We tried to replace it with tvm.contrib.sort.argsort. It works well when testing the regular NMS with run_tvm_graph as fast NMS test does or building and running the regular NMS with llvm target. At the same time, it fails to build (error is provided below) when target=ethos-u,cmsis-nn,c. It seems that __tvm_module_ctx variable is only being initialized when cpp runtime is chosen.
      The error:
      error: '__tvm_module_ctx' undeclared (first use in this function) 203 | if (TVMBackendGetFuncFromEnv(__tvm_module_ctx, "tvm.contrib.sort.argsort", &tvm_contrib_sort_argsort_packed) != 0) {
    • Sorting the scores of previous and current NMS steps. There are two alternatives here:
      • implement some sorting algorithm as part of hybrid script (to replace current bubble sort)
      • save the result of each NMS step and use argsort after the hybrid script part. This approach has a drawback as it requires significant amount of memory to store the results of each NMS step.

@tvm-bot
Copy link
Collaborator

tvm-bot commented Jun 16, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@ilyag-grovety
Copy link
Contributor Author

@leandron @Aleksei-grovety

@sergio-grovety
Copy link
Contributor

cc @ekalda

@ekalda
Copy link
Contributor

ekalda commented Jun 29, 2023

Thanks for the patch @ilyag-grovety, looks all very neat and good to me. Hopefully the CI will go green as well once the model URL will be properly enabled.

Regarding to testing, from my research, I didn't find a way to create that operator through current TF or Keras APIs either, so I think using the model consisting of this operator is the best approach here.

I think using simple sorting algorithm is fine for now. I don't know what it would take to enable argsort for cpp runtime, but any effort of that kind should be a separate patch anyway. If bubble sort becomes a massive performance bottleneck in the future, then it is possible to further enhance this operator with enabling argsort or implementing a different sorting algorithm, but for now it should be fine.

@Aleksei-grovety
Copy link
Contributor

Aleksei-grovety commented Jul 6, 2023

Hi @mehrdadh! Could you please upload the model to S3 (workflow parameters url: url,
sha256: f35589f2153a33615ac38b86fc7a2fd7fa9e3e3897c4930c16c161e6b2cd3a4e,
upload_path: upload_path)? It's necessary for testing NMS.

Copy link
Contributor

@Aleksei-grovety Aleksei-grovety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ilyag-grovety! Looks good just a couple comments.

with ib.if_scope(j == 0):
out_base_idx = i * num_anchors * 6
out_loc[out_base_idx] = cls_id[tid] - 1.0
out_loc[out_base_idx] = (
cls_id[tid] - 0.0 if keep_background == 1 else cls_id[tid] - 1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to subtract zero here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'll remove, thanks

out_scores_shape[0] = boxes_shape[0]
out_scores_shape[1] = int64(attrs.max_detections)

out_num_detections_shape[0] = int64(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be out_num_detections_shape[0] = boxes_shape[0]? As further comments point to num_detections of size (batch_size,).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, fixed

@sergio-grovety
Copy link
Contributor

@tvm-bot rerun

Copy link
Contributor

@ekalda ekalda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for all the work @ilyag-grovety!

@ekalda ekalda merged commit 619bb1d into apache:main Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants