-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FRONTEND][TFLITE] Add support for TFLite's regular NMS operator #15117
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot |
cc @ekalda |
Thanks for the patch @ilyag-grovety, looks all very neat and good to me. Hopefully the CI will go green as well once the model URL will be properly enabled. Regarding to testing, from my research, I didn't find a way to create that operator through current TF or Keras APIs either, so I think using the model consisting of this operator is the best approach here. I think using simple sorting algorithm is fine for now. I don't know what it would take to enable |
Hi @mehrdadh! Could you please upload the model to S3 (workflow parameters url: url, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ilyag-grovety! Looks good just a couple comments.
python/tvm/topi/cuda/ssd/multibox.py
Outdated
with ib.if_scope(j == 0): | ||
out_base_idx = i * num_anchors * 6 | ||
out_loc[out_base_idx] = cls_id[tid] - 1.0 | ||
out_loc[out_base_idx] = ( | ||
cls_id[tid] - 0.0 if keep_background == 1 else cls_id[tid] - 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to subtract zero here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'll remove, thanks
out_scores_shape[0] = boxes_shape[0] | ||
out_scores_shape[1] = int64(attrs.max_detections) | ||
|
||
out_num_detections_shape[0] = int64(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there be out_num_detections_shape[0] = boxes_shape[0]
? As further comments point to num_detections
of size (batch_size,)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, fixed
2f8ae58
to
730b4f7
Compare
@tvm-bot rerun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for all the work @ilyag-grovety!
This PR adds support of regular NMS operator.
Open questions:
Other NMS implementations, e.g., fast NMS, use a TF frozen graph from TF official website to convert a model to TFLite and keep NMS operations only. In order to create a similar test, we need to find an archive on TF official website that contains a frozen graph of a model compiled with
--use-regular-nms=True
flag. We haven't found it yet, so any help is appreciated.nms_score_threshold
. This PR implements this with a simple bubble sort in order to prove the algorithm's semantics. We tried to replace it withtvm.contrib.sort.argsort
. It works well when testing the regular NMS withrun_tvm_graph
as fast NMS test does or building and running the regular NMS with llvm target. At the same time, it fails to build (error is provided below) whentarget=ethos-u,cmsis-nn,c
. It seems that__tvm_module_ctx
variable is only being initialized whencpp
runtime is chosen.The error:
error: '__tvm_module_ctx' undeclared (first use in this function) 203 | if (TVMBackendGetFuncFromEnv(__tvm_module_ctx, "tvm.contrib.sort.argsort", &tvm_contrib_sort_argsort_packed) != 0) {
argsort
after the hybrid script part. This approach has a drawback as it requires significant amount of memory to store the results of each NMS step.