Add FP16 support to batchedNMSPlugin #1002
Conversation
Thanks @pbridger. Can you please sign your commits with a Signed-off-by line?
@rajeevsrao Does this need to be integrated into master as well?
Yes, but we will cherry-pick it internally first, test and release it to master via 21.0x, and then merge this change.
(Commits on this PR carry Signed-off-by lines from Tyler Zhu, Paul Bridger, and Rajeev Rao.)
Thanks @pbridger. Will merge this commit once the corresponding cherry-pick for master is posted along with the 21.02 container update.
Cool! Many thanks @rajeevsrao and @pranavm-nvidia for all your work to include this.
Add support for float16 boxes and scores (but not mixed precision between boxes and scores).
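For context, a minimal sketch of how a TensorRT plugin can express this constraint through `supportsFormatCombination`. This is illustrative only, not the actual batchedNMSPlugin source; it assumes input 0 is boxes and input 1 is scores, and it elides the plugin's integer outputs (such as the kept-detection count) for brevity:

```cpp
#include "NvInfer.h"

// Illustrative sketch only -- not the actual batchedNMSPlugin implementation.
// Assumes input 0 is boxes and input 1 is scores; integer outputs are elided.
bool supportsFormatCombination(int32_t pos, const nvinfer1::PluginTensorDesc* inOut,
                               int32_t nbInputs, int32_t nbOutputs) noexcept
{
    const nvinfer1::PluginTensorDesc& desc = inOut[pos];
    // Only linear (non-vectorized) layouts are considered here.
    if (desc.format != nvinfer1::TensorFormat::kLINEAR)
    {
        return false;
    }
    // Boxes and scores may each be FP32 or FP16 ...
    if (desc.type != nvinfer1::DataType::kFLOAT && desc.type != nvinfer1::DataType::kHALF)
    {
        return false;
    }
    // ... but mixed precision between boxes (pos 0) and scores (pos 1) is rejected.
    return pos == 0 || desc.type == inOut[0].type;
}
```

With this check, the builder can select an all-FP32 or all-FP16 combination for boxes and scores, but never one of each.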
Gives inference results within expected numerical tolerances on SSD300 with the COCO2017 validation set (as detailed at https://paulbridger.com/posts/tensorrt-object-detection-quantized/).
Despite the `#if __CUDA_ARCH__ >= 800` guard, this has not been tested on Ampere, only on the Turing architecture.
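For reference, this is the kind of guard the description refers to: device code can branch on `__CUDA_ARCH__` so that half-precision intrinsics introduced for compute capability 8.0 are only compiled for sm_80 and newer, with a float round-trip fallback elsewhere. The helper below is a hypothetical sketch, not the plugin's actual kernel code:

```cpp
#include <cuda_fp16.h>

// Hypothetical helper, not the plugin's actual kernel code: __hmax() is only
// available as a native intrinsic when compiling for compute capability 8.0+
// (Ampere), so older architectures fall back to a float round-trip.
__device__ inline __half half_max(__half a, __half b)
{
#if __CUDA_ARCH__ >= 800
    return __hmax(a, b);  // native half-precision max on Ampere and newer
#else
    return __float2half(fmaxf(__half2float(a), __half2float(b)));
#endif
}
```

Note that on Turing (sm_75) it is the `#else` branch that actually executes, which is consistent with the statement that only the Turing path has been exercised.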