How to use FP16 or INT8? #32
I don't think it's fully supported right now. What's the error message? |
Here is my output when I ran:
The output model is roughly the same as the Float32 one. No speed gain. Model size stays the same. After further investigation, I noticed line 298 in f6f4138.
I am pretty new to TensorRT so this might be a stupid question. But does this depend on my GPU? I am currently using Titan X (Pascal). Thanks a lot @yinghai |
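For what it's worth, whether FP16 is available does depend on the GPU: Pascal cards like the Titan X lack fast FP16, while Volta cards like the Titan V have it (which matches the 0 -> 1 change mentioned below). In the TensorRT Python API the corresponding check looks roughly like this (a minimal sketch, not tied to a specific onnx-tensorrt version):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)

# Expected to be False on a Pascal Titan X and True on a Titan V
print("fast FP16:", builder.platform_has_fast_fp16)
print("fast INT8:", builder.platform_has_fast_int8)
```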
I tried a Titan V, which I believe supports FP16. Notice that "FP16 available" changes from 0 to 1. However, it gave me an error when I ran the conversion.
FYI, |
Is it possible to share a minimally reproducible model with us? It doesn't necessarily need to be the real model and weights can be randomized, as long as it can help us reproduce the issue. |
@yinghai Thanks a lot for helping out. I will try to walk you through what I have done: The original semantic segmentation PyTorch model is here. First of all, I converted the PyTorch model to an ONNX model using the code below, because I didn't want to create the model definition in the TensorRT format from scratch.
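The conversion snippet was not captured above; it presumably boils down to a single torch.onnx.export call. A minimal sketch, using a stand-in network and a guessed input resolution instead of the real segmentation model:

```python
import torch
import torch.nn as nn

# Stand-in for the real segmentation network (placeholder, not the actual model)
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 21, kernel_size=1),      # dummy per-pixel class scores
).eval()

dummy_input = torch.randn(1, 3, 512, 512)  # placeholder input resolution
torch.onnx.export(
    model,
    dummy_input,
    "segmentation.onnx",
    export_params=True,    # embed the trained weights in the ONNX file
)
```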
The converted ONNX model is here. Then I ran the TensorRT engine. The segmentation result looks correct, which is why I believe the entire conversion process is correct (PyTorch -> ONNX -> TensorRT engine .trt file). However, the running speed is slower than before, at ~0.33 s/image on Titan X. It ran even slower, at ~0.42 s/image, on Titan V. I didn't quite get why TensorRT would slow things down. Do you have any ideas? Therefore, I wanted to try FP16 to further speed up the model. On Titan X, the conversion again produced a model identical to the FP32 one (see line 298 in f6f4138).
On Titan V, the conversion failed with an error.
Sorry for the long message. I hope this clarifies my situation. Thanks again for your generous help. |
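For context, building an FP32 engine from the exported ONNX file with the TensorRT Python API of that era looks roughly like this (a sketch with placeholder paths; newer TensorRT releases moved these settings onto the builder config):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp32_engine(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse the ONNX file")
    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 30   # 1 GiB of scratch space
    return builder.build_cuda_engine(network)

engine = build_fp32_engine("segmentation.onnx")   # placeholder path
with open("segmentation_fp32.trt", "wb") as f:
    f.write(engine.serialize())                   # save the serialized engine
```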
This sounds strange. How did you run the TRT engine? |
Here is my code modified from TensorRT example code. Sorry it's a bit long. Please let me know if you see any problem with my code.
Specifically I measure the speed by doing:
Is it correct? You can run the code by doing:
|
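The script itself is not reproduced above; for reference, the kind of pycuda-based inference loop used by the TensorRT Python samples of that era looks roughly like this (a condensed sketch with placeholder paths, not the poster's exact code):

```python
import numpy as np
import pycuda.autoinit      # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize a previously built engine (placeholder path)
with open("segmentation_fp32.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

# Host/device buffers for a single input and a single output binding
h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

def infer():
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()    # wait for the copies and the kernel to finish
    return h_output
```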
The time measurement doesn't seem to be correct, as ... |
@yinghai Thanks for your info. Yes the image size is always the same. What do you think is the best way to measure inference time? Any suggestion? Also, any idea on why FP16 conversion fails on Titan V? Thanks! |
You should measure it after ... |
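The advice above is truncated; presumably it amounts to timing only after the device work has actually finished (i.e. after the stream synchronize) and after a few warm-up runs. Continuing the sketch above, a measurement along those lines:

```python
import time

# Warm-up runs so one-time initialization and autotuning are excluded
for _ in range(10):
    infer()

# Timed region: each infer() call ends with stream.synchronize(),
# so the clock is only read once the GPU has really finished.
n_iters = 100
start = time.time()
for _ in range(n_iters):
    infer()
print("avg latency per image: %.4f s" % ((time.time() - start) / n_iters))
```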
Thanks! Unfortunately the result is the same. The inference time for the FP32 TensorRT engine is still ~0.42 s/image on Titan V, whereas the original PyTorch model runs at ~0.17 s/image. I am still unable to convert the model to FP16. One possibility I can think of is that ... UPDATE: I commented out ... |
@ChengshuLi I got stuck on the same problem, except that I used MXNet models. Here's my code to measure TensorRT's inference time.
And it seems that TensorRT actually slowed inference down. |
Looks like we can try something like this |
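The suggestion being pointed to isn't shown above, but the usual approach with the TensorRT 5/6-era Python API is to enable FP16 on the builder before building the engine (a sketch continuing the build function shown earlier; on newer TensorRT this is the FP16 builder flag on the builder config):

```python
# Only has an effect on GPUs where builder.platform_has_fast_fp16 is True
builder.fp16_mode = True
# Optional: keep layers in FP16 instead of silently falling back to FP32
builder.strict_type_constraints = True
engine = builder.build_cuda_engine(network)
```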
I'm trying to use your method, and I got an error like: /onnx/onnx/onnx_onnx2trt_onnx.pb.h:12:2: error: #error This file was generated by a newer version of protoc which is incompatible with your Protocol Buffer headers. |
I solved the installation problem. Now I followed your code and I got an error: segmentation fault (core dumped) on TensorRT 5.1.2. Can you please help me with that? |
Any updates? |
Generally, segmentation fault (core dumped) relates to a memory-allocation problem. I gave full access to my path and file, and memory is free. Even so, it keeps producing the same error. Please post an update if anyone has solved this or has ideas about the problem. |
Seems that ... Many thanks. |
FP16 inference is 10x slower than FP32! |
Coming late to this thread, so I'll try my best to answer the questions posed by multiple users:
Closing this thread, if anyone needs an update on their specific issue feel free to open a new issue. |
Please refer to our open-source quantization tool ppq; we can help you solve quantization problems. |
Hi,
I was trying to use FP16 and INT8.
I understand this is how you prepare an FP32 model.
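The snippet referred to isn't captured here; it is presumably along the lines of the backend example in the onnx-tensorrt README, roughly:

```python
import numpy as np
import onnx
import onnx_tensorrt.backend as backend

model = onnx.load("model.onnx")                    # placeholder path
engine = backend.prepare(model, device="CUDA:0")   # builds an FP32 TensorRT engine

# Dummy input; the shape must match the model's expected input
input_data = np.random.random(size=(1, 3, 224, 224)).astype(np.float32)
output = engine.run(input_data)[0]
print(output.shape)
```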
I tried this, but it didn't work.
Any help will be greatly appreciated. Thanks!
@yinghai