TensorRT 10.3 wrong results! #4330
Comments
There was a known accuracy bug that was fixed in 10.5. Can you update your TensorRT version?
I also face a memory management problem: the results from the ONNX and TRT models match when I do htod_async, but the result differs when I do dtod_async. Here is the code:
I found that passing numpy data vs. torch data gives two different results. I am using TensorRT 10.4. Has anyone faced the same problem?
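One likely cause (a sketch, not the poster's actual code): a numpy array lives in host memory and needs an htod copy, while a CUDA torch tensor lives in device memory and needs a dtod copy; routing both through the same call silently copies from the wrong address space. The buffer attribute names (`inp.host`, `inp.device`) follow the common TRT sample helpers referenced elsewhere in this thread:

```python
import numpy as np
import pycuda.driver as cuda
import torch

def copy_input(inp, model_input, stream):
    """Stage one model input into a TRT device buffer.

    inp.device: pycuda DeviceAllocation; inp.host: pinned host array
    (names as in the common TRT sample buffer class).
    """
    if isinstance(model_input, np.ndarray):
        # Host memory: stage into the pinned buffer, then host-to-device.
        flat = model_input.ravel()
        inp.host[: flat.size] = flat  # pinned buffer may be sized for the max shape
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    elif isinstance(model_input, torch.Tensor) and model_input.is_cuda:
        # Device memory: device-to-device; contiguous() guarantees that
        # data_ptr() points at a dense buffer of nelement() elements.
        t = model_input.contiguous()
        nbytes = t.element_size() * t.nelement()
        cuda.memcpy_dtod_async(inp.device, t.data_ptr(), nbytes, stream)
    else:
        raise TypeError("expected a numpy array or a CUDA torch tensor")
```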
I changed

```python
cuda.memcpy_dtod_async(self.inputs[i].device, model_input.data_ptr(),
                       model_input.element_size() * model_input.nelement(),
                       self.stream)
```

to

```python
cuda.memcpy_dtod_async(inp.device, model_input.data_ptr(),
                       model_input.element_size() * model_input.nelement(),
                       self.stream)
```
Is there any difference? In the original code, the index i comes from enumerating model_inputs, so self.inputs[i] and inp refer to the same buffer. More details on the code are shown below.
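(A hypothetical reconstruction of the loop shape being described — the actual snippet was not preserved in the thread:)

```python
# Hypothetical reconstruction; the original code was not included.
# If self.inputs is zipped with model_inputs, then inp IS self.inputs[i],
# so the two memcpy variants above are equivalent.
for i, (inp, model_input) in enumerate(zip(self.inputs, model_inputs)):
    nbytes = model_input.element_size() * model_input.nelement()
    cuda.memcpy_dtod_async(inp.device, model_input.data_ptr(), nbytes, self.stream)
```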
So, I don't find any difference in changing self.inputs[i].device to inp.device.
First, make sure the data of the tensor is contiguous. Then, try to use the sync API.
The problem is resolved by flattening the tensor before getting the data_ptr(). I suspect it is because of the memory layout of the torch tensor: a non-contiguous view's data_ptr() still points at the underlying storage, not at a dense run of the bytes being copied.
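A minimal sketch of that fix, reusing the buffer names from the snippet above (`inp.device` and `self.stream` are assumed context):

```python
# model_input may be a non-contiguous view (e.g. from permute/slice),
# in which case data_ptr() does not point at a dense run of
# nelement() elements. flatten() copies into a dense buffer when
# needed (contiguous() would work equally well here).
model_input = model_input.flatten()
cuda.memcpy_dtod_async(
    inp.device,                                           # TRT input buffer
    model_input.data_ptr(),                               # dense torch buffer
    model_input.element_size() * model_input.nelement(),  # byte count
    self.stream,
)
# Keep model_input alive until the async copy on the stream has completed.
```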
Thank you for replying. I upgraded TensorRT on my Jetson device to version 10.7, but unfortunately I still got the same wrong results. It seems to be an issue related to a dynamic output allocation bug. I tried padding the output of the first model so that it always serves as a statically shaped input for the second model, and this approach returned the correct output.
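(A sketch of the padding workaround described above — MAX_DETS and the (N, C) layout are assumptions, not details from the report:)

```python
import torch
import torch.nn.functional as F

MAX_DETS = 100  # assumed upper bound; pick the second engine's static size

def pad_to_static(first_out: torch.Tensor) -> torch.Tensor:
    """Pad a variable-length (N, C) output up to (MAX_DETS, C) so the
    second engine always receives a statically shaped input."""
    n = first_out.shape[0]
    if n >= MAX_DETS:
        return first_out[:MAX_DETS]
    # Zero-pad along dim 0; the second model must tolerate the padding
    # (e.g. via a validity mask or by ignoring zero rows).
    return F.pad(first_out, (0, 0, 0, MAX_DETS - n))
```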
@OctaAIVision I'm trying to understand the full story here.
Description
I’m in the process of migrating from TensorRT 8.6 to 10.3. Following the migration guide in the documentation, I was able to get inference working on 10.3. However, I’m seeing a significant accuracy regression compared to 8.6 (wrong results, not slower inference), particularly when the dynamic input shapes change.
I am currently working on a two-stage pipeline in which the output of the first network serves as the input to the second; the two networks run in sequence.
Has anyone encountered similar issues or could provide guidance on how to handle memory management in TensorRT 10.3 when using dynamic input shapes?
Any help would be greatly appreciated!
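For reference, the TensorRT 10.x flow this migration involves uses the name-based tensor API (set_input_shape / set_tensor_address / execute_async_v3) in place of the 8.x binding indices. A minimal sketch, assuming device buffers were pre-allocated for the optimization profile's maximum shapes:

```python
import tensorrt as trt

def infer_dynamic(engine, context, stream, device_ptrs, input_shapes):
    """One inference with TRT 10's name-based tensor API.

    device_ptrs:  dict tensor-name -> device pointer (int), allocated for
                  the profile's max shapes (assumed helper, not TRT API).
    input_shapes: dict tensor-name -> shape tuple for this call.
    """
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            # Must be set before execution so dynamic dims are resolved.
            context.set_input_shape(name, input_shapes[name])
        context.set_tensor_address(name, device_ptrs[name])
    # execute_async_v3 replaces the bindings list of execute_async_v2.
    context.execute_async_v3(stream_handle=stream.handle)
    stream.synchronize()
    # Concrete output shapes for this run, with dynamic dims filled in.
    return {
        name: tuple(context.get_tensor_shape(name))
        for name in (engine.get_tensor_name(i) for i in range(engine.num_io_tensors))
        if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT
    }
```

For outputs whose size is only known after execution (as in the two-stage setup here), TRT also offers context.set_output_allocator with an IOutputAllocator, so memory is requested once the runtime shape is known.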
Environment
TensorRT Version: 10.3.0.30
NVIDIA GPU: NVIDIA Jetson Orin NX (16 GB RAM), aarch64
NVIDIA Driver Version: JetPack 6.1
CUDA Version: 12.6.68
cuDNN Version: 9.3.0.75
Operating System: Ubuntu 22.04
Python Version (if applicable): 3.10.12
Relevant Files
common_trt_10.py:
common_trt_8.py:
first model inference:
second model inference: