how to enable MULTI_DEVICE_SAFE_MODE ? #121

Open
efschu opened this issue Dec 27, 2024 · 10 comments
Comments

@efschu

efschu commented Dec 27, 2024

In vs-trt, how do I enable multi-device safe mode?

torch_tensorrt.runtime.multi_device_safe_mode

It was enabled by default in earlier versions, but since 10.2(?) it is disabled by default, which makes it impossible to use my Volta cards together with my Turing cards.

@WolframRhodium
Contributor

vs-trt does not rely on PyTorch at all, so I don't know what you mean by enabling it. You might be confusing it with HolyWu's plugins?

Speaking of multi-device inference, you should be able to do that in vs-trt using static scheduling.
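For example, a minimal sketch of that pattern (hypothetical engine paths; each device uses an engine built for it, and SelectEvery/Interleave split the frames between the two GPUs):

# Sketch: static scheduling across two GPUs; each trt.Model instance only
# computes the frames that its SelectEvery actually requests from it.
gpu0 = core.std.SelectEvery(
    core.trt.Model(clip, engine_path="model_gpu0.engine", num_streams=2, device_id=0),
    cycle=2, offsets=0)
gpu1 = core.std.SelectEvery(
    core.trt.Model(clip, engine_path="model_gpu1.engine", num_streams=2, device_id=1),
    cycle=2, offsets=1)
clip = core.std.Interleave([gpu0, gpu1])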

@efschu
Author

efschu commented Dec 27, 2024

When using multiple GPUs from different generations, it errors out with
ICudaEngine::createExecutionContext: Error Code 1: Myelin ([version.cpp:operator():80] Compiled assuming that device 0 was SM 70, but device 0 is SM 75
SM 70 is Volta (V100), SM 75 is Turing (2080 Ti).
I use code like the following:

stream0 = core.std.SelectEvery(core.trt.Model(clip, engine_path="/root/realesr-general-wdn-x4v3_opset16_2080ti.engine", num_streams=3, device_id=0), cycle=2, offsets=0)
stream1 = core.std.SelectEvery(core.trt.Model(clip, engine_path="/root/realesr-general-wdn-x4v3_opset16_V100.engine", num_streams=3, device_id=1), cycle=2, offsets=1)
clip = core.std.Interleave([stream0, stream1])

MULTI_DEVICE_SAFE_MODE is a variable in TensorRT and, if true, it checks for CUDA device compatibility every time it is called. Its default is false nowadays, but it was true in the past (or it didn't exist and the check happened every time), and back then it was no problem to run trt.Model with different SM architectures.

Have a look here:
https://github.com/pytorch/TensorRT/blob/main/core/runtime/runtime.cpp

bool MULTI_DEVICE_SAFE_MODE = false;

So my question is: where do I set this for the VapourSynth plugin? I guess when building libvstrt.so, but how?
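For comparison, in Torch-TensorRT itself the flag is toggled from Python at runtime, roughly like this (a sketch assuming torch_tensorrt.runtime.set_multi_device_safe_mode exists in the installed version; vs-trt never goes through PyTorch, so this does not apply there):

import torch_tensorrt

# Sketch only: enables the per-inference device compatibility check in
# Torch-TensorRT's own runtime (assumes a recent torch_tensorrt release).
torch_tensorrt.runtime.set_multi_device_safe_mode(True)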

@efschu changed the title from "how to enable torch_tensorrt.runtime.multi_device_safe_mode ?" to "how to enable MULTI_DEVICE_SAFE_MODE ?" on Dec 27, 2024
@WolframRhodium
Contributor

WolframRhodium commented Dec 27, 2024

That's a very interesting error. Could you try setting the environment variable CUDA_VISIBLE_DEVICES to 1 before building the engine for the V100 with trtexec (in that case you should not need to set a device id on trtexec's command line), and then try the current code again?

@efschu
Author

efschu commented Dec 27, 2024

That's a very interesting error. Could you try setting the environment variable CUDA_VISIBLE_DEVICES to 1 before building the engine for the V100 with trtexec, and then try the current code again?

I already tried that. Same error.

Even when I swap them (V100 as stream0 and 2080 Ti as stream1) it says

Compiled assuming that device 0 was SM 75, but device 0 is SM 70

@WolframRhodium
Contributor

WolframRhodium commented Dec 27, 2024

I believe the flag you mentioned is internal to pytorch-tensorrt only, but let me check it carefully.

EDIT: yep, I believe it is only a pytorch-tensorrt warning that has nothing to do with the TensorRT library itself.

Anyway, how did you produce the engines exactly? Streams have nothing to do with devices in this case.

@efschu
Author

efschu commented Dec 27, 2024

trtexec --fp16 --onnx=./realesr-general-wdn-x4v3_opset16.onnx --minShapes=input:1x3x574x720 --optShapes=input:1x3x574x720 --maxShapes=input:1x3x574x720 --saveEngine=./realesr-general-wdn-x4v3_opset16_<card_name>.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5 --infStreams=2

@efschu
Author

efschu commented Dec 27, 2024

Anyway, how did you produce the engines exactly? Streams have nothing to do with devices in this case.

Yes, that's true, but it shows me that the SM "level" is set by the first device being called, and it then errors out on the next device with a different SM "level".

@WolframRhodium
Contributor

Thanks.

@efschu
Author

efschu commented Dec 29, 2024

For now I set up a new environment with TensorRT 8.6.1 and CUDA 11.8, and it works "again".

I don't know exactly in which version they set MULTI_DEVICE_SAFE_MODE to false by default, so I went back to the versions that I know work.

@WolframRhodium
Contributor

WolframRhodium commented Dec 29, 2024

That's probably because the Myelin optimiser is not that aggressive in older TensorRT versions. Also, Volta is not supported in TensorRT 10.5 and later.
