how to enable MULTI_DEVICE_SAFE_MODE ? #121

Open
efschu opened this issue Dec 27, 2024 · 10 comments
Comments

@efschu

efschu commented Dec 27, 2024

In vs-trt, how do I enable multi-device safe mode?

torch_tensorrt.runtime.multi_device_safe_mode

It was enabled by default in earlier versions, but since 10.2(?) it is disabled by default, which makes it impossible to use my Volta cards together with my Turing cards.

@WolframRhodium
Contributor

vs-trt does not rely on PyTorch at all, so I don't know what you mean by enabling it. You might be confusing it with HolyWu's plugins?

Speaking of multi-device inference, you should be able to do that in vs-trt using static scheduling.
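For example, a minimal sketch of that pattern (hypothetical engine paths; each device uses an engine built for it, and SelectEvery/Interleave split the frames between the two GPUs):

# Sketch: static scheduling across two GPUs; each trt.Model instance only
# computes the frames that its SelectEvery actually requests from it.
gpu0 = core.std.SelectEvery(
    core.trt.Model(clip, engine_path="model_gpu0.engine", num_streams=2, device_id=0),
    cycle=2, offsets=0)
gpu1 = core.std.SelectEvery(
    core.trt.Model(clip, engine_path="model_gpu1.engine", num_streams=2, device_id=1),
    cycle=2, offsets=1)
clip = core.std.Interleave([gpu0, gpu1])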

@efschu
Author

efschu commented Dec 27, 2024

When using multiple GPUs from different generations, it errors out with
ICudaEngine::createExecutionContext: Error Code 1: Myelin ([version.cpp:operator():80] Compiled assuming that device 0 was SM 70, but device 0 is SM 75
SM 70 is Volta (V100), SM 75 is Turing (2080 Ti).
I use code like the following:

stream0 = core.std.SelectEvery(core.trt.Model(clip, engine_path="/root/realesr-general-wdn-x4v3_opset16_2080ti.engine", num_streams=3, device_id=0), cycle=2, offsets=0)
stream1 = core.std.SelectEvery(core.trt.Model(clip, engine_path="/root/realesr-general-wdn-x4v3_opset16_V100.engine", num_streams=3, device_id=1), cycle=2, offsets=1)
clip = core.std.Interleave([stream0, stream1])

MULTI_DEVICE_SAFE_MODE is a variable in TensorRT and, if true, it checks for CUDA device compatibility every time it is called. Its default is false nowadays, but it was true in the past (or it didn't exist and the check happened every time), and back then it was no problem to run trt.Model with different SM architectures.

Have a look here:
https://github.com/pytorch/TensorRT/blob/main/core/runtime/runtime.cpp

bool MULTI_DEVICE_SAFE_MODE = false;

So my question is: where do I set this for the VapourSynth plugin? I guess when building libvstrt.so, but how?
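For comparison, in Torch-TensorRT itself the flag is toggled from Python at runtime, roughly like this (a sketch assuming torch_tensorrt.runtime.set_multi_device_safe_mode exists in the installed version; vs-trt never goes through PyTorch, so this does not apply there):

import torch_tensorrt

# Sketch only: enables the per-inference device compatibility check in
# Torch-TensorRT's own runtime (assumes a recent torch_tensorrt release).
torch_tensorrt.runtime.set_multi_device_safe_mode(True)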

@efschu changed the title from "how to enable torch_tensorrt.runtime.multi_device_safe_mode ?" to "how to enable MULTI_DEVICE_SAFE_MODE ?" on Dec 27, 2024
@WolframRhodium
Contributor

WolframRhodium commented Dec 27, 2024

That's a very interesting error. Could you try setting the environment variable CUDA_VISIBLE_DEVICES to 1 before building the engine for the V100 with trtexec (in that case you should not need to set a device id on trtexec's command line), and then try the current code again?

@efschu
Author

efschu commented Dec 27, 2024

That's a very interesting error. Could you try setting the environment variable CUDA_VISIBLE_DEVICES to 1 before building the engine for the V100 with trtexec, and then try the current code again?

I already tried that. Same error.

Even when I swap them (V100 as stream0 and 2080 Ti as stream1) it says

Compiled assuming that device 0 was SM 75, but device 0 is SM 70

@WolframRhodium
Contributor

WolframRhodium commented Dec 27, 2024

I believe the flag you mentioned is internal to pytorch-tensorrt only, but let me check it carefully.

EDIT: yep, I believe it is only a pytorch-tensorrt warning that has nothing to do with the TensorRT library itself.

Anyway, how did you produce the engines exactly? Streams have nothing to do with devices in this case.

@efschu
Author

efschu commented Dec 27, 2024

trtexec --fp16 --onnx=./realesr-general-wdn-x4v3_opset16.onnx --minShapes=input:1x3x574x720 --optShapes=input:1x3x574x720 --maxShapes=input:1x3x574x720 --saveEngine=./realesr-general-wdn-x4v3_opset16_<card_name>.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5 --infStreams=2

@efschu
Author

efschu commented Dec 27, 2024

Anyway, how did you produce the engines exactly? Streams have nothing to do with devices in this case.

Yes, that's true, but it shows me that the SM "level" is set by the first device being called, and it then errors out on the next device with a different SM "level".

@WolframRhodium
Contributor

Thanks.

@efschu
Author

efschu commented Dec 29, 2024

For now I set up a new environment with TensorRT 8.6.1 and CUDA 11.8, and it works "again".

I don't know exactly in which version they set MULTI_DEVICE_SAFE_MODE to false by default, so I went back to the versions that I know work.

@WolframRhodium
Contributor

WolframRhodium commented Dec 29, 2024

That's probably because the Myelin optimiser is not that aggressive in older TensorRT versions. Also, Volta is not supported in TensorRT 10.5 and later.
