rocm support #16

Closed
sonicrules1234 opened this issue Feb 7, 2023 · 25 comments

Labels: enhancement, help wanted, needs testing, os: linux, p: med, stale

Comments

@sonicrules1234

I'd be willing to help with the testing for this

decahedron1 self-assigned this Feb 7, 2023
decahedron1 added the enhancement label Feb 7, 2023
decahedron1 added a commit that referenced this issue Feb 7, 2023
@decahedron1
Member

decahedron1 commented Feb 7, 2023

Just added ExecutionProvider::rocm() in b02744a. To use ROCm you'll have to build ONNX Runtime from source and point ort to the compiled libraries with ORT_STRATEGY=system.

@sonicrules1234
Author

Building onnxruntime with rocm doesn't support making a shared library, so when I compile, it's giving me "/usr/bin/ld: cannot find -lonnxruntime: No such file or directory", even if I add to LD_LIBRARY_PATH.

@decahedron1
Member

Is this an error when compiling ort or ONNX Runtime itself?

@sonicrules1234
Author

ort

@decahedron1
Member

Ah, when using ORT_STRATEGY=system with static libraries you need to set ORT_LIB_LOCATION to the library path, e.g. ORT_LIB_LOCATION=~/onnxruntime/build/Release/
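
For readers following along, here is a minimal sketch of how a Cargo build script could turn a directory of static libraries (such as the one `ORT_LIB_LOCATION` points to) into link directives. This is not ort's actual build script; `static_lib_names` and `link_directives` are hypothetical helpers, and the only real interface assumed is Cargo's `cargo:rustc-link-search` and `cargo:rustc-link-lib` output lines.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Collects the link names of static libraries (`lib<name>.a`) found directly
/// in `dir`, sorted alphabetically. Hypothetical helper, not part of ort.
fn static_lib_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let file_name = entry?.file_name();
        // Skip file names that are not valid UTF-8 instead of failing.
        let Some(file_name) = file_name.to_str() else { continue };
        // `libfoo.a` links as `foo`; shared objects and other files are ignored.
        if let Some(name) = file_name
            .strip_prefix("lib")
            .and_then(|rest| rest.strip_suffix(".a"))
        {
            if !name.is_empty() {
                names.push(name.to_owned());
            }
        }
    }
    names.sort();
    Ok(names)
}

/// Formats the lines a build script would print so that Cargo searches `dir`
/// and links every library in `names` statically. Hypothetical helper.
fn link_directives(dir: &Path, names: &[String]) -> Vec<String> {
    let mut lines = vec![format!("cargo:rustc-link-search=native={}", dir.display())];
    lines.extend(
        names
            .iter()
            .map(|name| format!("cargo:rustc-link-lib=static={}", name)),
    );
    lines
}
```

A build script would print each returned line with `println!`. If a library the crate asks for is missing from the directory, the linker reports errors like the `cannot find -lonnxruntime` message above.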

@sonicrules1234
Author

Hm, looks like ort is compiling fine, but using it with your diffusers library is when it has the linking error, since there is no libonnxruntime.a either.

@sonicrules1234
Author

Here's a list of files and directories the rocm version of onnxruntime produced: https://gist.github.com/45585c5f7bd797bbe9c6e3998edf0b34

@decahedron1
Member

decahedron1 commented Feb 8, 2023

Looks like I did not properly implement static linking with ORT_STRATEGY=system. 5044e45 should fix (most of) the linking issues.

@sonicrules1234
Author

sonicrules1234 commented Feb 8, 2023

Now I'm getting
error: could not find native static library protobuf-lited, perhaps an -L flag is missing?

error: could not compile ort due to previous error

@decahedron1
Member

Sorry for the delay. Linking should be fixed with 2364c5d.

@sonicrules1234
Author

I fixed the ort typo I mentioned last comment, which got ort compiling again, but I got https://gist.github.com/f0de93ba9fe3f0639a46d295b6f1e993 when compiling my program that uses your diffusers library.

@decahedron1
Member

Ah, looks like the ONNX docs were wrong. Can you change OrtSessionOptionsAppendExecutionProvider_ROCm in src/execution_providers.rs on line 18 and line 221 to OrtSessionOptionsAppendExecutionProvider_ROCM (capital M) and try again?

@sonicrules1234
Author

https://gist.github.com/d081fd8ecb2812aaa5fe1795129f183b

I think it gave the same error but with the capital M this time

@decahedron1
Member

OrtSessionOptionsAppendExecutionProvider_ROCM is defined in onnxruntime/core/session/provider_bridge_ort.cc, which is built into libonnxruntime_session.a, which is linked by ort here, so I'm not sure why it still can't find the symbol. Maybe it's defined in another library that I can't see.

@sonicrules1234 what branch of microsoft/onnxruntime is checked out, and would it be possible to share the contents of the build directory?

@sonicrules1234
Author

Sorry it took so long to respond, I didn't see any email notification for this. Right now the files are on a somewhat corrupted partition. Once my pc is working properly again, I'll give this a try once more.

@sonicrules1234
Author

Okay, got it back to where it was on a different install. I'm using the main branch of onnxruntime, and pulled it out of the docker build.
https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm
The folder is 2.2G in size. If you have a place I can upload to, I can upload it.

@decahedron1
Member

OrtSessionOptionsAppendExecutionProvider_ROCM has apparently been broken for a long time, so I pushed 8889616 using a newer API. Linking should finally be fixed, please let me know how testing goes 😁

@sonicrules1234
Author

OK, it compiles and runs now, but doesn't seem to be using the GPU:

https://gist.github.com/6e435a1caf2290c9eec768e2793a36eb

@decahedron1
Member

Does it output anything when running with the environment variable RUST_LOG=ort=debug?

@sonicrules1234
Author

Nope

@decahedron1
Member

Add

[dependencies]
tracing = "0.1"
tracing-subscriber = "0.3"

to your Cargo.toml, and at the top of fn main(), add

fn main() {
    // Install a global subscriber that prints tracing events to stdout.
    tracing_subscriber::fmt::init();

    // ... rest of your program
}

then run again to see logs.

@sonicrules1234
Author

It's spamming

2023-03-05T22:29:09.229375Z DEBUG apply_execution_providers: ort::execution_providers: ROCm execution provider registration Err(Msg("/code/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1058 void onnxruntime::ProviderSharedLibrary::Ensure() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_shared.so with error: libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory\n"))
2023-03-05T22:29:09.353019Z DEBUG new{allocator=Device memory_type=Default}: ort::memory: Creating new OrtMemoryInfo.
2023-03-05T22:29:09.353052Z DEBUG drop{self=SessionBuilder { env: "default", allocator: Device, memory_type: Default }}: ort::session: Dropping the session options.

That shared library file does exist at the root of ORT_LIB_LOCATION

@decahedron1
Member

Try running with the env var LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORT_LIB_LOCATION
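
As a rough way to check whether a given `LD_LIBRARY_PATH` value can actually reach `libonnxruntime_providers_shared.so`, the sketch below walks a colon-separated search list and returns the first directory that contains the file. `find_in_search_path` is a hypothetical helper written for this thread, not part of ort or ONNX Runtime. It is also only a partial check, since the dynamic loader consults other locations (for example the rpath and the loader cache) that this code ignores.

```rust
use std::path::{Path, PathBuf};

/// Returns the full path of `file_name` in the first directory of a
/// colon-separated search list (the format of LD_LIBRARY_PATH) that contains
/// it as a regular file, or `None` if no directory does.
///
/// Empty entries are skipped here; the real dynamic loader treats them as the
/// current working directory.
fn find_in_search_path(search_path: &str, file_name: &str) -> Option<PathBuf> {
    search_path
        .split(':')
        .filter(|dir| !dir.is_empty())
        .map(|dir| Path::new(dir).join(file_name))
        .find(|candidate| candidate.is_file())
}
```

Calling it with the value of `std::env::var("LD_LIBRARY_PATH")` from inside the failing program shows what that process really sees, which can differ from the shell if the variable was set but not exported.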

@sonicrules1234
Author

Doesn't seem to make a difference

decahedron1 added the help wanted and p: med labels Jun 3, 2023
@Quozul

Quozul commented Aug 1, 2023

Hello! I tried running this myself and encountered the error above:

libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory

This was fixed by adding the ONNX Runtime build directory to LD_LIBRARY_PATH:

FROM rocm/dev-ubuntu-22.04:5.6-complete

ARG ONNXRUNTIME_REPO=https://github.com/Microsoft/onnxruntime
ARG ONNXRUNTIME_BRANCH=main

WORKDIR /code

ENV PATH=/opt/miniconda/bin:/code/cmake-3.26.3-linux-x86_64/bin:${PATH}

RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
    /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh &&\
    cd onnxruntime &&\
    /bin/sh ./build.sh --allow_running_as_root --config Release --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm \
    # Modified from this point
    --skip_submodule_sync  --skip_tests --build_shared_lib &&\
    cd build/Linux/Release/ &&\
    make install

ENV ONNXRUNTIME_DIR="/code/onnxruntime/build/Linux/Release/"
ENV LD_LIBRARY_PATH=$ONNXRUNTIME_DIR:$LD_LIBRARY_PATH

# Next I copy and run a Rust application

But now I get these logs:

2023-08-01T10:56:25.642746Z  INFO apply_execution_providers: ort::execution_providers: Successfully registered `ROCmExecutionProvider`
Segmentation fault (core dumped)

It is running inside Docker. Perhaps this is because I'm using ROCm 5.6?
How can I debug this segmentation fault?

EDIT: Just wasted 30 minutes, it doesn't work with ROCm 5.4 either.

decahedron1 added the stale and needs testing labels Sep 20, 2023
decahedron1 mentioned this issue Oct 27, 2023
Merged
8 tasks