[PROF-9926] Fix rpath for linking to libdatadog when loading from extension dir #3683

Merged

@ivoanjo commented Jun 5, 2024

**What does this PR do?**

This PR is a follow-up to #3582.

In that PR, we fixed loading the profiling native extension so that it could be loaded from the Ruby extensions directory (see the original PR for more details).

It turns out this was not enough! The customer reported seeing the following error:

> Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling
> native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.2.2_x86_64-linux
> due to libdatadog_profiling.so: cannot open shared object file: No such file or directory

What this message tells us is that we're finding the profiling native extension, BUT it fails to load BECAUSE the dynamic loader cannot find its `libdatadog_profiling.so` dependency.

From debugging the issue with the customer, I suspect we're seeing a repeat of #2067 / #2125: the paths where the profiler was compiled change at deployment time, so we also need to adjust the relative rpath to account for this.
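To make the relative rpath idea concrete, here is a minimal Ruby sketch that computes an `$ORIGIN`-relative rpath from the extensions directory to the libdatadog `lib` directory with `Pathname#relative_path_from`. It is not necessarily how the helper in this PR does it. The function name `relative_rpath` is hypothetical, and the example paths are taken from the reproduction steps below. At load time, the dynamic loader expands `$ORIGIN` to the directory containing the `.so` being loaded.

```ruby
require "pathname"

# Hypothetical helper: returns an rpath entry for libdatadog_profiling.so that is
# relative to $ORIGIN, i.e. to the directory the native extension is loaded from.
def relative_rpath(extension_dir, libdatadog_lib_dir)
  relative = Pathname.new(libdatadog_lib_dir).relative_path_from(Pathname.new(extension_dir))
  "$ORIGIN/#{relative}"
end

# Example layout, taken from the reproduction steps in this PR
bundle = "/working/rpathtest/vendor/bundle/ruby/3.2.0"
ext_dir = "#{bundle}/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1"
lib_dir = "#{bundle}/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/" \
          "x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib"

puts relative_rpath(ext_dir, lib_dir)
```

With these paths, the result has the same `../../../../gems/...` shape that `ldd` reports at the end of the reproduction. The sketch leaves out how the string would reach the linker (for example via a `-Wl,-rpath,...` flag in `extconf.rb`).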

I haven't yet confirmed with the customer that this is their issue, BUT I was able to reproduce the exact problem by moving the library installation in the way described above (see "How to test the change?" below).

**Motivation:**

Fix this weird corner case that made the profiler not load.

**Additional Notes:**

This is a really, really weird corner case, so I'm happy to describe the issue further if my description above plus the comments in the code are still too cryptic.

I'm opening this to target the 1.x-stable branch, as I'm hoping the customer can test the fix. If it's successful, I'll also forward-port it to the 2.x branch.

**How to test the change?**

I've added test code for the helper, but validating the whole rpath setup end-to-end is a bit annoying.

Here's how I triggered the issue myself, and then used it to validate the fix:

```
# Build the fixed gem into pkg/; it will be used later
$ bundle exec rake build
datadog 2.0.0.rc1 built to pkg/datadog-2.0.0.rc1.gem.

# Open a clean Ruby docker installation
$ docker run --network=host -ti -v `pwd`:/working ruby:3.2.2-bookworm /bin/bash

# I've created a minimal test gemfile ahead of time
/working/rpathtest# cat gems.rb
source 'https://rubygems.org'

gem 'datadog'
# Tell bundler to install the gem into a folder
/working/rpathtest# bundle config set --local path 'vendor/bundle'
/working/rpathtest# bundle install

# Confirm profiler works:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# Now let's simulate the native extension being loaded from the
# extensions directory:
/working/rpathtest# find | grep \.so$ | grep datadog
./vendor/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/datadog_profiling_native_extension.3.2.2_x86_64-linux.so
./vendor/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/datadog_profiling_loader.3.2.2_x86_64-linux.so
./vendor/bundle/ruby/3.2.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
./vendor/bundle/ruby/3.2.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_native_extension.3.2.2_x86_64-linux.so
./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_loader.3.2.2_x86_64-linux.so
/working/rpathtest# rm ./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_native_extension.3.2.2_x86_64-linux.so ./vendor/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_loader.3.2.2_x86_64-linux.so

# Confirm profiler still works:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# Now let's simulate the folders being moved (the issue being fixed):
/working/rpathtest# cat /usr/local/bundle/config
---
BUNDLE_PATH: "vendor/bundle"
# Update this to vendor2...
/working/rpathtest# cat /usr/local/bundle/config
---
BUNDLE_PATH: "vendor2/bundle"
# ...and move the folder
/working/rpathtest# mv vendor/ vendor2

# Now we've triggered the exact same error message as reported by the
# customer
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
W, [2024-06-05T15:51:12.488843 #517]  WARN -- datadog: [datadog] Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.2.2_x86_64-linux due to libdatadog_profiling.so: cannot open shared object file: No such file or directory' at '/working/rpathtest/vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog/profiling/load_native_extension.rb:41:in `<top (required)>''

# Now let's test the fix, starting from the broken setup above.
# Put the fixed version into the bundler cache...
/working/rpathtest# cp /working/pkg/datadog-2.0.0.rc1.gem vendor2/bundle/ruby/3.2.0/cache/datadog-2.0.0.rc1.gem
# ...force bundler to reinstall...
/working/rpathtest# rm -rf vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/
/working/rpathtest# bundle install
# ...and force the extension to be loaded from the extensions directory
/working/rpathtest# rm ./vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_native_extension.3.2.2_x86_64-linux.so ./vendor2/bundle/ruby/3.2.0/gems/datadog-2.0.0.rc1/lib/datadog_profiling_loader.3.2.2_x86_64-linux.so
# Confirm it works:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# Now move the vendor folder again (after updating BUNDLE_PATH to vendor3):
/working/rpathtest# cat /usr/local/bundle/config
---
BUNDLE_PATH: "vendor3/bundle"
/working/rpathtest# mv vendor2/ vendor3

# And now it doesn't fail:
/working/rpathtest# DD_PROFILING_ENABLED=true bundle exec ddprofrb exec ruby -e "sleep 1"
# ... No errors loading profiler ...

# Extra confirmation that the relative paths are working:
/working/rpathtest# ldd ./vendor3/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/datadog_profiling_native_extension.3.2.2_x86_64-linux.so
	libdatadog_profiling.so => /working/rpathtest/./vendor3/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1/../../../../gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so (0x00007ff127c00000)
```
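The `ldd` output resolves `libdatadog_profiling.so` through a path relative to the extension's own directory, which is why moving `vendor2/` to `vendor3/` no longer matters. Below is a small Ruby model of that resolution. It simplifies what the dynamic loader does with `$ORIGIN`, and the helper names and the absolute rpath value are assumptions for illustration. It contrasts an absolute rpath baked in under `vendor2/` with a relative one.

```ruby
# Simplified model of rpath resolution: "$ORIGIN" is replaced with the directory
# of the .so being loaded, then ".." segments are normalized.
def resolve_rpath(rpath, so_dir)
  File.expand_path(rpath.sub("$ORIGIN", so_dir))
end

LIBDATADOG_SUFFIX = "gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/" \
                    "x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib"

def ext_dir(root)
  "#{root}/bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1"
end

def lib_dir(root)
  "#{root}/bundle/ruby/3.2.0/#{LIBDATADOG_SUFFIX}"
end

# Hypothetical rpath entries, as they might be recorded when building under vendor2/
absolute_rpath = lib_dir("/working/rpathtest/vendor2")
relative_rpath = "$ORIGIN/../../../../#{LIBDATADOG_SUFFIX}"

# After `mv vendor2/ vendor3`, the extension is loaded from the new location
moved_root = "/working/rpathtest/vendor3"
so_dir = ext_dir(moved_root)

puts resolve_rpath(absolute_rpath, so_dir) == lib_dir(moved_root) # false: still points at vendor2/
puts resolve_rpath(relative_rpath, so_dir) == lib_dir(moved_root) # true: follows the move
```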

@ivoanjo requested a review from a team as a code owner June 5, 2024 16:03
@github-actions bot added the profiling label (Involves Datadog profiling) Jun 5, 2024
@ivoanjo requested a review from r1viollet June 5, 2024 16:03
@p-datadog commented Jun 6, 2024

I am trying to understand the situation. Here is where the various `.so` files are on my machine:

```
big% find ~ -name '*datadog*so'
/home/w/.cache/vendor/bundle/ruby/2.7.0/extensions/x86_64-linux/2.7.0/datadog-2.0.0.beta2/datadog_profiling_loader.2.7.8_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/2.7.0/extensions/x86_64-linux/2.7.0/datadog-2.0.0.beta2/datadog_profiling_native_extension.2.7.8_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/2.7.0/gems/datadog-2.0.0.beta2/ext/datadog_profiling_loader/datadog_profiling_loader.2.7.8_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/2.7.0/gems/datadog-2.0.0.beta2/ext/datadog_profiling_native_extension/datadog_profiling_native_extension.2.7.8_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/2.7.0/gems/datadog-2.0.0.beta2/lib/datadog_profiling_loader.2.7.8_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/2.7.0/gems/datadog-2.0.0.beta2/lib/datadog_profiling_native_extension.2.7.8_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/2.7.0/gems/libdatadog-7.0.0.1.0-x86_64-linux/vendor/libdatadog-7.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/2.7.0/gems/libdatadog-7.0.0.1.0-x86_64-linux/vendor/libdatadog-7.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/aarch64-linux/libdatadog-aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/aarch64-linux-musl/libdatadog-aarch64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.0.0/gems/libdatadog-5.0.0.1.0-x86_64-linux/vendor/libdatadog-5.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.0.0/gems/libdatadog-5.0.0.1.0-x86_64-linux/vendor/libdatadog-5.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/extensions/x86_64-linux/3.1.0/datadog-2.0.0.beta2/datadog_profiling_loader.3.1.4_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/extensions/x86_64-linux/3.1.0/datadog-2.0.0.beta2/datadog_profiling_native_extension.3.1.4_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/datadog-2.0.0.beta2/ext/datadog_profiling_loader/datadog_profiling_loader.3.1.4_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/datadog-2.0.0.beta2/ext/datadog_profiling_native_extension/datadog_profiling_native_extension.3.1.4_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/datadog-2.0.0.beta2/lib/datadog_profiling_loader.3.1.4_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/datadog-2.0.0.beta2/lib/datadog_profiling_native_extension.3.1.4_x86_64-linux.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/libdatadog-7.0.0.1.0-x86_64-linux/vendor/libdatadog-7.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/libdatadog-7.0.0.1.0-x86_64-linux/vendor/libdatadog-7.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.1.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-5.0.0.1.0-x86_64-linux/vendor/libdatadog-5.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-5.0.0.1.0-x86_64-linux/vendor/libdatadog-5.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-6.0.0.1.0-x86_64-linux/vendor/libdatadog-6.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-6.0.0.1.0-x86_64-linux/vendor/libdatadog-6.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-7.0.0.1.0-x86_64-linux/vendor/libdatadog-7.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-7.0.0.1.0-x86_64-linux/vendor/libdatadog-7.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle/ruby/3.3.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/vendor/libdatadog-9.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/gems/libdatadog-5.0.0.1.0-x86_64-linux/vendor/libdatadog-5.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/gems/libdatadog-5.0.0.1.0-x86_64-linux/vendor/libdatadog-5.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/aarch64-linux/libdatadog-aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/aarch64-linux-musl/libdatadog-aarch64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.0.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.3.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/aarch64-linux/libdatadog-aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.3.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/aarch64-linux-musl/libdatadog-aarch64-alpine-linux-musl/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.3.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so
/home/w/.cache/vendor/bundle-2.1/ruby/3.3.0/gems/libdatadog-5.0.0.1.0/vendor/libdatadog-5.0.0/x86_64-linux-musl/libdatadog-x86_64-alpine-linux-musl/lib/libdatadog_profiling.so
```

(My configuration may be slightly different: I have a designated global directory for gems installed by bundler.)

I think what happens with extensions in gems is the following:

  1. rubygems installs the source code of the gem (ruby + C)
  2. rubygems then compiles the C extension, if any, in place wherever the C source was installed
  3. rubygems then potentially copies the built .so into a "permanent" location for it
  4. rubygems performs no cleanup of the C extension source code (which is never used at runtime), the temporary object files created during compilation, or the first copy of the .so artifacts if they are copied in step 3
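Step 3 means the compiled extension may exist in the gem's `lib/` folder, in the extensions directory, or both, depending on what was copied or cleaned up. Here is a minimal Ruby sketch of looking in both places. The helper name and the lookup order (gem `lib/` first, then the extensions directory) are assumptions, and this is not the loader that dd-trace-rb actually ships.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: returns the first existing copy of the compiled extension,
# checking the gem's lib/ folder before the extensions directory, or nil if neither exists.
def find_native_extension(file_name, gem_lib_dir, extension_dir)
  [gem_lib_dir, extension_dir]
    .map { |dir| File.join(dir, file_name) }
    .find { |path| File.file?(path) }
end

# Usage: simulate a deployment where only the extensions-directory copy survived
Dir.mktmpdir do |root|
  lib_dir = File.join(root, "gems/datadog-2.0.0.rc1/lib")
  ext_dir = File.join(root, "extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1")
  FileUtils.mkdir_p([lib_dir, ext_dir])

  so_name = "datadog_profiling_native_extension.3.2.2_x86_64-linux.so"
  FileUtils.touch(File.join(ext_dir, so_name))

  puts find_native_extension(so_name, lib_dir, ext_dir)
end
```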

So the set of files present in an "installed" gem is actually a mix of files used at runtime and temporary files from the build process that are never cleaned up.

Is it possible that the customer issue is actually due to some tool copying the "temporary" files, including the built .so, and not the "final" files? It probably wouldn't change the resulting logic that we would need, but it would at least explain what is happening. A lot of Ruby libraries just examine the filesystem around themselves, assuming various files to be present that aren't used by the Ruby runtime, and I wouldn't be surprised if there are multiple tools out in the wild that end up copying or using temporary files thinking those are permanently installed artifacts.

@r1viollet left a comment

LGTM
Is there a way to test this further?

@ivoanjo commented Jun 7, 2024

cc @p-datadog

> I think what happens with extensions in gems is the following:

I think your understanding is almost correct. The exception is this part:

> rubygems performs no cleanup of the C extension source code (which is never used at runtime), the temporary object files created during compilation, or the first copy of the .so artifacts if they are copied in step 3

I believe this was recently changed in rubygems, and there's now some cleanup: rubygems/rubygems#3958

> Is it possible that the customer issue is actually due to some tool copying the "temporary" files, including the built .so, and not the "final" files? It probably wouldn't change the resulting logic that we would need, but it would at least explain what is happening. A lot of Ruby libraries just examine the filesystem around themselves, assuming various files to be present that aren't used by the Ruby runtime, and I wouldn't be surprised if there are multiple tools out in the wild that end up copying or using temporary files thinking those are permanently installed artifacts.

We have a meeting scheduled with the customer; perhaps we'll get some hints on exactly how they deploy their gems and what's causing their weird setup.

I don't think their tool is copying any temporary files: it correctly copies what ends up under `/extensions/`. The problem is rather that the files that normally get installed into `datadog-<version>/lib/` don't end up there.

Overall, I've tried to support this weird setup because Ruby is so configurable that I suspect this won't be the last time we see it, so a bit of complexity on our side may save us from future support tickets.


> Is there a way to test this further?

@r1viollet It's a good question. Perhaps I was a bit too quick to dismiss automated testing for this. On second thought, we could have a CI step that basically does what I showed above: install the gem, move the files around, then check whether the profiler can still start. I'll see if I can take a stab at it before merging this PR.
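As a rough, filesystem-only proxy for that CI idea, here is a Ruby sketch that lays out a fake bundle, records the extension-to-library path relative to the extensions directory, moves the whole `vendor` folder, and checks that the relative path still reaches `libdatadog_profiling.so`. The method and constant names are hypothetical, and the library path is shortened for illustration. It does not run the dynamic loader or the profiler, so a real CI step would still need to start the profiler as in the reproduction above.

```ruby
require "tmpdir"
require "fileutils"
require "pathname"

# Hypothetical, shortened layout mirroring the bundle used in the reproduction
EXT_SUBDIR = "bundle/ruby/3.2.0/extensions/x86_64-linux/3.2.0/datadog-2.0.0.rc1"
LIB_SUBDIR = "bundle/ruby/3.2.0/gems/libdatadog-9.0.0.1.0-x86_64-linux/lib"

def relative_lib_still_found_after_move?
  Dir.mktmpdir do |root|
    original = File.join(root, "vendor")
    FileUtils.mkdir_p(File.join(original, EXT_SUBDIR))
    FileUtils.mkdir_p(File.join(original, LIB_SUBDIR))
    FileUtils.touch(File.join(original, LIB_SUBDIR, "libdatadog_profiling.so"))

    # Path from the extension dir to the library dir, computed before the move
    relative = Pathname.new(File.join(original, LIB_SUBDIR))
                       .relative_path_from(Pathname.new(File.join(original, EXT_SUBDIR)))

    moved = File.join(root, "vendor2")
    FileUtils.mv(original, moved) # simulates `mv vendor/ vendor2`

    # Follow the same relative path, starting from the extension's new location
    File.file?(File.join(moved, EXT_SUBDIR, relative.to_s, "libdatadog_profiling.so"))
  end
end

puts relative_lib_still_found_after_move?
```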

@ivoanjo changed the title from "[NO-TICKET] Fix rpath for linking to libdatadog when loading from extension dir" to "[PROF-9926] Fix rpath for linking to libdatadog when loading from extension dir" Jun 12, 2024
@ivoanjo commented Jun 12, 2024

Thanks y'all for the reviews and the feedback. I'm working on adding testing for this, but rather than piling onto this PR, I'll open a separate small PR with just the testing.

Going ahead and merging this one! :)

@ivoanjo merged commit 1e58066 into 1.x-stable Jun 12, 2024 (195 checks passed)
@ivoanjo deleted the ivoanjo/extend-relative-rpath-extensions-folder-1x-stable branch June 12, 2024 08:22
@github-actions bot added this to the 1.23.2 milestone Jun 12, 2024
ivoanjo added a commit to DataDog/prof-correctness that referenced this pull request Jun 12, 2024
…d relative rpath is needed

**What does this PR do?**

This PR adds a new test case that validates that
DataDog/dd-trace-rb#3582 and
DataDog/dd-trace-rb#3683 keep working fine.

**Motivation:**

As described in DataDog/dd-trace-rb#3683, this is
a somewhat annoying thing to test, but it's important to avoid regressing.

**Additional Notes:**

You can actually see the evolution of both of those fixes in
this test.

E.g. here's dd-trace-rb 1.21.0 (prior to
DataDog/dd-trace-rb#3582 ):

```
W, [2024-06-12T09:34:08.759519 #7]  WARN -- ddtrace: [ddtrace] (/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.21.1/lib/datadog/core/configuration/components.rb:115:in `startup!') Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.3.2_x86_64-linux due to /app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.21.1/lib/datadog/profiling/../../datadog_profiling_native_extension.3.3.2_x86_64-linux.so: cannot open shared object file: No such file or directory' at '/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.21.1/lib/datadog/profiling/load_native_extension.rb:26:in `<top (required)>''
    --- FAIL: TestScenarios/scenarios/ruby_extension_dir_and_rpath (14.86s)
```

In this version, we failed because we couldn't load the native
extension.

Then here's dd-trace-rb 1.23.1 (without
DataDog/dd-trace-rb#3683) when we
don't move the `vendor` folder (but still delete the `.so` files from the
`lib` folder):

```
    --- PASS: TestScenarios/scenarios/ruby_extension_dir_and_rpath (18.96s)
```

...but if we additionally move the `vendor` folder (which is what this PR
does in the Dockerfile):

```
W, [2024-06-12T09:37:33.517188 #6]  WARN -- ddtrace: [ddtrace] (/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.23.1/lib/datadog/core/configuration/components.rb:116:in `startup!') Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load datadog_profiling_native_extension.3.3.2_x86_64-linux due to libdatadog_profiling.so: cannot open shared object file: No such file or directory' at '/app/vendor-moved/bundle/ruby/3.3.0/gems/ddtrace-1.23.1/lib/datadog/profiling/load_native_extension.rb:39:in `<top (required)>''
    --- FAIL: TestScenarios/scenarios/ruby_extension_dir_and_rpath (3.25s)
```

Notice that it fails, BUT the error is now different from the one above:
it relates to loading `libdatadog_profiling.so`, not
`datadog_profiling_native_extension.3.3.2_x86_64-linux.so`.
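Telling the two failure modes apart comes down to which file the "cannot open shared object file" error names. Here is a small Ruby sketch of a diagnostic helper that does this. The method name and the returned symbols are hypothetical, and it only inspects the message text as shown in the logs above.

```ruby
# Hypothetical diagnostic helper: classify a profiler load error message by
# which shared object the dynamic loader failed to open.
def classify_profiler_load_error(message)
  return :ok unless message.include?("cannot open shared object file")

  if message.include?("due to libdatadog_profiling.so:")
    # The extension was found, but its libdatadog dependency was not (rpath issue)
    :libdatadog_dependency_not_found
  elsif message.match?(/datadog_profiling_native_extension\.[^\s:]*\.so:/)
    # The native extension .so itself could not be opened
    :native_extension_not_found
  else
    :unknown
  end
end

msg = "RuntimeError Failure to load datadog_profiling_native_extension.3.3.2_x86_64-linux " \
      "due to libdatadog_profiling.so: cannot open shared object file: No such file or directory"
puts classify_profiler_load_error(msg) # prints libdatadog_dependency_not_found
```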

And with the change in DataDog/dd-trace-rb#3683
(which will be in 1.23.2):

```
    --- PASS: TestScenarios/scenarios/ruby_extension_dir_and_rpath (9.60s)
```

**NOTE**: For this test, unlike our other Ruby tests, we're pulling
in the latest **released** gem version (e.g. with `gem 'datadog'` in the
`gems.rb` file), not the latest from git.

This is because gems get installed in different paths when bundler
downloads them directly from git, and we want to validate the path when
a stable version is installed.

This also means that this PR will show up as failed until the next
datadog release (which will be 2.2.0) is out. (Or 1.23.2, but
I set up the test to use the latest 2.x releases, not the 1.x ones,
even though I used 1.x in my tests above to show the evolution of the
issue.)