Last two versions of R (4.0.5 and 4.1.0) won't run under Docker #2361

Open
scotstan opened this issue May 22, 2021 · 24 comments

@scotstan

I installed the last 5 clearlinux/r-base versions to bisect when the R command stopped working. It seems that something changed between 4.0.2 and 4.0.5 that prevents R from running.

The last two versions (4.0.5 and 4.1.0) respond with: ERROR: R_HOME ('/usr/lib64/R') not found

My host is Ubuntu 20.04.2.

Here's the short script to reproduce:

#!/usr/bin/env bash

# Clear Linux: R version tags from:
# https://hub.docker.com/r/clearlinux/r-base/tags?page=1&ordering=-name

# Read a heredoc into the variable named by $1 (read exits non-zero at EOF, hence "|| true")
define(){ IFS= read -r -d '' "$1" || true; }

define CMD <<'EOF'
    Rscript --version
    Rscript -e "version[['nickname']]"
EOF

for ver in 3.6.3 4.0.0 4.0.2 4.0.5 4.1.0; do
    docker run --rm -it "clearlinux/r-base:$ver" bash -i -c "$CMD"
done

R 4.1.0 works fine with rocker/r-base


@lebensterben

It works fine on native Clear Linux.

@scotstan scotstan changed the title Last two versions of R (4.0.5 and 4.1.0) won't run Last two versions of R (4.0.5 and 4.1.0) won't run under Docker May 22, 2021
@phmccarty phmccarty self-assigned this Jun 2, 2021
@phmccarty phmccarty removed the new label Jun 2, 2021
@phmccarty
Contributor

Thanks for the detailed report. I will investigate.

@scotstan
Author

scotstan commented Jun 5, 2021

I can test any time. Thanks, @phmccarty.

@phmccarty
Contributor

@scotstan I just ran your test script, and I cannot reproduce the issue. Perhaps the snapshots of clearlinux/r-base:4.0.5 and clearlinux/r-base:4.1.0 that you tested were broken. That's possible, but I'm not yet sure what the underlying issue might have been. Can you test again after pulling the latest builds of those images?

Here's the output I see from the script on first run:

$ ./test.sh
Unable to find image 'clearlinux/r-base:3.6.3' locally
3.6.3: Pulling from clearlinux/r-base
3f3a3a3cd1fe: Pull complete
3b4719fcbf77: Pull complete
Digest: sha256:8f6fa391e33c6eafd88557be6402c07abe1a2114c22030f2fd77669c2a508f4f
Status: Downloaded newer image for clearlinux/r-base:3.6.3
R scripting front-end version 3.6.3 (2020-02-29)
[1] "Holding the Windsock"
Unable to find image 'clearlinux/r-base:4.0.0' locally
4.0.0: Pulling from clearlinux/r-base
a850526e45ae: Pull complete
ea3403d97db9: Pull complete
Digest: sha256:79d48f4a6efb29b1107276cb5650fe255d33f09ecb0a066f946d078c4d82683f
Status: Downloaded newer image for clearlinux/r-base:4.0.0
R scripting front-end version 4.0.0 (2020-04-24)
[1] "Arbor Day"
Unable to find image 'clearlinux/r-base:4.0.2' locally
4.0.2: Pulling from clearlinux/r-base
511d020684c4: Pull complete
92aa84514374: Pull complete
Digest: sha256:59439af4940e9f7b2b358ad54d494dd8e84c774005058a088dd550628cb1af82
Status: Downloaded newer image for clearlinux/r-base:4.0.2
R scripting front-end version 4.0.2 (2020-06-22)
[1] "Taking Off Again"
Unable to find image 'clearlinux/r-base:4.0.5' locally
4.0.5: Pulling from clearlinux/r-base
6d8a9c1757d0: Pull complete
3c576b2fb216: Pull complete
Digest: sha256:01a4ddb96e98e977cd02b11e12979d033cbdc1633d16eaf8d3ed6e838e21586f
Status: Downloaded newer image for clearlinux/r-base:4.0.5
R scripting front-end version 4.0.5 (2021-03-31)
[1] "Shake and Throw"
Unable to find image 'clearlinux/r-base:4.1.0' locally
4.1.0: Pulling from clearlinux/r-base
fa6f55ca1b2a: Pull complete
268dc30d08c7: Pull complete
Digest: sha256:33367ffe624b73cd1076605da71e0319fd2e366eec4f60f3701516ae84801eba
Status: Downloaded newer image for clearlinux/r-base:4.1.0
R scripting front-end version 4.1.0 (2021-05-18)
[1] "Camp Pontanezen"

And the second run:

$ ./test.sh
R scripting front-end version 3.6.3 (2020-02-29)
[1] "Holding the Windsock"
R scripting front-end version 4.0.0 (2020-04-24)
[1] "Arbor Day"
R scripting front-end version 4.0.2 (2020-06-22)
[1] "Taking Off Again"
R scripting front-end version 4.0.5 (2021-03-31)
[1] "Shake and Throw"
R scripting front-end version 4.1.0 (2021-05-18)
[1] "Camp Pontanezen"

@phmccarty
Contributor

I have re-tested this several times over the past few months and still cannot reproduce it. If it's still a problem on your end, please reopen.

@scotstan
Author

scotstan commented Jan 6, 2022

Thanks. I'll try to retest when I get a chance.

@phmccarty
Contributor

phmccarty commented Jan 6, 2022

Great, thanks. And just to clarify, I only tested running the script on a Clear Linux host. I wouldn't expect the underlying host OS to matter in this case, though.

@scotstan
Author

scotstan commented Jan 6, 2022

Testing it now on Ubuntu 20.04.3 LTS with the 5.4 kernel. I don't think the host Docker should matter, but I don't have easy access to a native Clear Linux system.

Likely something broke or changed in the script at /usr/bin/R.

Results are the same today (latest clearlinux/r-lang).

@scotstan
Author

scotstan commented Jan 6, 2022

Good idea on cleaning out the old images. I did that so they would pull down fresh. However, the problem persists.

I think the solution is somewhere in the bash script /usr/bin/R that sets up paths or something similar. I'll keep poking around.

I get great performance with R using Clear Linux. That's why this was important. I'm not blocked, but I'm leaving this here for others.

@phmccarty
Contributor

Thanks for testing. Reopening.

@phmccarty phmccarty reopened this Jan 6, 2022
@phmccarty
Contributor

Revisiting this issue...

Neither the /usr/bin/R script nor the /usr/bin/Rscript binary's source file changed between R versions 4.0.2 and 4.0.5, so we can rule out obvious source changes affecting the behavior. I haven't yet spent much time analyzing the full source tree diff for any other clues.

Nothing stands out to me in the 4.0.3, 4.0.4, or 4.0.5 release notes as candidate breakage.

Looking at the Clear Linux package history between 4.0.2 and 4.0.5, we modified config.site to change default variable assignments for AR, NM, RANLIB, and LTO, but I doubt those changes had an impact here, because nothing is being built. I also added a patch to fix the package build after we updated autoconf to 2.70, but again, this change is unlikely to be to blame.

I will proceed by testing this out on Ubuntu 20.04.3 and hopefully reproduce it :-)

@scotstan
Author

scotstan commented Feb 4, 2022

I have some time now to poke at it too. I think the answer is in the R script: it may be using an environment variable or path that is no longer valid.

@scotstan
Author

scotstan commented Feb 6, 2022

FYI, R 4.1.2 has been released since, so I added it to the simple test script, with the same results. Looking into the delta from 4.0.2 to 4.0.5, where something changed...

@scotstan
Author

scotstan commented Feb 6, 2022

So strange! I found the line in /usr/bin/R (same script across v4.0.2 and v4.0.5; the SHAs match). Line 19 has a simple bash test for the executable bit on /usr/lib64/R/bin/exec, which succeeds on 4.0.2 but not on 4.0.5(!).

It turns out v4.0.2 has bash 5.0.18 (per bash --version), and v4.0.5 has bash 5.1.16.

Did something change with test -x from bash v5.0 to v5.1? Seems unlikely, but checking.

[Screenshots: v4.0.2 on the left, v4.0.5 on the right]

@scotstan
Author

scotstan commented Feb 6, 2022

@scotstan
Author

scotstan commented Feb 7, 2022

Workaround!

Delete these lines in /usr/bin/R that check the executable bit on the R binary:

With one command:
sed -i 262,267d /usr/bin/R
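A slightly more defensive variant of the same workaround prints the lines before deleting them and keeps a backup of the script. This is only a sketch: the function name is hypothetical, and the 262-267 range applies only to the /usr/bin/R in the image tested here, so check the preview before trusting it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show a range of lines, then delete that range in place.
# sed -i.bak keeps the untouched original next to the file as "$file.bak".
delete_lines_with_backup() {
  local file=$1 start=$2 end=$3
  # Preview exactly what is about to be removed.
  sed -n "${start},${end}p" "$file"
  # Delete the range in place, saving a backup first.
  sed -i.bak "${start},${end}d" "$file"
}
```

For example, delete_lines_with_backup /usr/bin/R 262 267; if the preview shows the wrong lines, mv /usr/bin/R.bak /usr/bin/R restores the original.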


bash 5.1

Not sure what's happening here, because none of the bash -x tests seem to work now. This will likely have big ramifications for bash scripts that check whether a folder is valid or whether execute or read bits are set. The issue below has more details, but attempting more fixes is above my pay grade. For now, I'm good with a workaround.

alpinelinux/docker-alpine#156

This should not fail:

which bash
[ -x /usr/bin/bash ]
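To see which kinds of file test are affected, a small probe can run test -e, -d, -r, and -x against each path and print the bash version. This is a sketch with a hypothetical name (probe_access). As far as I know, bash answers -e and -d with stat() but -r and -x with an access check, so if the seccomp explanation later in this thread is right, one would expect only the -r/-x lines to fail inside the broken images.

```bash
#!/usr/bin/env bash
# Hypothetical probe: run several file tests on each path and print the result.
# With no arguments it checks the paths discussed in this thread.
probe_access() {
  local -a paths=("$@")
  if [ "${#paths[@]}" -eq 0 ]; then
    paths=(/usr/bin/bash /usr/lib64/R /usr/lib64/R/bin/exec)
  fi
  echo "bash $BASH_VERSION"
  local p op
  for p in "${paths[@]}"; do
    # -e/-d: existence and directory checks; -r/-x: read and execute permission.
    for op in -e -d -r -x; do
      if test "$op" "$p"; then
        echo "$p $op ok"
      else
        echo "$p $op FAIL"
      fi
    done
  done
}
```

To run it inside an image without copying files in: docker run --rm clearlinux/r-base:4.0.5 bash -c "$(declare -f probe_access); probe_access"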

@scotstan
Author

scotstan commented Feb 7, 2022

The above just allows one to run R in interactive mode. I'm now trying to install some basic R packages (datatable, magrittr), and those fail with /usr/lib64/R/bin/Rcmd: line 64: exec: INSTALL: not found. So the little hack above is not truly enough.

Workaround attempt: going back to the last known-good version, 4.0.2, which uses bash 5.0.

@phmccarty
Contributor

Thanks for the detailed debug findings :-)

I still have not tested Ubuntu 20.04.3, but my suspicion is that the patch to include/seccomp-syscalls.h from seccomp/libseccomp#322 (also see report seccomp/libseccomp#314) should be backported. The current Ubuntu package version is 2.5.1-1ubuntu1~20.04.2, and I don't see any backports for this specific bug.

There are a bunch of interlinked packages involved here. We have docker, runc, libseccomp, and the running kernel on the Ubuntu host, and glibc plus whatever userspace programs are running within a container based on any of the clearlinux/r-base images.
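When comparing hosts, it helps to record the version of each host-side piece listed above. The sketch below assumes a Debian/Ubuntu host where libseccomp ships as the libseccomp2 package, and it skips any tool that isn't installed; host_report is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the host-side versions that matter for this bug.
# Assumes a Debian/Ubuntu host where libseccomp ships as the libseccomp2 package.
host_report() {
  echo "kernel: $(uname -r)"
  if command -v docker >/dev/null 2>&1; then
    echo "docker: $(docker version --format '{{.Server.Version}}' 2>/dev/null || echo unavailable)"
  fi
  if command -v runc >/dev/null 2>&1; then
    echo "runc: $(runc --version | head -n 1)"
  fi
  if command -v dpkg-query >/dev/null 2>&1; then
    echo "libseccomp2: $(dpkg-query -W -f='${Version}' libseccomp2 2>/dev/null || echo 'not installed')"
  fi
}
```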

The clearlinux/r-base:4.0.2 image contains glibc 2.31, but clearlinux/r-base:4.0.5 has glibc 2.33. Support for faccessat2 was added for glibc 2.33 (see bminor/glibc@3d3ab57), so I think this is the primary reason you are seeing the image with 4.0.2 work correctly. Bash itself likely calls faccessat, resulting in glibc < 2.33 always wrapping that system call, and glibc >= 2.33 trying to use faccessat2 first, and falling back to faccessat if it's unsupported. I suspect the failure here is caused by the libseccomp bug linked above; glibc will see the wrong error code within the container environment and propagate it to bash, R, etc.
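The glibc boundary described above can be written down as a tiny check. This only encodes the rule as stated in this comment (2.33 and newer try faccessat2 first); faccessat_path is a hypothetical name. It uses the last space-separated field of whatever version string it is given, so both getconf GNU_LIBC_VERSION output ("glibc 2.33") and the first line of ldd --version should work.

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the rule above: glibc >= 2.33 tries faccessat2
# first and falls back to faccessat only on ENOSYS; older glibc uses faccessat.
faccessat_path() {
  local ver=${1##* }      # last space-separated field, e.g. "2.33"
  local major=${ver%%.*}  # "2"
  local minor=${ver#*.}   # "33" (or "33.1" if there is a patch level)
  minor=${minor%%.*}      # drop any patch level
  if (( major > 2 || (major == 2 && minor >= 33) )); then
    echo "faccessat2 first, faccessat on ENOSYS"
  else
    echo "faccessat only"
  fi
}
```

Assuming the image ships getconf: faccessat_path "$(docker run --rm clearlinux/r-base:4.0.5 getconf GNU_LIBC_VERSION)"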

@fenrus75
Contributor

fenrus75 commented Feb 7, 2022 via email

@scotstan
Author

scotstan commented Feb 7, 2022

Testing on macOS (Intel) just to throw another host at this. Working as expected on macOS but not Ubuntu!


@thiagomacieira

The issue is your host Docker. It's applying a seccomp filter that blocks the faccessat2 system call but makes it return an errno code that isn't ENOSYS. If it returned ENOSYS, glibc would fall back to an earlier system call, which would likely be allowed.

This is a design flaw in libseccomp. It needs to use three categories, not two:

  • permitted calls: allow
  • blocked calls: block with however the user configured them (EUCLEAN or a signal)
  • unknown calls: ENOSYS
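To make the two-versus-three-category argument concrete, here is a toy model in shell. It is not libseccomp's API or glibc's source; the syscall lists, BLOCKED_ERRNO=EPERM, and the function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Toy model only: not libseccomp's API and not glibc's source.
# Hypothetical filter contents; EPERM is an assumed action for blocked calls.
ALLOWED="faccessat openat read write"   # known and permitted
BLOCKED="reboot"                        # known and denied
BLOCKED_ERRNO="EPERM"

# Three categories: allow, the configured errno, or ENOSYS for unknown calls.
filter_action() {
  local call=$1
  case " $ALLOWED " in *" $call "*) echo allow; return ;; esac
  case " $BLOCKED " in *" $call "*) echo "$BLOCKED_ERRNO"; return ;; esac
  echo ENOSYS
}

# How glibc >= 2.33 reacts to the filter's answer for faccessat2.
access_outcome() {
  case $1 in
    allow)  echo "faccessat2 succeeds" ;;
    ENOSYS) echo "falls back to faccessat" ;;
    *)      echo "access check fails with $1" ;;
  esac
}
```

Here access_outcome "$(filter_action faccessat2)" prints "falls back to faccessat", because faccessat2 is unknown to this filter. A two-category filter would instead give unknown calls the blocked action: access_outcome EPERM prints "access check fails with EPERM", which matches the symptom in this thread.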

@scotstan
Author

scotstan commented Feb 7, 2022

Very interesting. Is there something in the host Docker configuration that controls this? I can also test on Debian Raspberry Pi. Maybe a few other hosts.

@thiagomacieira

thiagomacieira commented Feb 8, 2022

You can disable seccomp completely ("disable security"): --security-opt seccomp=unconfined
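One way to confirm that the filter is the culprit is to run the same R one-liner from the original test script twice, once with the default profile and once unconfined. A sketch follows; the compare_seccomp name and the DRY_RUN switch are hypothetical. Since seccomp=unconfined turns seccomp off entirely, it is better suited to diagnosis than to everyday use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the same R one-liner with Docker's default seccomp
# profile and with seccomp disabled. DRY_RUN=1 prints the commands instead.
compare_seccomp() {
  local image=${1:-clearlinux/r-base:4.1.0}
  local -a variants=("" "--security-opt seccomp=unconfined")
  local extra
  for extra in "${variants[@]}"; do
    # $extra is left unquoted on purpose: "" adds no words, and the
    # security option splits into its two words.
    local -a cmd=(docker run --rm $extra "$image" Rscript -e "version[['nickname']]")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      echo "== ${extra:-default seccomp profile}"
      "${cmd[@]}" || echo "exit status $?"
    fi
  done
}
```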

@scotstan
Author

scotstan commented Feb 8, 2022 via email


5 participants