Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary location and interpreter selection issues #10855

Closed
stuhood opened this issue Sep 24, 2020 · 8 comments · Fixed by #10901
Closed

Binary location and interpreter selection issues #10855

stuhood opened this issue Sep 24, 2020 · 8 comments · Fixed by #10901
Assignees
Labels
Milestone

Comments

@stuhood
Copy link
Member

stuhood commented Sep 24, 2020

Two issues:

  1. the Find binary process that is used to locate a bootstrap interpreter limits the PATH to the search path for the tool that it is looking for. For search paths which do not include bash (which will happen in the case described on Harden PATH setting in pex runs #9760), bash will not be found, causing the script to fail with env: bash: No such file or directory - fixed in Locate bash before using it to locate BinaryPaths for other tools #10858
  2. The Find interpreter for constraints process that is used to test interpreters is not marked fallible FallibleProcessResult, and so poisoned interpreters can kill it rather than just causing the poisoned interpreters to be skipped (full error).
pex.executor.ExecutableNotFound: caught FileNotFoundError(2, "No such file or directory: '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/bin/python3'") while trying to execute 
@stuhood
Copy link
Member Author

stuhood commented Sep 25, 2020

Posted a fix for the first issue here: #10858. It should be sufficient to allow someone to work around the second issue by setting:

interpreter_search_path=["<PYENV>"]

I might have misdiagnosed the second issue though: Find interpreter for constraints is relying on PEX successfully filtering interpreters before using them, and so we should not try to recover from PEX failing to do that. And whichever code PEX is using to eliminate bad interpreters is not eliminating the problematic interpreter in this case. See the full error for the stack: https://gist.github.com/stuhood/8443ad481a65dd23b83020847ef91712

So to fix the second portion, we'd either need to make PEX resistant to this type of interpreter issue, or have pants take over more of the interpreter discovery/selection process.

stuhood added a commit that referenced this issue Sep 25, 2020
…0858)

### Problem

The first error in #10855 is that when we search a search path which does not include `bash`, lookups fail silently because the discovery script fails to start. The usecase described on #9760 (setting `interpreter_search_path=["<PYENV>"]`) is one example where `bash` will not be present.

### Solution

Recurse to locate an absolute path for `bash` before using `bash`. Additionally, change the "discovery" portion of `BinaryPaths` from `FallibleProcessResult` to `ProcessResult` to fail more quickly in cases where discovery errors, as opposed to succeeding with an empty result.

### Result

The added test passes, and the usecase from #9760 works on my machine.

[ci skip-rust]
[ci skip-build-wheels]
@cosmicexplorer
Copy link
Contributor

So to fix the second portion, we'd either need to make PEX resistant to this type of interpreter issue, or have pants take over more of the interpreter discovery/selection process.

Does this seem like an issue that affects Pex usage in general? Or is it specifically because of some weirdness around how pants invokes it here?

@stuhood
Copy link
Member Author

stuhood commented Sep 28, 2020

So to fix the second portion, we'd either need to make PEX resistant to this type of interpreter issue, or have pants take over more of the interpreter discovery/selection process.

Does this seem like an issue that affects Pex usage in general? Or is it specifically because of some weirdness around how pants invokes it here?

Unclear... it's possible that Pants is explicitly passing an interpreter that PEX would not otherwise use. But it's also possible that PEX would fail in this user's environment as well. Can follow up with the user to ask them to try using PEX directly I suppose, but will hold off on doing that unless it turns out that the stack is not sufficient to fix the issue.

@Eric-Arellano
Copy link
Contributor

it's possible that Pants is explicitly passing an interpreter that PEX would not otherwise use.

I don't think this is possible. We use Pex to do the interpreter selection still:

@rule(desc="Find Python interpreter for constraints", level=LogLevel.DEBUG)
async def find_interpreter(interpreter_constraints: PexInterpreterConstraints) -> PythonExecutable:
formatted_constraints = " OR ".join(str(constraint) for constraint in interpreter_constraints)
process = await Get(
Process,
PexCliProcess(
description=f"Find interpreter for constraints: {formatted_constraints}",
# Here, we run the Pex CLI with no requirements, which just selects an interpreter.
# Normally, this would start an isolated repl. By passing `--`, we force the repl to
# instead act as an interpreter (the selected one) and tell us about itself. The upshot
# is we run the Pex interpreter selection logic unperturbed but without resolving any
# distributions.
argv=(
*interpreter_constraints.generate_pex_arg_list(),
"--",
"-c",
# N.B.: The following code snippet must be compatible with Python 2.7 and
# Python 3.5+.
dedent(
"""\
import hashlib
import os
import sys
python = os.path.realpath(sys.executable)
print(python)
hasher = hashlib.sha256()
with open(python, "rb") as fp:
# We pick 8192 for efficiency of reads and fingerprint updates
# (writes) since it's a common OS buffer size and an even
# multiple of the hash block size.
for chunk in iter(lambda: fp.read(8192), b""):
hasher.update(chunk)
print(hasher.hexdigest())
"""
),
),
level=LogLevel.DEBUG,
),
)
result = await Get(
ProcessResult, UncacheableProcess(process=process, scope=ProcessScope.PER_SESSION)
)
path, fingerprint = result.stdout.decode().strip().splitlines()
return PythonExecutable(path=path, fingerprint=fingerprint)

@jsirois
Copy link
Contributor

jsirois commented Sep 28, 2020

Yeah, this has to be a bug in the interpreter cache somehow, I'm just not seeing it yet. OSX does something very weird here and I'm floundering a bit not having the OS in hand to repro / debug with. I'll give this one more hard throw logic out the window code scan a bit later.

@jsirois jsirois self-assigned this Sep 28, 2020
@jsirois
Copy link
Contributor

jsirois commented Sep 28, 2020

Noting that even with fixing the interpreter cache to handle moved files (and maybe handle files since made unexecutable, etc ...) we'll need to switch to a FallibleProcessResult. Without actually re-running the interpreter identification process and thus defeating the point of the interpreter cache in the 1st place, we'll never be able to detect - for example - that the interpreter has had a shared library it needs nuked, etc. IE: There is more to check than the single python binary path Pex knows about to ensure the cached interpreter package is unmodified in full from when it was cached.

Since we use an UncacheableProcess to discover interpreters subject to interpreter constraints there is no need to mark the Process fallible. If there is something correctable on the filesystem, that can be corrected and pantsd re-started.

@jsirois
Copy link
Contributor

jsirois commented Sep 28, 2020

Alright, I've found an OSX-only mechanism to trip up Pex via this Python ecosystem bug https://bugs.python.org/issue22490#msg283859 which we are vulnerable to. More here: python/cpython#9516

Re-opened an old unreproducible Pex issue that was probably dues to this same issue: pex-tool/pex#1009

I'll implement the fix over there.

@jsirois
Copy link
Contributor

jsirois commented Sep 29, 2020

I'll close this upon upgrade to Pex 2.1.17 pex-tool/pex#1013.

jsirois added a commit to jsirois/pants that referenced this issue Oct 3, 2020
This pulls in a fix for interpreter selection under macOS framework
Python builds.

Fixes #pantsbuild#10855

[ci skip-rust]
[ci skip-build-wheels]
jsirois added a commit that referenced this issue Oct 3, 2020
This pulls in a fix for interpreter selection under macOS framework
Python builds.

Fixes ##10855
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants