-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sanitize python env and subprocess i/o #16736
Conversation
whithout those all PYTHON* would interfere
I'm a little worried about removing these from the environment since you can imagine some users wanting to set PYTHONPATH to point to a location where they have install some python modules (for example the ones in |
indeed it absolutely does not cover all cases, it's just sane default to provide basic module compiling for CPython 3.11+ for building pygame on wasm with https://github.com/pygame-web/python-wasm-sdk , emcc em++ and cc have finally turned to something more complicated and not easily relocatable like : #!/bin/bash
unset _PYTHON_SYSCONFIGDATA_NAME
unset PYTHONHOME
unset PYTHONPATH
SHARED=""
IS_SHARED=false
for arg do
shift
# that is for some very bad setup.py behaviour regarding cross compiling. should not be needed ..
[ "\$arg" = "-I/usr/include" ] && continue
[ "\$arg" = "-I/usr/include/SDL2" ] && continue
[ "\$arg" = "-L/usr/lib64" ] && continue
[ "\$arg" = "-L/usr/lib" ] && continue
if [ "\$arg" = "-shared" ]
then
IS_SHARED=true
SHARED="$SHARED -sSIDE_MODULE"
fi
if echo "\$arg"|grep -q wasm32-emscripten.so\$
then
IS_SHARED=true
SHARED="$SHARED -shared -sSIDE_MODULE"
fi
set -- "\$@" "\$arg"
done
if $IS_SHARED
then
$SYS_PYTHON -E $0.py $SHARED -L/opt/python-wasm-sdk/devices/emsdk/usr/lib "$@"
else
$SYS_PYTHON -E $0.py $SHARED -I/opt/python-wasm-sdk/devices/emsdk/usr/include "$@"
fi where SYS_PYTHON is set to stable/system python ( should be f"python{PYMAJOR}.{PYMINOR-1}" when building alpha/beta and "new" python / python3 ( 3.11 here) are in the path while building modules ) for py in 10 9 8 7
do
if command -v python3.${py} >/dev/null
then
export SYS_PYTHON=$(command -v python3.${py})
break
fi
done SYS_PYTHON was added because EMSDK_PYTHON is cleared by emsdk_env.sh ( why ? ) also emsdk/upstream/emscripten/system/bin does not seem in the PATH so sdl2-config from host often gets grabbed instead of the emsdk sysroot one / target install prefix ( here /opt/python-wasm-sdk/devices/emsdk/usr ) |
I think this change would break Pyodide's package compilation. We could probably find a workaround though. |
In particular, we need to explicitly control |
I don't think this change stops you from using You don't need that environment variable to be set within the compiler/emcc do you? |
No, I guess not, we don't need them inside of the compiler wrapper. I guess this change actually makes a lot of sense because it defends |
em++
Outdated
@@ -11,6 +11,10 @@ | |||
# To make modifications to this file, edit `tools/run_python_compiler.sh` and | |||
# then run `tools/create_entry_points.py` | |||
|
|||
unset _PYTHON_SYSCONFIGDATA_NAME | |||
unset PYTHONHOME | |||
unset PYTHONPATH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need these two lines if you are using -E
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think -E does not clear the env it just disregard the values, so better safe than sorry later as these 2 are really critical if another python child process is launched without -E
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then in that case can you remove the -E
? There is no point in adding it if we have another method of doing that same thing and we think that we cannot inject -E
into all the right place. That would be -E
as partial, and redundant solution here, right?
Maybe add a comment to that effect. Something like "Unset these environment variables explicitly, rather than using python -E
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-E covers (a lot!) of other variables that should not be impacted for emsdk, the two unset are really usefull failsafe for subprocesses where I can't be sure -E will be set ( i did not read all emsdk source code).
tldr :
-
-E ensure fully proper operation of emsdk base tool called in first place.
-
the unset HOME/PATH ensure minimal proper operation of anything that could spawn as subprocesses of that emcc tool over those the shell script has no direct control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what else -E
covers? I guess the doc is quite open ended:
-E Ignore environment variables like PYTHONPATH and PYTHONHOME that modify the behavior
of the interpreter.
There are indeed a couple of places where emscripten spawns internal stuff using sys.interpreter
. With this change, those would run in partially-clean environment rather then that fully cleaned -E
one? It seems odd that -E
would not also cover python sub-processes. It also seems odd that we would consider partially-cleaned OK in some circumstances but not OK in others.
One example of emcc launching a python subprocess:
Lines 1435 to 1440 in 4549d9d
sourcemap_cmd = [PYTHON, path_from_root('tools/wasm-sourcemap.py'), | |
wasm_file, | |
'--dwarfdump=' + LLVM_DWARFDUMP, | |
'-o', map_file, | |
'--basepath=' + base_path] | |
check_call(sourcemap_cmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually all system tools on good distribution should already set -E but again ... can't be 100% sure ( it was worst in the past there are probably still remnants or obsolete systems that cannot upgrade for some reason ) so it's better emconfigure sanitize by default instead of expecting all children to do it on their own.
PYTHONPATH or PYTHONHOME should never influence anything under configure/make/cmake ( or any system tool ! ) ever and they should use a stable system python.
When the (auto)tools need to configure for/with another python they (should) provide a parameter to designate the targeted interpreter/tool for run or getting config data. This is where a given shell script wrapper should clear env then set PYTHON*, PYTHONPATH=/required/by/mytool and run the target correctly ( exactly like the "virtualenv" in python world does ) .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But PYTHONPATH=/required/by/mytool ./configure
does work today for cases where configure run some python script as part of configuring.
This change would break that no? Same for PYTHONPATH=/required/by/mytool emmake
.
It doesnt seem like it should be our job to sanitize the environment for all our subprocesses, especially in the case of emmake/emcmake/emconfigure where the subprocess are use controlled and the user has a reasonable expectation that the environment they set will be passed to the children of make/cmake/configure.
So... I suggest we start with just using -E
.. since that is clearly correct, and revisit that if it turns out that doesn't solve your issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe we should add -E
to the windows .bat
files for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But PYTHONPATH=/required/by/mytool ./configure does work today for cases where configure run some python script as part of configuring.
imho using PYTHON* that way is bad because configure is not a python script.
PYTHON* is for the end of the chain : the tool py script itself. not for influencing autotools that is thousands of lines that run thousands of tools .
revisit that if it turns out that doesn't solve your issue
well actually it's not specifically my issue, it's more the various hard to solve potential issues that will come of thousands of modules around.
That we may hope to cross compile to wasm without patching emsdk ( pyodide still do that and iirc even rebuild it fully) and without having thousands of PR to sanitize subprocesses or helper tools we have not yet heard of.
That in the case python-wasm-sdk come to interest more than just pygame ( less than 3minutes to build the wasm python module without docker, i'm sure it will get traction ! )
And maybe we should add -E to the windows .bat files for consistency?
i don't know anymore how system python runs on windows or mac systems (since 2.7), i think you should but i'm certainly not the one to ask if you can safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho using PYTHON* that way is bad because configure is not a python script.
It might be in-advised but that doesn't mean we should break anyone who want to use it.
well actually it's not specifically my issue, it's more the various hard to solve potential issues that will come of thousands of modules around.
That we may hope to cross compile to wasm without patching emsdk ( pyodide still do that and iirc even rebuild it fully) and without having thousands of PR to sanitize subprocesses or helper tools we have not yet heard of.
I think that number of python subprocess that get run by emcc should be very limited, we should be able to keep track of them and ensure they all get run with -E.
i don't know anymore how system python runs on windows or mac systems (since 2.7), i think you should but i'm certainly not the one to ask if you can safely.
I think we should add -E
everywhere (including on windows) for consistency. It could be confusing otherwise. We have a fair amount of CI testing to ensure it works, so you don't need to worry about locally testing it.
no. setup.py should naturally call the python in the PATH ( like in venv) that should be tweaked/wrapped for cross compiling and handling all funny stuff that happens with setup.py, while all emsdk tools should use EMSDK_PYTHON ( which most likely is to be the stable system python ( please without devel headers, setup.py craves for them ! )) it's especially annoying to have
or to have to use the replay system from pyodide that use too much heuristic and is not always 32 bits aware, and did i mention docker ? |
Ok, it sounds reasonable to leave Perhaps add a comment such as: |
i think @tiran could say more on that he loves cross compiling ;) is check_call() is really the only other place -E is required ? do unset PYTHONHOME and unset PYTHONPATH go away ? |
I think it should be anywhere we use
Yes please. |
This was nothing more than an alias for sys.executable. While this is used a fair amount in testing, emscripten itself should not be forking python subprocesses itself, but instead using the launcher scripts. There is one place in emscripten where we do fork a python subprocess directly (when we run tools/wasm-sourcemap.py) and I'm not sure its worth adding a wrapper for that minor tool. For that one case I've added `-E` to match the changes we are planning to the launcher scripts: #16736
This was nothing more than an alias for sys.executable. While this is used a fair amount in testing, emscripten itself should not be forking python subprocesses itself, but instead using the launcher scripts. There is one place in emscripten where we do fork a python subprocess directly (when we run tools/wasm-sourcemap.py) and I'm not sure its worth adding a wrapper for that minor tool. For that one case I've added `-E` to match the changes we are planning to the launcher scripts: #16736
Do you mind if I re-title this PR something like |
please do as you wish |
This was nothing more than an alias for sys.executable. While this is used a fair amount in testing, emscripten itself should not be forking python subprocesses itself, but instead using the launcher scripts. There is one place in emscripten where we do fork a python subprocess directly (when we run tools/wasm-sourcemap.py) and I'm not sure its worth adding a wrapper for that minor tool. For that one case I've added `-E` to match the changes we are planning to the launcher scripts: #16736
While working with python 3.11 wasm and pip modules, emcc is often called as subprocess from a tweaked python.
i encountered some troubles, emcc.py is inheriting some unwanted values from parent shell
to name a few _PYTHON_SYSCONFIGDATA_NAME and PYTHONHOME / PYTHONPATH
adding -E will prevent PYTHON* to interfere, and the unset _PYTHON_SYSCONFIGDATA_NAME is very unlikely to be used anywhere except in cpython modules cross compiling.