
gh-88050: Fix asyncio subprocess kill process cleanly when process is blocked #32073

Merged
merged 25 commits into from
Oct 5, 2022


kumaraditya303
Contributor

@kumaraditya303 kumaraditya303 commented Mar 23, 2022

This PR fixes the issue by calling the waiters after all the callbacks are executed so that no callbacks are scheduled after the method returns.
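To make the ordering concrete, here is a minimal, self-contained sketch. `ToySubprocessTransport` is a hypothetical, heavily simplified class, not CPython's `BaseSubprocessTransport`. Exit callbacks are scheduled with `loop.call_soon()`, and the futures handed out by `wait()` are resolved only inside the last of those callbacks. If the waiters were instead woken directly in `process_exited()`, `wait()` could return while those callbacks were still pending on the loop.

```python
import asyncio


class ToySubprocessTransport:
    """Hypothetical, heavily simplified stand-in for a subprocess transport.

    It only models the ordering described in the PR: futures returned by
    wait() are resolved after every exit-related callback has run.
    """

    def __init__(self, loop):
        self._loop = loop
        self._returncode = None
        self._exit_waiters = []  # set to None once the waiters have been woken
        self.events = []         # records the order in which things happen

    def process_exited(self, returncode):
        self._returncode = returncode
        # Exit-related callbacks are scheduled on the loop, not run inline.
        self._loop.call_soon(self.events.append, "process_exited callback")
        self._loop.call_soon(self._connection_lost)

    def _connection_lost(self):
        self.events.append("connection_lost callback")
        # Wake the waiters last, so no exit callback is pending when wait() returns.
        for waiter in self._exit_waiters:
            if not waiter.cancelled():
                waiter.set_result(self._returncode)
        self._exit_waiters = None

    async def wait(self):
        if self._exit_waiters is None:  # all exit callbacks have already run
            return self._returncode
        waiter = self._loop.create_future()
        self._exit_waiters.append(waiter)
        return await waiter


async def main():
    transport = ToySubprocessTransport(asyncio.get_running_loop())
    wait_task = asyncio.create_task(transport.wait())
    await asyncio.sleep(0)        # let wait() register its future
    transport.process_exited(-9)  # e.g. the child was killed with SIGKILL
    returncode = await wait_task
    transport.events.append("wait() returned")
    return returncode, transport.events


returncode, events = asyncio.run(main())
print(returncode, events)
# -9 ['process_exited callback', 'connection_lost callback', 'wait() returned']
```

Both callbacks are recorded before `wait()` returns, which is the property the PR description is after.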

@kumaraditya303 kumaraditya303 force-pushed the fix-asyncio-subprocess branch from 98f282f to 485b26b Compare March 23, 2022 09:36
@kumaraditya303 kumaraditya303 marked this pull request as ready for review March 23, 2022 10:26
@asvetlov asvetlov changed the title bpo-43884: Fix asyncio subprocess kill process cleanly when process i… bpo-43884: Fix asyncio subprocess kill process cleanly when process is blocked Mar 23, 2022
@asvetlov
Contributor

What's wrong with Windows?

@kumaraditya303 kumaraditya303 changed the title bpo-43884: Fix asyncio subprocess kill process cleanly when process is blocked gh-88050: Fix asyncio subprocess kill process cleanly when process is blocked Apr 17, 2022
@kumaraditya303
Contributor Author

kumaraditya303 commented Oct 5, 2022

> I still have only a cursory understanding (or recollection...) of this part of asyncio, and I worry that we're applying "programming by random modification". How confident are you that you understand what goes on here, rather than just observing the tests pass?

This is the version I am most confident in. I wrote a summary of the behavior with and without the fix, which will hopefully help you review it. I have spent weeks trying to find the best fix for this, and this is the best I could come up with.

> There are so many win32 checks in the test... Maybe it would be better to have two versions of the test, guarded by @skipIf platform Win or not?

Regarding Windows: if you look at the tests in this file, you'll see that none of them test killing a subprocess started with shell=True, i.e. asyncio.create_subprocess_shell. This is the first test that does, and I wouldn't be surprised if such tests were intentionally omitted because no one wanted to deal with subprocess shells on Windows. (In my experience, shells and consoles are at their worst on Windows.)
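Following the suggestion above, here is a sketch of what two platform-guarded tests could look like, using `unittest.IsolatedAsyncioTestCase` with `skipIf`/`skipUnless`. The class and test names, the shell commands, and the process-group approach on POSIX are illustrative assumptions, not the test that was merged. On POSIX the shell is started in a new session so that `os.killpg()` also kills the shell's own `sleep` child. A plain `proc.kill()` signals only the shell process itself.

```python
import asyncio
import os
import signal
import sys
import unittest


class KillShellSubprocessTests(unittest.IsolatedAsyncioTestCase):
    # Hypothetical tests; names and commands are illustrative, not CPython's.

    @unittest.skipIf(sys.platform == "win32", "POSIX-specific expectations")
    async def test_kill_shell_posix(self):
        # Two sequential commands, so the shell must fork a child for the
        # first `sleep` instead of exec'ing into it.
        proc = await asyncio.create_subprocess_shell(
            "sleep 30; sleep 30",
            start_new_session=True,  # the shell leads a new process group
        )
        # Kill the whole group so the shell's `sleep` child cannot outlive it.
        os.killpg(proc.pid, signal.SIGKILL)
        returncode = await proc.wait()
        # On POSIX, a return code of -N means "terminated by signal N".
        self.assertEqual(returncode, -signal.SIGKILL)

    @unittest.skipUnless(sys.platform == "win32", "Windows-specific expectations")
    async def test_kill_shell_windows(self):
        proc = await asyncio.create_subprocess_shell(
            f'"{sys.executable}" -c "import time; time.sleep(2)"'
        )
        # kill() terminates only the shell (cmd.exe); the Python grandchild
        # may keep running until its sleep finishes.
        proc.kill()
        returncode = await proc.wait()
        # On Windows, Popen.kill() is terminate(), which calls
        # TerminateProcess() with exit code 1.
        self.assertEqual(returncode, 1)
```

With this split, each test states only one platform's expectations, and exactly one of the two is reported as skipped on any given machine.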

Member

@gvanrossum gvanrossum left a comment


Okay, thanks for the summary! This looks like a clear improvement so let's merge it.

@gvanrossum gvanrossum merged commit 7015e13 into python:main Oct 5, 2022
@miss-islington
Contributor

Thanks @kumaraditya303 for the PR, and @gvanrossum for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@bedevere-bot

GH-97915 is a backport of this pull request to the 3.11 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 5, 2022
…rocess is blocked (pythonGH-32073)

(cherry picked from commit 7015e13)

Co-authored-by: Kumar Aditya <[email protected]>
@bedevere-bot

GH-97916 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Oct 5, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 5, 2022
…rocess is blocked (pythonGH-32073)

(cherry picked from commit 7015e13)

Co-authored-by: Kumar Aditya <[email protected]>
@kumaraditya303 kumaraditya303 deleted the fix-asyncio-subprocess branch October 5, 2022 17:16
miss-islington added a commit that referenced this pull request Oct 5, 2022
… is blocked (GH-32073)

(cherry picked from commit 7015e13)

Co-authored-by: Kumar Aditya <[email protected]>
carljm added a commit to carljm/cpython that referenced this pull request Oct 6, 2022
* main: (66 commits)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896)
  pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383)
  pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759)
  pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251)
  pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662)
  pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644)
  pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756)
  pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778)
  pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826)
  Add re.VERBOSE flag documentation example (python#97678)
  ...
carljm added a commit to carljm/cpython that referenced this pull request Oct 8, 2022
* main: (53 commits)
  pythongh-94808: Coverage: Test that maximum indentation level is handled (python#95926)
  pythonGH-88050: fix race in closing subprocess pipe in asyncio  (python#97951)
  pythongh-93738: Disallow pre-v3 syntax in the C domain (python#97962)
  pythongh-95986: Fix the example using match keyword (python#95989)
  pythongh-97897: Prevent os.mkfifo and os.mknod segfaults with macOS 13 SDK (pythonGH-97944)
  pythongh-94808: Cover `PyUnicode_Count` in CAPI (python#96929)
  pythongh-94808: Cover `PyObject_PyBytes` case with custom `__bytes__` method (python#96610)
  pythongh-95691: Doc BufferedWriter and BufferedReader (python#95703)
  pythonGH-88968: Add notes about socket ownership transfers (python#97936)
  pythongh-96865: [Enum] fix Flag to use CONFORM boundary (pythonGH-97528)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  ...
mpage pushed a commit to mpage/cpython that referenced this pull request Oct 11, 2022
@vstinner
Member

AMD64 Ubuntu Shared 3.x buildbot started to fail recently with Out-of-Memory (OOM) errors: Python test processes are killed with SIGKILL by the Linux kernel.

David Bolen, owner of the buildbot worker, wrote that there are currently an awful lot (340) of processes like: (...)/python -c import time; time.sleep(100000000). I suspect that these processes come from test_kill_issue43884() of test_asyncio.test_subprocess (Lib/test/test_asyncio/test_subprocess.py). He added: "Looks like they started accumulating sporadically on Oct 2".

Maybe this change, which was supposed to kill asyncio subprocesses more cleanly, is now making the situation worse? Or maybe it's a different regression?
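Independently of whether this PR is involved, one generic way for a test to make sure it never leaves a child behind is to tie the subprocess to an async context manager that kills and reaps it on exit. `reaped_subprocess` below is a hypothetical helper, not something in CPython or in `test_kill_issue43884()`; it relies only on the public `asyncio.create_subprocess_exec()`, `Process.kill()` and `Process.wait()` APIs.

```python
import asyncio
import contextlib
import sys


@contextlib.asynccontextmanager
async def reaped_subprocess(*args, **kwargs):
    """Hypothetical helper: start a subprocess and always kill and reap it on exit."""
    proc = await asyncio.create_subprocess_exec(*args, **kwargs)
    try:
        yield proc
    finally:
        if proc.returncode is None:
            # The child may exit on its own just before kill() is called.
            with contextlib.suppress(ProcessLookupError):
                proc.kill()
            await proc.wait()  # reap the child so it cannot linger


async def demo():
    async with reaped_subprocess(
        sys.executable, "-c", "import time; time.sleep(100000000)"
    ) as proc:
        pass  # a real test would interact with proc here
    return proc.returncode


returncode = asyncio.run(demo())
print(returncode)  # on POSIX, a negative signal number, e.g. -9 for SIGKILL
```

Because the cleanup runs in `finally`, the child is killed and awaited even if the body of the `async with` block raises.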

I ran the following command in 3 terminals on Linux to stress test the test:

./python -m test -m test_kill_issue43884 -j40 test_asyncio test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio  test_asyncio

While tests are running, I see Python processes running time.sleep(100000000). But once tests complete, I don't see these processes anymore.
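The same check can be scripted on Linux by scanning `/proc`, where each process's arguments are stored NUL-separated in `/proc/<pid>/cmdline`. `find_processes` is a hypothetical helper for illustration, not the tooling used on the buildbot. Zombies (exited but not yet reaped) have an empty `cmdline` and are therefore not matched.

```python
import os
import subprocess
import sys


def find_processes(needle):
    """Return the PIDs of processes whose argv contains `needle` (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                argv = f.read().split(b"\0")
        except OSError:
            continue  # the process exited, or we may not read it
        if any(needle.encode() in arg for arg in argv):
            pids.append(int(entry))
    return pids


# Example: start one matching child, then confirm it disappears once reaped.
needle = "time.sleep(100000000)"
child = subprocess.Popen([sys.executable, "-c", f"import time; {needle}"])
try:
    found_while_running = child.pid in find_processes(needle)
finally:
    child.kill()
    child.wait()  # reaping removes /proc/<pid>
found_after_wait = child.pid in find_processes(needle)
print(found_while_running, found_after_wait)
```

Run after a test session, `len(find_processes("time.sleep(100000000)"))` gives a rough count of leftover children of the kind described in the report.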

Maybe the issue was fixed in the meantime? I don't know.

@db3l
Contributor

db3l commented Oct 18, 2022

So, there is already a new set of hung processes on my buildbot since I killed everything a short while ago. There's an active 3.11 build going on, but the asyncio test already completed, so it sounds like those processes should be gone. There are two clusters (3 processes) about 16 minutes apart, so I'm guessing one is from the last build and one from this one. test_asyncio did run near the end of the last build and near the beginning of the current one.

As soon as I have some free cycles, I'll try some manual tests on the buildbot to see if I can reproduce. But in the meantime I'll monitor to ensure they don't accumulate enough to interfere again with subsequent builds.

@vstinner
Member

I created issue #98407: "AMD64 Ubuntu Shared 3.x: python processes killed with SIGKILL by Linux Out-of-Memory (OOM), maybe related to test_asyncio".

9 participants