ProcessPool join hangs for 60 seconds due to intermittent deadlock #147

Open
emaxx-google opened this issue Dec 20, 2024 · 7 comments · May be fixed by #148

emaxx-google commented Dec 20, 2024

I was able to reduce it to the following example:

from concurrent.futures import FIRST_COMPLETED, wait
import time
import unittest

from pebble import ProcessPool
from pebble.common.types import CONSTS

def function(argument, sleep_interval):
    time.sleep(sleep_interval)
    return argument

class TestProcessPoolGeneric(unittest.TestCase):
    def test_big_values(self):
        BIG_VALUE = [0] * 1000 * 1000   # large result that stuffs the IPC pipe
        CNT = 50
        INITIAL_SLEEP = 1
        EPS = CONSTS.sleep_unit / 10

        futures = []
        with ProcessPool(max_workers=CNT) as pool:
            # Schedule tasks whose results arrive staggered by EPS seconds.
            for i in range(CNT):
                futures.append(pool.schedule(function, args=[BIG_VALUE, INITIAL_SLEEP + i * EPS]))
            wait(futures, return_when=FIRST_COMPLETED)
            # Cancel everything after the first result, then stop the pool
            # while the remaining workers are still sending their results.
            for f in futures:
                f.cancel()
            time.sleep(EPS * CNT / 2)
            pool.stop()
            pool.join()   # intermittently hangs here for ~60 seconds

emaxx-google commented Dec 20, 2024

If I understand correctly, the occasional 60-second delay observed in join() is essentially the LOCK_TIMEOUT from channel.py.

The root cause seems to be:

  1. pool.stop() causes the message_manager_loop thread to terminate.
  2. However, there are still active workers trying to write their (big) results to the IPC pipe. Since nothing reads from the pipe anymore (because of step 1), at some point one of the workers gets stuck in a send() call while still holding ChannelMutex.writer_mutex.
  3. In the main process, the pool_manager_loop thread is still running; it tries to apply the changes from cancelled_tasks and eventually calls stop_worker() for the worker stuck in step 2. This deadlocks because it also tries to acquire ChannelMutex.writer_mutex.
  4. On the main thread, the pool.join() call eventually joins the pool_manager_loop thread, so the main thread deadlocks as well.

The deadlock eventually resolves thanks to the LOCK_TIMEOUT used by ChannelMutex: after that interval, the acquisition in step 3 gives up and reports a ChannelError, which is then wrapped into BrokenProcessPool.
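To make the failure mode easier to reason about, here is a minimal, self-contained sketch of the general pattern using plain multiprocessing (not Pebble's actual internals): a child process blocks in send() on a pipe nobody reads while holding a shared lock, and the parent then blocks trying to acquire the same lock.

# Minimal sketch of the deadlock pattern (plain multiprocessing, not Pebble's code).
import multiprocessing as mp
import time

def writer(conn, lock):
    # Mimics a worker returning a big result: take the lock, then block in
    # send() because nobody drains the other end of the pipe.
    with lock:
        conn.send(b"x" * (1 << 24))  # far larger than the OS pipe buffer

if __name__ == "__main__":
    parent_end, child_end = mp.Pipe()
    lock = mp.Lock()
    proc = mp.Process(target=writer, args=(child_end, lock))
    proc.start()
    time.sleep(1)  # let the child fill the pipe and block inside send()

    # Mimics pool_manager_loop calling stop_worker(): this acquire cannot
    # succeed while the child is stuck in send() holding the lock.
    acquired = lock.acquire(timeout=5)
    print("lock acquired:", acquired)  # False -> the deadlock window
    if acquired:
        lock.release()

    # Draining the pipe (what message_manager_loop normally does) lets the
    # child finish send(), release the lock, and exit.
    parent_end.recv()
    proc.join()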

@emaxx-google (Author)

As for ways to fix it, I'm thinking of maybe letting the message_manager_loop thread run a bit longer, so it keeps pumping the message pipe until the workers are actually shut down by PoolManager?

@emaxx-google (Author)

I've added a diagram of what I believe happens during this hang, which also clarifies the fix I proposed:

[Diagram: Screenshot from 2025-01-15 14-57-15]

noxdafox added a commit that referenced this issue Jan 25, 2025
noxdafox added a commit that referenced this issue Jan 26, 2025

noxdafox (Owner) commented Jan 26, 2025

Hello,

this is a known limitation, and it is covered by these test cases.

As the documentation states, the Python multiprocessing programming guidelines fully apply to Pebble as well. In particular, I am referring to the following section:

Avoid shared state

    As far as possible one should try to avoid shifting large amounts of data between processes.

This is an anti-pattern for a few reasons:

  1. It defeats the performance advantage of parallel processing. As the pool workers share a single pipe, they contend for it. If the main loop is not fast enough to keep the pipe empty, your workers will starve, wasting lots of CPU cycles.
  2. Python pipes use pickle as the serialization format for data, which might not be optimal (or even functional) for certain data structures.
  3. It can lead to corner cases such as the one you are facing.

The recommended way to handle these situations is to use files rather than sending large amounts of data back and forth over the pipe. What is observed most of the time is a significant increase in performance within the application itself. (Some references: 1, 2.)

Your main loop and your workers would write the data to dedicated files and only share the file paths with each other. This allows optimal usage of the pipe (an empty pipe is a happy pipe) and brings a few benefits (a sketch of this pattern follows the list below):

  • I/O in modern computers is way more scalable than mutually exclusive memory
  • You can choose your own serialization format for the data, resulting in potentially faster I/O
  • In case of error, you can inspect the files left behind to better troubleshoot your application
  • You can significantly reduce the amount of memory consumed by your application, as you have direct control over what you load from the files (lazy reads and whatnot)
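For illustration, here is a minimal sketch of that pattern with Pebble. The file names, the use of pickle, and the result sizes below are illustrative assumptions, not taken from this thread: workers write their results to disk, and only the small file paths travel through the pool's pipe.

# Hedged sketch: pass file paths through the pool instead of large payloads.
# File names, pickle, and sizes below are illustrative assumptions.
import pickle
import tempfile
from pathlib import Path

from pebble import ProcessPool

def compute(argument, out_dir):
    result = [0] * 1_000_000                    # stands in for a large result
    out_path = Path(out_dir) / f"result-{argument}.pkl"
    with open(out_path, "wb") as fh:
        pickle.dump(result, fh)                 # any serialization format works here
    return str(out_path)                        # only the short path crosses the pipe

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir, ProcessPool() as pool:
        futures = [pool.schedule(compute, args=[i, out_dir]) for i in range(4)]
        for future in futures:
            with open(future.result(), "rb") as fh:
                data = pickle.load(fh)          # load lazily, only what you need
            print(len(data))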

That said, I do agree that the current implementation is sub-optimal, as it is designed around the usage recommendations. Yet the solution you propose cannot be accepted for security reasons. The expectation for the stop method is that the pool can be reliably stopped upon request. It does not matter if in some cases it might take up to 60 seconds to do so. The main loop can trust that the pool will eventually stop.

Waiting for the pipe to be drained does not provide the same guarantees. A malicious worker process could keep slowly feeding the pipe, literally locking the main loop indefinitely.

Hence, a better solution should be devised. Indeed, when we ask the pool to stop (as opposed to the close method), we imply that we don't care about any further computation. This knowledge could be propagated through the components, making sure, for example, that we don't need to lock the internal pipe.

I will come up with a proper implementation in the upcoming days. In the meantime, I have two recommendations for you:

  1. Follow the above suggestions and implement file-based I/O instead of delegating the work to the pipe.
  2. I just added a commit which exposes the mutex timeout as a configurable value. If you cannot proceed with 1, you can decrease the mutex timeout as needed until a better solution is implemented. (A hedged sketch of this appears below.)
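For reference, here is a hedged sketch of option 2. The exact name of the newly exposed setting is an assumption here (channel_lock_timeout is hypothetical), so check the referenced commit for the real attribute before relying on it.

# Hypothetical sketch of recommendation 2: lower the mutex timeout via CONSTS.
# The attribute name `channel_lock_timeout` is an assumption; use whatever
# name the referenced commit actually exposes.
from pebble import ProcessPool
from pebble.common.types import CONSTS

CONSTS.channel_lock_timeout = 5  # seconds; set before creating the pool

if __name__ == "__main__":
    with ProcessPool() as pool:
        future = pool.schedule(pow, args=[2, 10])
        print(future.result())  # 1024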


emaxx-google commented Jan 27, 2025

Thank you for the detailed comment. I'll look into the workarounds you mentioned.

However, I have a question about the concern you expressed regarding the PR:

Yet #148 cannot be accepted due to security reasons. The expectation for the stop method is that the pool can be reliably stopped upon request. It does not matter if in some cases it might take up to 60 seconds to do so. The main loop can trust that the pool will eventually stop.

Waiting for the pipe to be drained does not provide the same guarantees. A malicious worker process could keep slowly feeding the pipe, literally locking the main loop indefinitely.

My understanding is that the PR doesn't introduce waiting for the pipe to be drained. The pipe draining is still executed on the background thread, and that thread's lifetime is only prolonged until (roughly) when PoolManager.stop() completes, which, as before, is bounded by the 60-second safeguards in the worst case.

I've also updated the PR to add a unit test for the situation you described (a spammy worker), if I understood it correctly.

@emaxx-google (Author)

Regarding your other suggestion on large amounts of data: I've double-checked the application where the problem was originally observed (marxin/cvise#41), and the message sizes are actually moderate: up to 2 KB. Still, the problem is observed with standard parallelization settings (e.g., 64 workers on my machine). Probably the threshold for the described deadlock scenario is around 64 KiB divided by the number of workers - https://github.com/python/cpython/blob/a8dc6d6d44a141a8f839deb248a02148dcfb509e/Lib/multiprocessing/connection.py#L43 ? Asking the library consumer to use file-based I/O even for a kilobyte of data seems like overkill.


noxdafox (Owner) commented Feb 2, 2025

2 KB is definitely too small to be the issue there, but it seems the issue in the linked ticket is not due to that. The user had a dual-socket workstation, and this might be a different matter.

The side effect of a stuffed pipe is known, but I've never stumbled across issues with process termination when only small files are involved.
