Enhancing Parallelism in PyO3: Exploring Multi-Process Architecture Over Sub-Interpreters #3479
Comments
This is pretty much what the
They do, subinterpreters can run independently.
I think the best option is some kind of closure-based API like what
In my concept, we will not send them; instead, we will inject them. I have tested this idea using PyO3, and it not only works but also significantly improves the code efficiency in my application. However, since we cannot use multiple truly parallelized instances, we are still limited to using a single processing unit, which can change if we allocate a session for each one.
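To make the "inject instead of send" idea concrete, here is a minimal sketch using only the Rust standard library. It is not PyO3 code and not taken from RustPyNet: `Session`, `spawn_session`, and `inject` are hypothetical names, and the `Rc` inside `Session` is only there to make it `!Send`, imitating state that must stay on the thread that created it. Closures travel to the owning thread; the bound state never moves.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for interpreter-bound state. The `Rc` makes it `!Send`,
/// so the compiler refuses to move it to another thread.
struct Session {
    globals: Rc<RefCell<Vec<String>>>,
}

/// A job is a closure that is moved to the thread owning the `Session`.
type Job = Box<dyn FnOnce(&mut Session) + Send>;

/// Spawns the owning thread and returns a sender used to inject closures.
fn spawn_session() -> (mpsc::Sender<Job>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = thread::spawn(move || {
        // The session is created here and never leaves this thread.
        let mut session = Session {
            globals: Rc::new(RefCell::new(Vec::new())),
        };
        // Runs until every sender has been dropped.
        for job in rx {
            job(&mut session);
        }
    });
    (tx, handle)
}

/// Runs `f` on the session thread and returns its result, which must be `Send`.
fn inject<R, F>(tx: &mpsc::Sender<Job>, f: F) -> R
where
    R: Send + 'static,
    F: FnOnce(&mut Session) -> R + Send + 'static,
{
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Box::new(move |s: &mut Session| {
        let _ = reply_tx.send(f(s));
    }))
    .expect("session thread has exited");
    reply_rx.recv().expect("session thread dropped the reply")
}
```

A caller would do something like `let (tx, join) = spawn_session();` followed by `inject(&tx, |s| s.globals.borrow_mut().push("a = 1".to_string()))`. Only plain, `Send` values come back out, which is the point of the pattern.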
Are you sure? In the thread-state-and-the-global-interpreter-lock section of the C-API documentation, we can observe the following paragraph:
This indicates that we cannot guarantee a safe state of PyGIL when working with multiple subinterpreters. Another point is that the session they run on top of has a master GIL, meaning that while we can allow concurrency, everything will still depend on the same GIL. Also, the same page states:
This implies that even if we create threads on the Python side, we are still relying on one GIL. When I wrote this enhancement issue, I read the Python C-API docs extensively and dug deep into the C-API itself. What I realized is this: even if we use subinterpreters, we are still relying on one interpreter. I have the same problem in native Python; only processes have an independent GIL and can fully exploit the hardware. And here, in subinterpreters, we see that they are planning to have a "per-interpreter GIL." If that has already been done, then I agree with you that this issue isn't necessary. However, I can't find any information stating that each subinterpreter has a dedicated GIL. If they don't, then a mechanism using sessions (processes) will work better for large, intensive tasks that hold the GIL for a long time, such as machine learning, large operations per unit, etc. I'm not saying that your point is completely wrong. Indeed, subinterpreters will improve concurrency, but for true parallelism we need to rely on processes. Since PEP 3099 states that the GIL will not be removed from Python 3.x, we need an alternative to use everything the hardware has to offer. For that, I will continue my "Frankenstein" experiments on top of the PyO3 FFI, hoping to achieve something like this. Regarding:
I have tested something like this in my crate, RustPyNet, which is available on my profile as a public project. There, I try to use a mechanism similar to rayon's to allow multiple sessions of PyO3, with multiple interpreters in each one. It is in an experimental state, but I think it may work. I acknowledge that I might be saying something incorrect here, since the docs are very extensive and the discussion about the GIL has been going on for some years now. But based on what I have read, I think that while subinterpreters are good for concurrency, for truly parallel work and large CPU-intensive operations they will still run into limitations originating from the GIL.
Nice to know about this! But this is only in 3.12, right? I understand that we need to focus on the future, since the versions we currently work on will be deprecated soon, but if we don't find a solution for earlier versions as well, many programs that can't be updated yet will be outside this feature's scope. If it ends up like that, I will continue with RustPyNet and add a session-based Frankenstein of the FFI so that older versions can have it too. I personally have a network-based IPC framework library built on top of PyO3 that needs real parallelism for better performance, for a private financial market application and some telecom work, and I simply can't rely only on Python 3.12+ because a lot of machine learning software is on 3.7.9 - 3.10. I understand your point, and it is a much more practical approach, but from what I read in the reference you sent, it will only work in the newer versions. This means that until the rest of the libraries update, a lot of things that can't move to Python 3.12+ will be left out of this feature.
Personally I think that a full multiprocess framework is an application-specific problem of high complexity which most users of PyO3 don't need. I can understand the value it brings to you and encourage you to continue to publish and support RustPyNet while it solves the problem for you and others. I just don't think there is justification for adding this complexity to PyO3. Subinterpreter support is, on the other hand, something which most PyO3 users may want for their extension modules even if they don't actually need subinterpreters at all (their users might). The 3.12 per-interpreter GIL then also becomes a natural addition for Rust users looking for Python parallelism. Therefore I think the correct way to make progress is by solving #576. While it is true that 3.12 is still unsupported by many projects, this will change in the time it takes us to make changes here. I also think that when
I now understand your point. PyO3 is a crate designed to interface Rust with Python, acting like a pipe: it lets us send things from Rust to Python and back, execute commands, and receive responses. Essentially, you prefer not to add too much complexity to it, because projects built on top of it can be slowed down if this pipe becomes too complex, especially for library development, and because most modules will never need this kind of support. Given that, I will take your suggestion, which you also made some days ago, and continue with RustPyNet. The purpose is to bridge this gap for users who need to maximize hardware utilization, like the examples I mentioned, and also to allow for backward compatibility, since David suggested that as well. Let's stay in touch. I'm collaborating on a fork with @Aequitosh, and we are trying to add support for sub-interpreters. Now that I'm more aligned with the goals of PyO3, I'm confident that something productive can emerge from this. Meanwhile, I will improve RustPyNet as I learn more about CPython and PyO3's FFI module. In fact, the references in the ffi module that you asked us to implement via a PR in the sub-interpreters issue already seem to be where they should be, in the ffi module's pylifecycle.rs file. Having realized that, we are moving on to planning the mechanism to support sub-interpreters and working out what it will impact. So I think this issue can be closed. Now that I better understand the goals you have for PyO3, I will try to focus on the direction you are going, especially with sub-interpreters, and also consider the no-GIL idea, which could be a good one. If you want to close this issue, feel free to do so! I hope that I can soon bring news on the sub-interpreter topic and help add this support to PyO3!
Thanks for the clarification and also for the good docs shared in this conversation!
Thanks @letalboy. As per above, I will close this issue for now. I look forward to hearing both about RustPyNet further and what discoveries you make regarding what we can do about sub-interpreters. 👍
Hey team,
I've recently done some in-depth research on the project. After discussing with @Aequitosh and going through the ffi API extensively, particularly the parts that interact with the C side of the code, I've made some observations. It appears that the references described as missing ("lack implementations for Py_NewInterpreter and Py_EndInterpreter") are already present in ffi. However, as I delved deeper into the Rust code and its interface with CPython, I realized that even if we implement sub-interpreters, they wouldn't support true multi-threading. So, I wondered: why not employ multiple processes on the Rust side? We could assign each process its own secure GIL connection and synchronize them using an IPC channel on the Rust side to ensure safety. Here's some sample code that demonstrates this idea:
During my experiments, where I tried combining various parts of the currently implemented ffi to run multiple interpreters, I noticed that while sub-interpreters do improve performance, they don't offer genuine parallelism. However, the approach I've suggested above might!
I've seen some libraries utilize this multi-process model, and I believe it's feasible. Furthermore, this could align well with the current implementation of "relax." By ensuring only one GIL connection per process, we can prevent contention over the Global Interpreter Lock. We have a couple of options:
1. Centralize execution and modify #[pyfunction] to schedule a Python task in a pool with multiple sessions. Additionally, we could create a macro to wrap code inside functions, allowing us to execute Python code without having to transfer Python objects between threads, which is a known limitation.
2. Retain the current coding style but introduce a mechanism to import a session connection inside a thread. This lock would essentially send the code to execute in the pool and then return the result.
While the first option might entail significant changes and require considerable effort, the second might be relatively straightforward.
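As a rough illustration of the second option, here is a minimal sketch using only the Rust standard library. It is not PyO3 code: threads stand in for the separate processes, `std::sync::mpsc` channels stand in for the IPC channel, and `Request`, `SessionHandle`, and `start_pool` are hypothetical names. Each worker keeps private state, and a cloneable handle sends code into the pool and blocks until the result comes back.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical request: some source text plus a channel for the reply.
struct Request {
    code: String,
    reply: Sender<(usize, String)>,
}

/// Cloneable handle that any thread can hold; it only carries channel endpoints.
#[derive(Clone)]
struct SessionHandle {
    workers: Vec<Sender<Request>>,
}

impl SessionHandle {
    /// Sends `code` to the worker selected by `key` and blocks until it replies.
    /// Returns the worker id together with that worker's output.
    fn run(&self, key: usize, code: &str) -> (usize, String) {
        let (reply, rx) = mpsc::channel();
        let target = &self.workers[key % self.workers.len()];
        target
            .send(Request { code: code.to_string(), reply })
            .expect("worker has exited");
        rx.recv().expect("worker dropped the reply")
    }
}

/// Starts `n` workers. Each owns a private counter standing in for
/// per-session interpreter state that no other worker can touch.
fn start_pool(n: usize) -> (SessionHandle, Vec<JoinHandle<()>>) {
    assert!(n > 0, "the pool needs at least one worker");
    let mut workers = Vec::new();
    let mut joins = Vec::new();
    for id in 0..n {
        let (tx, rx) = mpsc::channel::<Request>();
        workers.push(tx);
        joins.push(thread::spawn(move || {
            let mut executed = 0usize; // private to this worker
            // Runs until every handle (and its senders) has been dropped.
            for req in rx {
                executed += 1;
                // Placeholder for "evaluate the code in this session".
                let out = format!("worker {} ran #{}: {}", id, executed, req.code);
                let _ = req.reply.send((id, out));
            }
        }));
    }
    (SessionHandle { workers }, joins)
}
```

With real processes, the channel endpoints would have to be replaced by an actual IPC mechanism and the payloads serialized, which is where most of the complexity of this approach would live.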
These insights aim to enhance our crate's overall efficiency. By adopting such an approach, libraries related to machine learning, biochemistry analysis, and deep space analysis algorithms could benefit from safer multithreading. This could potentially expedite numerous research projects and save millions or even billions in processing costs.
Of course, I acknowledge that I don't yet have the extensive experience with the PyO3 code that you core maintainers have. Consequently, I'm unsure if everything I've proposed is both feasible and practical for PyO3. However, I believe we can agree that, if executed correctly, this could be a game-changer.