-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-100227: Move the Dict of Interned Strings to PyInterpreterState #102339
gh-100227: Move the Dict of Interned Strings to PyInterpreterState #102339
Conversation
ericsnowcurrently
commented
Feb 28, 2023
•
edited by bedevere-bot
Loading
edited by bedevere-bot
- Issue: Improve Interpreter Isolation #100227
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is wrong, it breaks the invariant that interned strings are global, with this they are per interpreter. When a string is interned, its interned bit is set to 1 and if it is shared across interpreters like if it's immortal then it breaks the invariant that interned bit set means that the string is really interned.
The correct solution for this is to let the interned dict be global and add a fine grained lock around mutating it which IIRC only happens in two places.
When you're done making the requested changes, leave the comment: |
Thanks for the feedback. I want to be sure we're in agreement before I proceed on this.
Are you talking about process-global or something else? Process-global is problematic for interpreter isolation, especially under a per-interpreter GIL. Non-immortal str objects must not be shared between interpreters, which means many strings that are currently interned must either be made immortal or not be interned. Making the interned dict per-interpreter solves this.
All immortal strings are still interned in each interpreter. However, non-immortal string instances are necessarily unique to a single interpreter and consequently interning them to just that interpreter is still correct. The trade-off here is that we incur an extra CPU cost to startup for subinterpreters (due to the static strings) and an extra memory/CPU cost when the same (non-static) string is interned in additional interpreters. I expect there is a more efficient way to solve this, but that doesn't mean we can't go with a less efficient approach (as I've proposed) in the meantime. Note that, most importantly, there is no impact to the status quo for the main interpreter in either case. That matters since the vast majority of Python usage does not involve subinterpreters currently.
Such a lock must also guard reads since they might race with mutation. That leads to a larger performance impact. |
I'll see it by next weekend. |
BTW, benchmarks show no significant change in performance. |
Hmm, I am thinking of the opposite, make it mandatory that all interned strings are immortal. If we guarantee this then we don't need to make it per interpreter, the main interpreter will clear the interned table at runtime finalization. This has other benefits that it fits out current mental model that an interned string is a process wide global unique string and shared across interpreter as it is currently done.
Yeah, all of this is for subinterpreters. This is how with immortal objects, I think this should happen:
To me this seems much simpler to understand and implement. I feel we should table this for now until immortal objects are implemented in main branch. |
I see what you're saying. Furthermore, this wouldn't be so big a deal since you can't [normally] intern a string directly from Python code, so it isn't a huge number of strings that get interned. I'll let this stew in my brain for a day or two and get back to this. Thanks for explaining! |
This does mean that, under a per-interpreter GIL, we'd need a granular global for any use of the interned dict, but that's okay. It also means that we'd have to ensure resizing is done with the main interpreter's obmalloc state, but that is manageable. |
@kumaraditya303, does gh-102858 seem like what you had in mind? |
I'm closing this in favor of gh-102925, which keeps the interned strings dict global. |
…ted Interpreters (pythongh-102925)" This reverts commit 87be8d9.
The keep-it-global approach has turned out to be quite complex. We can look into that further, but in the meantime I'm planning on going with the simpler approach here. |
…ate (pythongh-102339) We can revisit the options for keeping it global later, if desired. For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime. python#100227