-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make PyDict
iterator compatible with free-threaded build
#4439
Conversation
d90dc42
to
c0136f7
Compare
I actually have a branch with these changes (more or less) that I was planning to do separately from that PR. Unfortunately the deadlock I found is caused by something else. If you're planning to work on this stuff I'd appreciate it if you could comment on the tracking issue so we can coordinate work and avoid duplication. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use of the critical section api seems unwise. This allows users to create several critical sections (and worse) allows them to release them in arbitrary order. I don't think I understand the critical section api well but it seems guaranteed to cause issues.
I can see two obvious solutions:
- Replace the implementation with
PyObject_GetIter
andPyIter_Next
(slow?) - Implement some form of internal iteration:
impl PyDict{
pub fn traverse<B>(&self, f: &mut impl FnMut(Bound<'py, PyAny>, Bound<'py, PyAny>) -> ControlFlow<B>) -> ControlFlow<B> {
struct Guard { .. };
impl Drop for Guard { ..release critical section }
let mut cs = std::mem::MaybeUninit::zeroed();
ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr());
let mut ma_used = ..;
let mut di_used = ..;
let key = ...;
let value = ..;
while PyDict_Next(...) != 0{
f(key, value)?;
}
ControlFlow::Continue(())
}
}
src/types/dict.rs
Outdated
let cs = unsafe { | ||
let mut cs = std::mem::MaybeUninit::zeroed(); | ||
ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr()); | ||
cs.assume_init() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just assume_init immediately because it is zero-valid. This would only be necessary if you used MaybeUninit::uninit()
.
let cs = unsafe { | |
let mut cs = std::mem::MaybeUninit::zeroed(); | |
ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr()); | |
cs.assume_init() | |
}; | |
let cs: ffi::PyCriticalSection = unsafe { std::mem::MaybeUninit::zeroed().assume_init() }; | |
unsafe { ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr()) }; |
src/types/dict.rs
Outdated
#[cfg(Py_GIL_DISABLED)] | ||
impl Drop for BorrowedDictIter<'_, '_> { | ||
fn drop(&mut self) { | ||
unsafe { | ||
ffi::PyCriticalSection_End(&mut self.cs); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should probably implement Drop
unconditionally (or not at all)
#[cfg(Py_GIL_DISABLED)] | |
impl Drop for BorrowedDictIter<'_, '_> { | |
fn drop(&mut self) { | |
unsafe { | |
ffi::PyCriticalSection_End(&mut self.cs); | |
} | |
} | |
} | |
impl Drop for BorrowedDictIter<'_, '_> { | |
fn drop(&mut self) { | |
#[cfg(Py_GIL_DISABLED)] | |
unsafe { | |
ffi::PyCriticalSection_End(&mut self.cs); | |
} | |
} | |
} |
I think we should seriously consider going this way and benchmark if it's actually a performance concern. We already made the same change for sets a couple of releases back, it wasn't a major performance impact there compared to the wins from the Bound API. Two reasons why we did it for sets:
Similarly our current implementation doesn't respect dict subclasses with custom |
I opened #4477 with a different implementation of the FFI bindings. |
Sorry for the late reply. Your implementation looks good, and the 'opaque_type!' use is exactly like what I was looking for. I will update my MR after the weekend. |
Interesting solutions 👀. I will try to implement the first one and test the performance hit after. |
c0136f7
to
562fb4e
Compare
CodSpeed Performance ReportMerging #4439 will not alter performanceComparing Summary
|
ouch that does seem to be a big perf hit |
Yea this is really bad, but kind of expected as https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3381-L3432 |
I've also looked into iterating a raw |
I wonder if the critical section API is actually problematic in practice. You could try iterating over the same dict in many threads on the free-threaded build as a stress test. I'm not sure if there are other usage patterns that @mejrs might be concerned about. It would be nice if we could still keep the fast path for dicts and then only degrade to the slow path if we're not handed an instance of PyDict_Type. |
|
src/types/dict.rs
Outdated
let tuple = pair.downcast::<PyTuple>().unwrap(); | ||
let key = tuple.get_item(0).unwrap(); | ||
let value = tuple.get_item(1).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it wise to use the unchecked variants here instead of unwrap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about this, it is probably not safe, because items()
method can return an arbitrary object when overridden in python code.
I didn't know that this is different, learning something new every day! It is indeed somewhat faster. We went down from ~87% slowdown to ~63%. |
PyDict
iterator compatible with free-threaded build
I made the previous fast path available to non-freethreaded builds when the |
0b8f4c6
to
ae0ee72
Compare
src/types/dict.rs
Outdated
|
||
if unsafe { ffi::PyDict_Next(dict.as_ptr(), ppos, &mut key, &mut value) } != 0 { | ||
*remaining -= 1; | ||
let py = dict.py(); | ||
// Safety: | ||
// - PyDict_Next returns borrowed values | ||
// - we have already checked that `PyDict_Next` succeeded, so we can assume these to be non-null | ||
Some(( | ||
unsafe { key.assume_borrowed_unchecked(py) }.to_owned(), | ||
unsafe { value.assume_borrowed_unchecked(py) }.to_owned(), | ||
)) | ||
} else { | ||
None | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an alternative implementation here where we add a critical section internally here just around the call to PyDict_Next
? It means that each iteration has to lock / unlock a mutex, which might also be terrible for performance, but it'd be interesting to try. (If it performs acceptably, we could then also ask freethreaded CPython experts if this is sound. My hunch is that it would be.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought of this implementation but as of the following issue this is not sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading correctly, isn't the point of that issue precisely that it's permitting us to add locking here around each call to PyDict_Next
if we so wanted? The concern about borrowed references is not relevant here because we immediately incref them, and we can do that before releasing the critical section. Cc @colesbury
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidhewitt is right about the borrowed references issue not being relevant here because PyO3 would be doing it's own locking around PyDict_Next()
with incref inside the lock.
That's still not ideal, but it might be a reasonable starting point. It's much better to lock around the entire loop, both because of the performance issue and because you will see a consistent view of the dict. The locking only around PyDict_Next()
allows for concurrent modifications in between each call, so you're going to have more cases that panic due to concurrent modifications, which would have been prevented by the GIL or a loop-scoped lock.
Another alternative is to copy the dict inside the iterator and iterate over the copy. It's probably cheaper than locking around each call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clearing this up! Copying the dict sounds like the easiest solution for now. When we finalize a critical section
api we can consider moving the responsibility of locking the dict (during the whole iteration) on free-threaded builds to the user and remove the copy()
and panic on concurrent modifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I think if the loop executes arbitrary python code then it is still possible for the dict to be modified during iteration under a critical section, because it may be suspended by a nested section which then modifies the dict.
I feel like users are more likely to be able to know for their use case if copying or locking per iteration is more acceptable. I wonder if we need to split .iter()
into multiple methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #4571 to suggest a way forward on this.
d5a7330
to
afab9b2
Compare
afab9b2
to
3ace91a
Compare
ac9fcf9
to
f79dc05
Compare
src/types/dict.rs
Outdated
@@ -357,6 +366,24 @@ impl<'py> PyDictMethods<'py> for Bound<'py, PyDict> { | |||
BoundDictIterator::new(self.clone()) | |||
} | |||
|
|||
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))] | |
#[cfg(not(feature = "nightly"))] |
The critical section macros are a no-op on the GIL-enabled build, so I think allowing this function to work (and rely on the GIL for locking) makes it easier for users to adopt this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I also think this should be available on all builds, for simplicity.
{ | ||
#[cfg(feature = "nightly")] | ||
{ | ||
self.iter().try_for_each(|(key, value)| f(key, value)) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I see, it's because try_for_each
is implemented in terms of try_fold
(sorry for all the noise here!).
Can you add a comment explaining the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got it. I will add a comment to prevent confusion next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is shaping up nicely. Sorry for the delay on my side, I've been a bit busy / exhausted and it felt like a bigger context switch to get back here.
src/types/dict.rs
Outdated
@@ -357,6 +366,24 @@ impl<'py> PyDictMethods<'py> for Bound<'py, PyDict> { | |||
BoundDictIterator::new(self.clone()) | |||
} | |||
|
|||
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I also think this should be available on all builds, for simplicity.
/// Iterates over the contents of this dictionary while holding a critical section on the dict. | ||
/// This is useful when the GIL is disabled and the dictionary is shared between threads. | ||
/// It is not guaranteed that the dictionary will not be modified during iteration when the | ||
/// closure calls arbitrary Python code that releases the current critical section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New doc looks great 👍
I think it's potentially worth mentioning that this is roughly a performance optimization for stable where we can't optimise .iter().try_for_each()
at the moment. (And that for infallible iteration, .locked_for_each()
has no advantage over .iter().for_each()
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that for infallible iteration, .locked_for_each() has no advantage over .iter().for_each().
I do not understand the second statement, could you elaborate why this is the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that we can optimize .for_each
already on stable so if they don't need to deal with possible errors, then that's probably more optimal? Maybe I missed something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I understand now, I will add it to the docs.
src/types/dict.rs
Outdated
if dict.is_exact_instance_of::<PyDict>() { | ||
return BoundDictIterator::DictIter { | ||
dict, | ||
ppos: 0, | ||
di_used: remaining, | ||
remaining, | ||
}; | ||
}; | ||
|
||
let items = dict.call_method0(intern!(dict.py(), "items")).unwrap(); | ||
let iter = PyIterator::from_object(&items).unwrap(); | ||
BoundDictIterator::ItemIter { iter, remaining } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would still like it split out, yes please. I think it's probably correct in the long run down that road, but I also fear that (a) no other dict methods currently respect subclasses and (b) we probably want to synchronise such a change with lists and tuple iterators too, as well as consider what other methods to adjust at the same time. (Probably in combination with a decision on #4490.)
src/types/dict.rs
Outdated
#[cfg(not(Py_GIL_DISABLED))] | ||
BoundDictIterator::DictIter { .. } => f(self), | ||
#[cfg(Py_GIL_DISABLED)] | ||
BoundDictIterator::DictIter { ref dict, .. } => { | ||
crate::sync::with_critical_section(dict.clone().as_ref(), || f(self)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this can be simplified given that with_critical_section
should optimize away trivially on the gil-enabled build, also can remove a clone:
#[cfg(not(Py_GIL_DISABLED))] | |
BoundDictIterator::DictIter { .. } => f(self), | |
#[cfg(Py_GIL_DISABLED)] | |
BoundDictIterator::DictIter { ref dict, .. } => { | |
crate::sync::with_critical_section(dict.clone().as_ref(), || f(self)) | |
} | |
BoundDictIterator::DictIter { dict } => { | |
crate::sync::with_critical_section(dict.as_any(), || f(self)) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the borrow checker will allow this because we need a mut reference to self
while iterating. So it will not let is have a second reference without Copy to get a strong ref to Bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. I think if we split the BoundDictIterator
into the main struct and a private inner
, we might be able to make it work? That's potentially a good idea anyway so that we don't have to make the enum members public, and it would be desirable to avoid the redundant .clone()
as a performance optimization :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(So probably gets simpler with #4439 (comment))
src/types/dict.rs
Outdated
|
||
#[inline] | ||
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))] | ||
fn find_map<B, F>(&mut self, f: F) -> Option<B> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the large-ish volume of new code, could we please add some tests for these? Broadly these all seem reasonable, but I would love to have some coverage of them to give me more certainty we don't break them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests.
f79dc05
to
a5ba737
Compare
a5ba737
to
e76c7b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me, thanks very much and sorry for the many rounds of slow reviews!
This pulls in the
PyCriticalSection_Begin
&PyCriticalSection_End
functions new in 3.13 and use it to lock the PyDict iterators as ddescribed here. I'm not sure about the
PyCriticalSection
struct definition. We cannot use theopaque_struct!
macro to define this struct because we have to allocate enough space on the stack so we can pass the uninitialized pointer toPyCriticalSection_Begin
. So some help would be appreciated!depends on #4421
related to #4265