Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize nth and nth_back for BoundListIterator #4810

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

Owen-CH-Leung
Copy link
Contributor

@Owen-CH-Leung Owen-CH-Leung commented Dec 21, 2024

See #4787

This PR optimizes nth and nth_back for BoundListIterator, and added unittest & benchmarks for these 2 APIs. Here are the benchmark of the optimized nth and nth_back:

On Stable toolchain

With Optimization

list_nth                time:   [351.21 ns 351.71 ns 352.28 ns]
                        change: [-88.294% -88.189% -88.089%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe

list_nth_back           time:   [355.34 ns 356.09 ns 356.95 ns]
                        change: [-90.384% -90.343% -90.300%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

The default nth and nth_back implementation

list_nth                time:   [2.8422 µs 2.8502 µs 2.8592 µs]
                        change: [+706.38% +709.27% +712.94%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

list_nth_back           time:   [3.6509 µs 3.6557 µs 3.6611 µs]
                        change: [+919.80% +923.58% +927.06%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

On nightly toolchain

With Optimization

list_nth                time:   [290.88 ns 291.70 ns 292.71 ns]
                        change: [-0.5501% -0.2102% +0.2305%] (p = 0.27 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

list_nth_back           time:   [295.06 ns 296.29 ns 297.90 ns]
                        change: [-1.2387% -0.9560% -0.6230%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

The default nth and nth_back implementation

list_nth                time:   [2.8982 µs 2.9132 µs 2.9305 µs]
                        change: [+891.64% +897.18% +903.13%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

list_nth_back           time:   [3.6605 µs 3.6671 µs 3.6752 µs]
                        change: [+1151.3% +1154.4% +1157.3%] (p = 0.00 < 0.05)
                        Performance has regressed.

@davidhewitt
Copy link
Member

Awesome, thanks! Will seek to review soon. I guess we can do the same thing for tuple iteration?

@Owen-CH-Leung Owen-CH-Leung marked this pull request as ready for review December 21, 2024 15:47
@Owen-CH-Leung
Copy link
Contributor Author

Owen-CH-Leung commented Dec 21, 2024

Awesome, thanks! Will seek to review soon. I guess we can do the same thing for tuple iteration?

Yup - I'll file a PR to optimise tuple iterator after the holiday =)

let length = self.length.min(self.list.len());
let target_index = self.index + n;
if self.index + n < length {
let item = unsafe { self.get_item(target_index) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, is there a time-of-check to time-of-use bug here on the length? Not one for this PR, but a follow up I will try not to forget...

Copy link
Contributor Author

@Owen-CH-Leung Owen-CH-Leung Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I think that's a potential TOCTOU bug here. This particular implementation assumes the user ensures proper synchronization if they intend to use the iterator in a multi-threaded or mutable environment.

Copy link
Contributor Author

@Owen-CH-Leung Owen-CH-Leung Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current implementation of next also has a potential TOCTOU bug:

https://github.com/PyO3/pyo3/blob/main/src/types/list.rs#L495-L498

@ngoldbaum
Copy link
Contributor

Thanks for working on this! Hopefully my PR should be merged in the next few days and then you can rebase this.

@ngoldbaum
Copy link
Contributor

It might also make sense to implement advance_by on nightly. nth is implemented in terms of advance_by, so this improvement would fall out "for free" on nightly if you implement that function.

Copy link
Contributor

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments, I think there are some issues in how this is set up for the limited API. I also think we should define nth and nth_back for all non-nightly builds. I also don't think it's necessary to call get_item in advance_by, per my reading of the advance_by docs.

let length = length.0.min(list.len());
let target_index = index.0 + n;
if index.0 + n < length {
let item = list.get_item(target_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to actually do the get_item or check for an Err value, we only need to advance the iterator, which just requires the index.0 + n < length check.

@@ -750,6 +807,27 @@ impl<'py> Iterator for BoundListIterator<'py> {
None
})
}

#[inline]
#[cfg(all(Py_GIL_DISABLED, feature = "nightly"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(all(Py_GIL_DISABLED, feature = "nightly"))]
#[cfg(feature = "nightly")]

I think the critical section is necessary though, since another thread might try to use the list while you're in advance_by. You don't need to be in a Py_GIL_DISABLED cfg block to use a critical section, it's a no-op on the GIL-enabled build.

@@ -772,6 +850,14 @@ impl DoubleEndedIterator for BoundListIterator<'_> {
}
}

#[inline]
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))]
#[cfg(feature = "nightly")]

@@ -625,6 +674,14 @@ impl<'py> Iterator for BoundListIterator<'py> {
}
}

#[inline]
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(all(Py_GIL_DISABLED, not(feature = "nightly")))]
#[cfg(feature = "nightly")]

/// access to the list by holding a lock or by holding the innermost
/// critical section on the list.
#[inline]
#[cfg(not(Py_LIMITED_API))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should also add branches to nth and nth_back that use get_item instead of get_item_unchecked on the limited API

@@ -66,6 +92,8 @@ fn sequence_from_list(b: &mut Bencher<'_>) {
fn criterion_benchmark(c: &mut Criterion) {
c.bench_function("iter_list", iter_list);
c.bench_function("list_new", list_new);
c.bench_function("list_nth", list_nth);
c.bench_function("list_nth_back", list_nth_back);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally missed this when I added the new free-threaded implementations for folding operations in list. I'll add benchmarks for those in a separate PR.

noxfile.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes in this file seem unrelated, maybe send in a separate PR so we can get these ruff fixes merged quickly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - I noticed that ruff has released 0.9.0 2 hours ago and the CI didn't pin the ruff version, hence I hit this error which doesn't exist when the latest MR is merged.

I'll file another MR to quickly format this file

@Owen-CH-Leung Owen-CH-Leung marked this pull request as draft January 10, 2025 08:22
@Owen-CH-Leung Owen-CH-Leung marked this pull request as ready for review January 10, 2025 13:35
@Owen-CH-Leung
Copy link
Contributor Author

Owen-CH-Leung commented Jan 10, 2025

@ngoldbaum Can I seek your review again ? I've adopted your suggested changes above

the CI clippy/x86_64-unknown-linux-gnu/beta is failing as main btw

@ngoldbaum
Copy link
Contributor

the CI clippy/x86_64-unknown-linux-gnu/beta is failing as main btw

GitHub's interface doesn't help by putting a big red x next to it, but this is fine and we can merge PRs as long as the "conclusion" job finishes. Right now it's broken because of an upstream bug in clippy, so it will probably stay that way until the beta channel gets the fix.

I'll try to give this PR another once-over today but failing that will get to it next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants