Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardized to usize #57

Closed
wants to merge 4 commits into from
Closed

Standardized to usize #57

wants to merge 4 commits into from

Conversation

valarauca
Copy link
Contributor

@valarauca valarauca commented Jan 24, 2024

Remove instances of i64 & isize when indexing, standardized on usize

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Per #56 (comment)

Attempting to reducing the scope of the PR.

This only addresses moving indexing variables to usize.

valarauca and others added 2 commits January 24, 2024 13:16
Remove instances of i64 & isize when indexing, standardized on usize
@michaelkirk
Copy link
Member

Could you run cargo fmt and fix the new clippy errors?

@valarauca
Copy link
Contributor Author

Squashing locally broke because of the upstream commits, idk how to fix that.

Clippy demanded I do a lot of things outside of the scope of the original change

const COEFF: [f64; 18] = [
-1.0, 6.0, -16.0, 32.0, -9.0, 64.0, -128.0, 2048.0, 9.0, -16.0, 768.0, 3.0, -5.0, 512.0,
-7.0, 1280.0, -7.0, 2048.0,
];
let eps2 = sq(eps);
let mut d = eps;
let mut o = 0;
for l in 1..=geodesic_order {
for (l, v) in c.iter_mut().enumerate().take(geodesic_order + 1).skip(1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can really consider this line an improvement in aggregate, but who am I to argue with a paperclip? 😆

@michaelkirk
Copy link
Member

michaelkirk commented Jan 25, 2024

Hmmm, I'm actually seeing a perf regression with this PR

$ cargo bench --bench="*" --  --baseline=main-2024-01-24
direct (c wrapper)/default
                        time:   [23.918 µs 23.956 µs 24.004 µs]
                        change: [+0.2632% +0.4314% +0.5961%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

direct (rust impl)/default
                        time:   [30.104 µs 30.139 µs 30.180 µs]
                        change: [+0.2469% +0.4175% +0.5814%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

inverse (c wrapper)/default
                        time:   [44.610 µs 44.664 µs 44.723 µs]
                        change: [-2.4267% -0.9378% +0.0711%] (p = 0.16 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  1 (1.00%) high severe

inverse (rust impl)/default
                        time:   [78.354 µs 78.452 µs 78.564 µs]
                        change: [+5.8279% +6.0227% +6.2028%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

If I revert the clippy/format commit (877afcc) I see both an improvement and a regression:

$ cargo bench --bench="*" --  --baseline=main-2024-01-24

direct (c wrapper)/default
                        time:   [23.948 µs 24.010 µs 24.088 µs]
                        change: [+0.1347% +0.3540% +0.5856%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

direct (rust impl)/default
                        time:   [29.044 µs 29.085 µs 29.126 µs]
                        change: [-3.2355% -3.0412% -2.8588%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

inverse (c wrapper)/default
                        time:   [44.673 µs 44.745 µs 44.820 µs]
                        change: [-2.3927% -0.9049% +0.1063%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

inverse (rust impl)/default
                        time:   [77.100 µs 77.238 µs 77.404 µs]
                        change: [+4.1364% +4.3271% +4.5071%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  6 (6.00%) high mild

Any idea why?

@valarauca
Copy link
Contributor Author

If I revert the clippy/format commit (877afcc) I see both an improvement and a regression:
Any idea why?

My best guess is the change is within the noise threshold. I commonly see +/-3µs changes without a code change locally (even with n=1000).

I've taken steps to try to reduce this noise (closing applications, changing cpu throttling/power options) but it remains frustratingly common result.

@michaelkirk
Copy link
Member

Noise is a real thing, but the consistency with which I can reproduce the relative behavior leads me to believe this is not only noise in the benchmark.

  • d8bbc65 - (HEAD) Standardized to usize (20 hours ago)
  • direct: -3% improvement vs main
  • inverse: +4% regression vs main
  • 877afcc - (HEAD) Make cargo fmt & cargo clippy happy (16 hours ago)
  • no (or very small) change in direct vs main
  • +5% regression in inverse vs main

Are you able to reproduce this?

@michaelkirk
Copy link
Member

The regressions from the clippy/fmt commit seem to be from changing the loop iteration.

e.g. this branch https://github.com/georust/geographiclib-rs/tree/mkirk/usize (as of 20a2b33) has the same perf characteristics as before your clippy changes.

That is:

  • direct: -3% improvement vs main
  • inverse: +4% regression vs main

@michaelkirk michaelkirk mentioned this pull request Jan 25, 2024
1 task
@michaelkirk
Copy link
Member

Thanks! I included this in #58 (but reverted part of the clippy changes).

@valarauca
Copy link
Contributor Author

Are you able to reproduce this?

Sort of...

Looking at the ASM output link

The main different I see is the normal index based loop does a pretty simple structure

BELOW_18 -> POLY_SUM (inner loop) -> BELOW_18 OR  RET

While the clippy version does

 IDK -> BELOW_18 -> POLY_SUM (inner loop) -> IDK | RETURN_OR_PANIC

This makes the clippy version extremely branch heavy, as it seems to frequently branch off check a bunch of stuff, then return to the inner loop.

@michaelkirk
Copy link
Member

Interesting! That might account for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants