Consider creating a game math library benchmark for the working group #93
Comments
I'm currently working on updating the benchmarks that @sebcrozet did in my own
fork of mathbench to include new ultraviolet features and also to try to run
a more holistic test suite. As @bitshifter had mentioned in the main
mathbench-rs repo, there should be benchmarks both for "wide" and "scalar"
types, as both are important for different cases, so I'm trying to include
tests that might benefit both cases, as well as individual benchmarks for
each op for both cases.
On Tue, Aug 25, 2020, 11:24 AM Ian Kettlewell wrote:
Given that the working group recently took ownership of an ECS benchmark, it
seems appropriate to also have a game math library benchmark. Game math
libraries are even more benchmarked and debated than ECS frameworks.
A benchmark from the working group provides a common point of reference
everyone can contribute to on neutral ground. The goal is to provide useful
information to help people make informed choices about the Rust ecosystem.
Benchmarks provided by the Working Group should aim to help people
evaluate libraries holistically. Ideally such a benchmark also
includes metrics for compile times and perhaps lines of code (as a rough
measurement of functionality and complexity).
@bitshifter <https://github.com/bitshifter>, @sebcrozet
<https://github.com/sebcrozet>, and @termhn <https://github.com/termhn>
have all created their own benchmarks, perhaps they have thoughts?
I'd be happy for the working group to take ownership of mathbench. I think it's been useful to the community, but it's usually way down my list of things to work on when I have free time, so it's a bit unloved. It would be good to get the wide nalgebra/ultraviolet benches in the same repo with the scalar benches intact as well, as @termhn mentioned (see bitshifter/mathbench-rs#21).
If the working group were to take ownership of the code, I think they would also need to take ownership of publishing the results and updating them periodically when existing libraries are updated or new libraries are added. I publish results to my github site https://bitshifter.github.io/mathbench/0.3.0/report/index.html; @sebcrozet and @termhn have published their own results to their own blogs/READMEs. I think it would be good if there was a central location for keeping these.
The other thing to do when publishing results is update the summary in the README and document the hardware and OS used to generate them. I also make a tag when publishing the results, so it's easy to see what lib versions were used to generate them.
I've consistently used the same hardware, an old laptop of mine. However, that laptop doesn't support AVX-512, so it couldn't run some of the wide benchmarks. It's probably not the end of the world if the hardware changed between publishing runs, but it would be better if it didn't. The benchmarks take a long time to run and you can't really use the machine for anything else while they are running, which is another reason I haven't really been updating them.
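The hardware and OS notes mentioned in this comment could be captured automatically next to each published run. The following is only a sketch under assumptions: `environment_summary` is a hypothetical helper, not something mathbench provides, and the set of CPU features it reports is an arbitrary choice. It uses only the Rust standard library.

```rust
use std::env::consts::{ARCH, OS};

/// Builds a short, human-readable description of the machine a benchmark
/// run was produced on. Hypothetical helper, not part of mathbench.
fn environment_summary() -> String {
    let mut lines = vec![format!("os: {}", OS), format!("arch: {}", ARCH)];

    // Runtime CPU feature detection is only available on x86/x86_64.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        lines.push(format!("sse2: {}", is_x86_feature_detected!("sse2")));
        lines.push(format!("avx2: {}", is_x86_feature_detected!("avx2")));
        // A `false` here would explain why some wide benches cannot run.
        lines.push(format!("avx512f: {}", is_x86_feature_detected!("avx512f")));
    }

    lines.join("\n")
}

fn main() {
    // The output could be pasted into the README next to the results summary.
    println!("{}", environment_summary());
}
```

Storing this text with the tagged results would make it easier to tell whether two published runs came from comparable hardware.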
@bitshifter Can you add this information in the repo somewhere? a sort of
On a more meta-level, should we wait for @termhn's proposed changes to land before moving it?
Sure, I can document guidelines for publishing results.
Hardware wise I generally run on my own laptop. I think it's useful to use the same hardware each time I update it. The downside is the machine is 5 years old and doesn't have recent CPU features that some libraries want to take advantage of. I have not investigated a cloud solution. It sounds workable in theory, providing they can guarantee that nothing else is using resources when mathbench is running and that the hardware is known and consistent.
I don't know if @termhn intended to try to get these changes back into mathbench or to keep them as a fork? It is probably a bit of work to get those changes back in the main repo, just because they were quite extensive. On that note I recently updated mathbench to include
I would still like to add wide tests. I'd like to keep them separate but have one of the scalar libs running the test for comparison. I was possibly going to take a slightly different approach to what @sebcrozet did in his fork - which was to have a bench with say 100 elements in it and then run it through different width types, rather than having a bench for each type width, if that makes sense? I was thinking of producing separate scalar and wide summary tables. The current scalar summary table is getting pretty huge on its own.
@sebcrozet's fork also added a lot of benches for types that other libraries don't generally have, which is fine; my original intention for mathbench was kind of a comparison of the lowest common denominator of math library features. In some sense there's no harm in adding "exotic" features, it's just that there won't be much to compare them against, so maybe they're not so useful to have in the "official" repo? I think there is some sense in people forking mathbench and adding benches that make sense for their library, or compiler flags that make sense for their library. I see no harm in that.
Oh nice... I'll probably try to "rebase" my work on top of your current mathbench then @bitshifter
If I understand what you mean, that would mean that every type would do the same number of total iterations (and as such, wide types would be processing more total values, but the same number of ops)... if so, I'm not sure I really like that approach, as I think it sorta obfuscates the higher throughput and makes it harder to reason about? Of course the current method isn't perfect, as it assumes you are able to start and end in wide types for your algorithm, which isn't always true, but I think it's still a valid case to test (it's how I use
Yeah makes sense to me.
No, not total number of iterations. I'm suggesting the same number of inputs is used for each type. Wider types would be doing fewer iterations because they are processing 4, 8 or 16 elements at a time. So say 100 single input Vec3's: glam would process 1 at a time, an f32x4 type would process 4 at a time, f32x8 would process 8 at a time and so on. So it should make the throughput advantage of wide types clearer, I think? What it doesn't show is the timing of a single function call for each wide type (like how long does a single
I feel like using the same input size would give a better example of the throughput advantage of the wider types though. I could add both single calls and throughput benches. It's just more to write and takes longer to run.
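The "same number of inputs, fewer iterations for wider types" idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: `Wide` is a plain-array stand-in for types like f32x4 or f32x8 (a real wide type would use SIMD registers), and `scale_offset` and `run` are hypothetical names, not mathbench or library APIs.

```rust
/// Stand-in for a wide type: LANES f32 values processed by one "op".
#[derive(Clone, Copy)]
struct Wide<const LANES: usize>([f32; LANES]);

impl<const LANES: usize> Wide<LANES> {
    /// Applies `x * a + b` to every lane; counts as a single op.
    fn scale_offset(self, a: f32, b: f32) -> Self {
        let mut out = self.0;
        for v in out.iter_mut() {
            *v = *v * a + b;
        }
        Wide(out)
    }
}

/// Pushes the same inputs through a LANES-wide type and returns
/// (number of ops executed, outputs in input order).
fn run<const LANES: usize>(inputs: &[f32]) -> (usize, Vec<f32>) {
    let mut ops = 0;
    let mut out = Vec::with_capacity(inputs.len());
    for chunk in inputs.chunks(LANES) {
        // Pad the final partial chunk with zeros; the padding is discarded.
        let mut lanes = [0.0f32; LANES];
        lanes[..chunk.len()].copy_from_slice(chunk);
        let result = Wide(lanes).scale_offset(2.0, 1.0);
        ops += 1;
        out.extend_from_slice(&result.0[..chunk.len()]);
    }
    (ops, out)
}

fn main() {
    let inputs: Vec<f32> = (0..100).map(|i| i as f32).collect();
    println!("1 lane:   {} ops", run::<1>(&inputs).0);
    println!("4 lanes:  {} ops", run::<4>(&inputs).0);
    println!("8 lanes:  {} ops", run::<8>(&inputs).0);
    println!("16 lanes: {} ops", run::<16>(&inputs).0);
}
```

With 100 inputs this performs 100, 25, 13 and 7 ops respectively, while every width produces the same outputs. Timing each `run` call would then measure throughput over a fixed input set rather than the cost of a single call.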
I don't see how that is different than the way @sebcrozet implemented it (though I could just not be understanding still of course 😅) I agree with you though, afaict
It's probably no different, I'm not super familiar with his fork :) The main thing is I would keep the existing scalar benches and wouldn't have all of the scalar types run the wide benches except for maybe 1 for comparison. Mostly because I think there's limited value in it for the scalar libs and it adds to the time to run the benches.
One more thing... I currently have a couple benches implemented with f64 and f32 versions of wide types, but at this point I'm not sure it's actually worth it to do that to be honest. Think I'm gonna rip that out and just keep it consistently f32 across the board |
Ralith will cry |
Well, there's not gonna be benches for scalar f64 across the board anyway so 😅 as far as all the current benchmarks go, any perf trends that are true of f32s are basically true of f64s, just f64s are like 3x slower across the board or something |
I don't have a problem with dropping |
https://github.com/termhn/mathbench-rs/blob/wide/benches/eulerbench.rs Here's the approach I'm taking that I think I will just copy out to the other benches basically |
Sounds good to me. I was thinking of passing a stride to the
Note that a lot of the existing benches don't take a size parameter, so you'll need to make a version of them that can handle that. Fairly easy to do, just repetitious.
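As a rough illustration of what a bench that takes a size parameter might look like, here is a sketch using only the standard library. The names `bench_with_size`, `make_input` and `op` are hypothetical and the timing is deliberately naive; this is not how mathbench or its benchmarking harness is implemented.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `op` applied once to each of `size` generated inputs.
/// Hypothetical helper for illustration only.
fn bench_with_size<T: Copy, R>(
    size: usize,
    make_input: impl Fn(usize) -> T,
    op: impl Fn(T) -> R,
) -> Duration {
    // Build the inputs up front so generation is not part of the timing.
    let inputs: Vec<T> = (0..size).map(make_input).collect();
    let start = Instant::now();
    for &input in &inputs {
        // black_box discourages the optimiser from removing the work.
        black_box(op(black_box(input)));
    }
    start.elapsed()
}

fn main() {
    // The same op (vector length) measured at several input sizes.
    for size in [1, 100, 10_000] {
        let elapsed = bench_with_size(
            size,
            |i| [i as f32, 1.0, 2.0],
            |v: [f32; 3]| (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt(),
        );
        println!("size {size}: {elapsed:?}");
    }
}
```

Wrapping an existing single-call bench this way is the repetitious part: each op needs an input generator and a call site that accepts the size.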
opened bitshifter/mathbench-rs#24