Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling and Performance Improvements #66

Merged
merged 20 commits into from
Feb 19, 2024

Conversation

RadicalZephyr
Copy link
Contributor

I've added more basic benchmarks, as well as a simple profiling setup for using Coz, the causal profiler.

Specific Changes

Using Coz, the biggest potential improvement it theorized about surprisingly pointed to the time spent cloning GcNodes, specifically their String names. To ameliorate this, I created a set of enum-based names that still render the same in the logs, but are Copy instead of UTF-8 strings that need to be cloned.

Another major change was switching to the parking lot library implementations of Mutex and RwLock. This didn't have as large an impact but it does cut down on the amount of data overhead per-lock in each struct.

A number of the primitive data types that were previously wrapped in a Mutex or RwLock were switched to use relevant atomic types. I'm not sure if there was a specific that the lock data structures were chosen over atomics to begin with. I tried to use the conservative sequentially consistent ordering, I'm not sure if greater performance gains could be seen from being more precise with ordering constraints.

I also made some things more idiomatic, including avoiding cloning Rcs when it wasn't necessary. A &Rc<T> can be immutably borrowed, so unless the ownership of the new Rc handle is going to escape the current scope (typically by being stored in a Vec or a closure), then there's no need to clone.

I also noticed there was an impl dyn IsNode block to implement some methods on any object implementing the IsNode trait. This pattern can also more idiomatically be expressed using the extension trait pattern. This makes usage a bit simpler, simply using dot notation instead of the turbo fish <dyn IsNode>::... etc. I'm not sure this actually results in different code output, but it feels more idiomatic.

Micro-Benchmark Numbers

Overall, on my machine, the total performance improvements I observed from the main branch to the tip of this branch is between 8-14%.

Screenshot from 2024-02-18 06-53-37
Screenshot from 2024-02-18 06-53-50
Screenshot from 2024-02-18 06-54-00

Future Work

The current program in the coz driver crate is the prime generating filter from one of the tests. This seemed more interesting than most of the other simple tests and benchmarks, but it's still clearly not a very representative program. Adding more realistic driver programs might uncover some better optimization targets.

While there are a decent variety of benchmarks currently, again, more realistic and substantive programs would be useful. In particular there are no benchmarks currently using switch_c or switch_s.

@clinuxrulz clinuxrulz merged commit bfa42df into SodiumFRP:master Feb 19, 2024
5 checks passed
@clinuxrulz
Copy link
Collaborator

Thank you @RadicalZephyr .
I've been away from this code for so long, I can not remember how it works. We may need a new maintainer for it.

Would you be interested?

@RadicalZephyr RadicalZephyr deleted the coz-profiling branch February 19, 2024 06:19
@RadicalZephyr
Copy link
Contributor Author

Yeah, absolutely. I've been trying to get a better handle on the internals, that's how I stumbled on some of these changes. I'll keep doing that I guess. :)

@clinuxrulz
Copy link
Collaborator

What is your username on crates.io ? I'll add you in as an extra owner, so you can push an updates on it as well.

@RadicalZephyr
Copy link
Contributor Author

Same as on here, RadicalZephyr.

@clinuxrulz
Copy link
Collaborator

Very good. Your in.
Feel free to push a new version into crates io.
I think a new version has not been released for 2 years.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants