-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profiling and Performance Improvements #66
Conversation
Thank you @RadicalZephyr . Would you be interested? |
Yeah, absolutely. I've been trying to get a better handle on the internals, that's how I stumbled on some of these changes. I'll keep doing that I guess. :) |
What is your username on crates.io ? I'll add you in as an extra owner, so you can push an updates on it as well. |
Same as on here, RadicalZephyr. |
Very good. Your in. |
I've added more basic benchmarks, as well as a simple profiling setup for using Coz, the causal profiler.
Specific Changes
Using Coz, the biggest potential improvement it theorized about surprisingly pointed to the time spent cloning
GcNode
s, specifically theirString
names. To ameliorate this, I created a set of enum-based names that still render the same in the logs, but areCopy
instead of UTF-8 strings that need to be cloned.Another major change was switching to the parking lot library implementations of
Mutex
andRwLock
. This didn't have as large an impact but it does cut down on the amount of data overhead per-lock in each struct.A number of the primitive data types that were previously wrapped in a
Mutex
orRwLock
were switched to use relevant atomic types. I'm not sure if there was a specific that the lock data structures were chosen over atomics to begin with. I tried to use the conservative sequentially consistent ordering, I'm not sure if greater performance gains could be seen from being more precise with ordering constraints.I also made some things more idiomatic, including avoiding cloning
Rc
s when it wasn't necessary. A&Rc<T>
can be immutably borrowed, so unless the ownership of the newRc
handle is going to escape the current scope (typically by being stored in aVec
or a closure), then there's no need to clone.I also noticed there was an
impl dyn IsNode
block to implement some methods on any object implementing theIsNode
trait. This pattern can also more idiomatically be expressed using the extension trait pattern. This makes usage a bit simpler, simply using dot notation instead of the turbo fish<dyn IsNode>::...
etc. I'm not sure this actually results in different code output, but it feels more idiomatic.Micro-Benchmark Numbers
Overall, on my machine, the total performance improvements I observed from the main branch to the tip of this branch is between 8-14%.
Future Work
The current program in the coz driver crate is the prime generating filter from one of the tests. This seemed more interesting than most of the other simple tests and benchmarks, but it's still clearly not a very representative program. Adding more realistic driver programs might uncover some better optimization targets.
While there are a decent variety of benchmarks currently, again, more realistic and substantive programs would be useful. In particular there are no benchmarks currently using
switch_c
orswitch_s
.