Profiling and Performance Improvements #66

RadicalZephyr · 2024-02-18T15:56:55Z

I've added more basic benchmarks, as well as a simple profiling setup for using Coz, the causal profiler.

Specific Changes

Surprisingly, the biggest potential improvement Coz identified was the time spent cloning GcNodes, specifically their String names. To ameliorate this, I created a set of enum-based names that render the same in the logs but are Copy, rather than UTF-8 Strings that have to be cloned.
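A minimal sketch of the idea follows. The `NodeName` variants and the text they render are hypothetical, not the names actually added in this PR. The `GcNode` here is a simplified stand-in holding only an id and a name. The point is that a `Display` impl keeps log output unchanged, while cloning the node now copies a small enum value instead of allocating a new `String`.

```rust
use std::fmt;

// Hypothetical name set: the variants and their rendered text are
// illustrative, not the actual names introduced in the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NodeName {
    Stream,
    Cell,
    Map,
    Filter,
}

impl fmt::Display for NodeName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render the text a String-based name would have held,
        // so log output stays the same.
        let text = match self {
            NodeName::Stream => "Stream",
            NodeName::Cell => "Cell",
            NodeName::Map => "Stream::map",
            NodeName::Filter => "Stream::filter",
        };
        f.write_str(text)
    }
}

// Simplified stand-in for GcNode: only an id and the name are modelled.
#[derive(Clone, Debug)]
pub struct GcNode {
    pub id: u32,
    // Previously a String, whose clone allocates a new buffer.
    pub name: NodeName,
}

fn main() {
    let node = GcNode { id: 1, name: NodeName::Map };
    // Cloning copies the enum value; no heap allocation for the name.
    let cloned = node.clone();
    println!("node {} ({}) cloned as {}", node.id, node.name, cloned.name);
}
```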

Another major change was switching to the parking_lot crate's implementations of Mutex and RwLock. This didn't have as large an impact, but it does reduce the per-lock memory overhead in each struct.

A number of primitive values that were previously wrapped in a Mutex or RwLock were switched to the corresponding atomic types. I'm not sure whether there was a specific reason the lock types were chosen over atomics to begin with. I used the conservative sequentially consistent ordering (Ordering::SeqCst) throughout. I'm not sure whether greater performance gains could be had by being more precise with ordering constraints.
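As a rough illustration of the lock-to-atomic change, here is a sketch that assumes a trimmed-down `GcNodeData`. The field names follow the commit messages, but the methods are hypothetical. Every operation uses `Ordering::SeqCst`, matching the conservative choice described above. Atomic methods take `&self`, so no guard has to be acquired and released for each counter update or flag check.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

// Hypothetical, trimmed-down GcNodeData: field names follow the commit
// messages, but the methods below are illustrative only.
#[derive(Debug, Default)]
pub struct GcNodeData {
    ref_count: AtomicU32, // previously behind a lock
    visited: AtomicBool,  // previously e.g. RwLock<bool>
    buffered: AtomicBool,
}

impl GcNodeData {
    /// Increments the count and returns the new value.
    pub fn inc_ref(&self) -> u32 {
        // fetch_add returns the previous value.
        self.ref_count.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Decrements the count and returns the new value.
    pub fn dec_ref(&self) -> u32 {
        let previous = self.ref_count.fetch_sub(1, Ordering::SeqCst);
        // fetch_sub wraps on underflow, so treat a decrement from zero as a bug.
        previous.checked_sub(1).expect("ref_count underflow")
    }

    pub fn ref_count(&self) -> u32 {
        self.ref_count.load(Ordering::SeqCst)
    }

    /// Marks the node visited; returns true only on the first visit.
    pub fn mark_visited(&self) -> bool {
        !self.visited.swap(true, Ordering::SeqCst)
    }

    pub fn set_buffered(&self, value: bool) {
        self.buffered.store(value, Ordering::SeqCst);
    }

    pub fn is_buffered(&self) -> bool {
        self.buffered.load(Ordering::SeqCst)
    }
}

fn main() {
    let data = GcNodeData::default();
    data.inc_ref();
    data.inc_ref();
    data.dec_ref();
    let first_visit = data.mark_visited();
    data.set_buffered(true);
    println!(
        "ref_count={} first_visit={} buffered={}",
        data.ref_count(),
        first_visit,
        data.is_buffered()
    );
}
```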

I also made some things more idiomatic, including avoiding cloning Rcs when it wasn't necessary. An Rc<T> can simply be borrowed (as &Rc<T>, or as &T via deref). So unless ownership of a new Rc handle needs to escape the current scope (typically by being stored in a Vec or captured by a closure), there's no need to clone it.
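The following sketch shows the borrow-versus-clone distinction. It uses an illustrative `Node` type rather than the crate's own types, and hypothetical helpers `name_len`, `name_of` and `keep`. Functions that only read through the handle take a reference and leave the strong count alone. Only the function that stores the handle in a `Vec` needs `Rc::clone`.

```rust
use std::rc::Rc;

// Illustrative node type; not the crate's GcNode.
pub struct Node {
    pub name: String,
}

// Only reads through the handle, so borrowing the Rc is enough:
// the reference count is untouched.
fn name_len(node: &Rc<Node>) -> usize {
    node.name.len()
}

// When Rc-ness doesn't matter, taking &Node is more general;
// a &Rc<Node> argument deref-coerces to &Node.
fn name_of(node: &Node) -> &str {
    &node.name
}

// The handle escapes into the Vec, so a clone is genuinely needed.
fn keep(node: &Rc<Node>, store: &mut Vec<Rc<Node>>) {
    store.push(Rc::clone(node));
}

fn main() {
    let node = Rc::new(Node { name: String::from("primes") });
    println!("{} has {} bytes", name_of(&node), name_len(&node));
    println!("strong count after borrows: {}", Rc::strong_count(&node));

    let mut store = Vec::new();
    keep(&node, &mut store);
    println!("strong count after keep: {}", Rc::strong_count(&node));
}
```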

I also noticed an impl dyn IsNode block that implements some methods on any object implementing the IsNode trait. This is more idiomatically expressed with the extension trait pattern, which also makes call sites simpler: plain dot notation instead of the fully qualified <dyn IsNode>::... syntax. I'm not sure this actually changes the generated code, but it feels more idiomatic.
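Here is a sketch contrasting the two patterns. `IsNode` is a simplified stand-in with a single hypothetical `node_name` method, and `describe_old`, `describe`, `IsNodeExt` and `Leaf` are illustrative names. With the `impl dyn IsNode` block, a concrete value must go through `<dyn IsNode>::...` to reach the method. The extension trait's blanket impl (with `?Sized`, so it also covers trait objects) lets both concrete values and boxed trait objects use dot notation.

```rust
// Simplified stand-in for the crate's IsNode trait (hypothetical method).
pub trait IsNode {
    fn node_name(&self) -> &str;
}

// Before: an inherent impl on the trait object type.
impl dyn IsNode {
    pub fn describe_old(&self) -> String {
        format!("node<{}>", self.node_name())
    }
}

// After: an extension trait with a blanket impl for every IsNode,
// including unsized trait objects (hence ?Sized).
pub trait IsNodeExt: IsNode {
    fn describe(&self) -> String {
        format!("node<{}>", self.node_name())
    }
}

impl<T: IsNode + ?Sized> IsNodeExt for T {}

pub struct Leaf;

impl IsNode for Leaf {
    fn node_name(&self) -> &str {
        "leaf"
    }
}

fn main() {
    let leaf = Leaf;
    let boxed: Box<dyn IsNode> = Box::new(Leaf);

    // Inherent dyn methods: a concrete value needs the qualified path,
    // which coerces &Leaf to &dyn IsNode.
    println!("{}", <dyn IsNode>::describe_old(&leaf));
    println!("{}", boxed.describe_old());

    // Extension trait methods: dot notation works on both.
    println!("{}", leaf.describe());
    println!("{}", boxed.describe());
}
```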

Micro-Benchmark Numbers

Overall, on my machine, the total performance improvement I observed from the main branch to the tip of this branch is between 8% and 14%.

Future Work

The current program in the coz driver crate is the prime-generating filter from one of the tests. It seemed more interesting than most of the other simple tests and benchmarks, but it's still clearly not a very representative program. Adding more realistic driver programs might uncover better optimization targets.

While there is a decent variety of benchmarks currently, again, more realistic and substantive programs would be useful. In particular, no benchmarks currently use switch_c or switch_s.

clinuxrulz · 2024-02-19T02:44:17Z

Thank you, @RadicalZephyr.
I've been away from this code for so long that I cannot remember how it works. We may need a new maintainer for it.

Would you be interested?

RadicalZephyr · 2024-02-19T06:30:11Z

Yeah, absolutely. I've been trying to get a better handle on the internals, that's how I stumbled on some of these changes. I'll keep doing that I guess. :)

clinuxrulz · 2024-02-19T13:44:02Z

What is your username on crates.io? I'll add you as an extra owner, so you can push updates to it as well.

RadicalZephyr · 2024-02-21T11:03:58Z

Same as on here, RadicalZephyr.

clinuxrulz · 2024-02-23T07:03:40Z

Very good. You're in.
Feel free to push a new version to crates.io.
I don't think a new version has been released in 2 years.

RadicalZephyr added 20 commits September 19, 2023 16:29

ed47089 Clean up some syntax lints
cc2ccd0 Upgrade criterion
6870317 Add a profiling driver program to run with Coz
720baa0 Add a module for interned GcNode names
5e052b6 Replace GcNode name Strings with interned data
bc15d54 Switch to using parking_lot Mutex and RwLock
e5bb668 Switch to using AtomicBool instead of RwLock<bool>
e1141d8 Change GcNode.freed to AtomicBool
6af976c Change GcNodeData.ref_count to AtomicU32
73422ea Change GcNodeData.ref_count_adj to AtomicU32
edfb63e Change GcNodeData.visited to AtomicBool
3bfb1d2 Change GcNodeData.buffered to AtomicBool
42b33f6 Don't copy stream during StreamSink.send
11dc0df Don't clone unnecessarily
87bd1a8 Avoid cloning GcNode by reordering push in GcCtx::display_graph
00379a7 Remove unnecessary clone in GcCtx::scan_black
fe41d5e Use extension trait pattern instead of impl dyn
4f05bee Add Copy impls for NodeNames
2d689de Satisfy rustfmt
92f7a2d Use new Copy impl for NodeNames

clinuxrulz merged commit bfa42df into SodiumFRP:master Feb 19, 2024
5 checks passed

RadicalZephyr deleted the coz-profiling branch February 19, 2024 06:19
