Hi Folks.
Guess what, the world's fastest scheduler just got a tiny bit faster :)
This approach should merge nicely with Goodbye Exclusive #17 and mut Resources #14.
Edit: Typos and clarification, added disassembly and updated Notes.md.
Notes for lock optimization
Idea
The current implementation always reads and writes BASEPRI on entry/exit of an interrupt (this is done by cortex-m-rtfm/src/export::run, which is a trampoline that executes the actual task).
Using this approach, we read BASEPRI if and only if we are actually changing BASEPRI.
On restoring BASEPRI (in lock), we choose to restore the original BASEPRI value if we are at the outermost nesting level (the initial priority of the task). In this way, we avoid unnecessary BASEPRI accesses and reduce register pressure.
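To make the difference in access counts concrete, here is a small host-side sketch (plain Rust, no hardware involved). MockBasepri, run_old, run_new and lock_free_task_counts are made-up names for illustration only; run_old mimics what the trampoline is described as doing above and is not the actual cortex-m-rtfm code.

```rust
use std::cell::Cell;

/// Host-side stand-in for the BASEPRI register that counts accesses.
/// (Hypothetical helper for illustration; not part of cortex-m-rtfm.)
#[derive(Default)]
pub struct MockBasepri {
    value: Cell<u8>,
    reads: Cell<u32>,
    writes: Cell<u32>,
}

impl MockBasepri {
    pub fn read(&self) -> u8 {
        self.reads.set(self.reads.get() + 1);
        self.value.get()
    }

    pub fn write(&self, v: u8) {
        self.writes.set(self.writes.get() + 1);
        self.value.set(v);
    }

    /// Returns (number of reads, number of writes) seen so far.
    pub fn counts(&self) -> (u32, u32) {
        (self.reads.get(), self.writes.get())
    }
}

/// Old scheme (as described above): the trampoline saves BASEPRI on entry
/// and restores it on exit, whether or not the task takes a lock.
pub fn run_old(reg: &MockBasepri, task: impl FnOnce(&MockBasepri)) {
    let initial = reg.read();
    task(reg);
    reg.write(initial);
}

/// New scheme: no trampoline; BASEPRI is touched only if the task locks.
pub fn run_new(reg: &MockBasepri, task: impl FnOnce(&MockBasepri)) {
    task(reg);
}

/// (reads, writes) for a task that takes no lock: (old scheme, new scheme).
pub fn lock_free_task_counts() -> ((u32, u32), (u32, u32)) {
    let old = MockBasepri::default();
    run_old(&old, |_| {});
    let new = MockBasepri::default();
    run_new(&new, |_| {});
    (old.counts(), new.counts())
}
```

In this model a task that never locks costs one read and one write under the old scheme and no BASEPRI access at all under the new one.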
If you want to play around, check out the lockopt branch and use:
We extend cortex-m-rtfm/src/export::Priority with additional fields to store init_logic (priority of the task) and old_basepri_hw. The latter field is initially None on creation.
    // Wrapper around `Cell`s that forbids mutation through a shared reference
    // (the setters are private to this module)
    pub struct Priority {
        init_logic: u8,
        current_logic: Cell<u8>,
        #[cfg(armv7m)]
        old_basepri_hw: Cell<Option<u8>>,
    }

    impl Priority {
        #[inline(always)]
        pub unsafe fn new(value: u8) -> Self {
            Priority {
                init_logic: value,
                current_logic: Cell::new(value),
                // the field only exists on armv7m, so gate its initializer too
                #[cfg(armv7m)]
                old_basepri_hw: Cell::new(None),
            }
        }

        #[inline(always)]
        fn set_logic(&self, value: u8) {
            self.current_logic.set(value)
        }

        #[inline(always)]
        fn get_logic(&self) -> u8 {
            self.current_logic.get()
        }

        #[inline(always)]
        fn get_init_logic(&self) -> u8 {
            self.init_logic
        }

        #[cfg(armv7m)]
        #[inline(always)]
        fn get_old_basepri_hw(&self) -> Option<u8> {
            self.old_basepri_hw.get()
        }

        #[cfg(armv7m)]
        #[inline(always)]
        fn set_old_basepri_hw(&self, value: u8) {
            self.old_basepri_hw.set(Some(value));
        }
    }
The corresponding lock is implemented as follows:
    #[cfg(armv7m)]
    #[inline(always)]
    pub unsafe fn lock<T, R>(
        ptr: *mut T,
        priority: &Priority,
        ceiling: u8,
        nvic_prio_bits: u8,
        f: impl FnOnce(&mut T) -> R,
    ) -> R {
        let current = priority.get_logic();

        if current < ceiling {
            if ceiling == (1 << nvic_prio_bits) {
                // Highest priority: mask all interrupts, BASEPRI is untouched
                priority.set_logic(u8::max_value());
                let r = interrupt::free(|_| f(&mut *ptr));
                priority.set_logic(current);
                r
            } else {
                // Read BASEPRI only once per task, on the first real lock
                if priority.get_old_basepri_hw().is_none() {
                    priority.set_old_basepri_hw(basepri::read());
                }
                priority.set_logic(ceiling);
                basepri::write(logical2hw(ceiling, nvic_prio_bits));
                let r = f(&mut *ptr);
                if current == priority.get_init_logic() {
                    // Outermost lock: restore the BASEPRI value saved on entry
                    basepri::write(priority.get_old_basepri_hw().unwrap());
                } else {
                    // Nested lock: restore the enclosing lock's priority
                    // (`current`; `priority.get_logic()` still holds `ceiling` here)
                    basepri::write(logical2hw(current, nvic_prio_bits));
                }
                priority.set_logic(current);
                r
            }
        } else {
            // Ceiling already reached: the lock is optimized out
            f(&mut *ptr)
        }
    }
The highest priority is achieved through interrupt::free and does not affect BASEPRI at all. Thus it manipulates only the "logic" priority (used to optimize out locks).
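Why the highest ceiling needs interrupt::free can be seen from the priority encoding. The sketch below assumes the usual mapping from logical to hardware priorities (the body of logical2hw is not shown in this post, so treat it as an assumption), and needs_interrupt_free is a hypothetical helper that just names the check used in lock. It also assumes nvic_prio_bits is smaller than 8.

```rust
/// Assumed mapping from a logical priority (higher = more urgent) to the
/// hardware encoding (lower = more urgent, stored in the top bits).
/// The body is an assumption for illustration; it is not quoted in the post.
pub fn logical2hw(logical: u8, nvic_prio_bits: u8) -> u8 {
    ((1 << nvic_prio_bits) - logical) << (8 - nvic_prio_bits)
}

/// Hypothetical helper naming the check from `lock`: the highest logical
/// ceiling is `1 << nvic_prio_bits`.
pub fn needs_interrupt_free(ceiling: u8, nvic_prio_bits: u8) -> bool {
    ceiling == (1 << nvic_prio_bits)
}
```

With 4 priority bits the highest logical ceiling is 16, which would map to a hardware value of 0. Writing 0 to BASEPRI disables masking instead of raising the priority, so that case cannot be expressed through BASEPRI and goes through a global critical section.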
For the normal case, on entry we check whether the BASEPRI register has already been read; if not, we read it and store the value in old_basepri_hw. On exit we check whether to restore a logical priority (inside a nested lock) or the original BASEPRI (previously stored in old_basepri_hw).
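The entry/exit behaviour can be traced on a host with a simulated register. This is only a model: Reg, NVIC_PRIO_BITS and nested_lock_trace are made-up names, the interrupt::free path is left out, the logical2hw body is an assumed mapping, and the nested exit restores the enclosing lock's priority from the saved current value.

```rust
use std::cell::{Cell, RefCell};

const NVIC_PRIO_BITS: u8 = 4; // assumed value for the sketch

/// Assumed logical-to-hardware priority mapping (not quoted in the post).
fn logical2hw(logical: u8, nvic_prio_bits: u8) -> u8 {
    ((1 << nvic_prio_bits) - logical) << (8 - nvic_prio_bits)
}

/// Simulated BASEPRI register that logs every access (hypothetical helper).
struct Reg {
    value: Cell<u8>,
    log: RefCell<Vec<String>>,
}

impl Reg {
    fn new() -> Self {
        Reg { value: Cell::new(0), log: RefCell::new(Vec::new()) }
    }
    fn read(&self) -> u8 {
        self.log.borrow_mut().push("read".to_string());
        self.value.get()
    }
    fn write(&self, v: u8) {
        self.log.borrow_mut().push(format!("write {}", v));
        self.value.set(v);
    }
}

/// Host model of the extended Priority.
struct Priority {
    init_logic: u8,
    current_logic: Cell<u8>,
    old_basepri_hw: Cell<Option<u8>>,
}

impl Priority {
    fn new(value: u8) -> Self {
        Priority {
            init_logic: value,
            current_logic: Cell::new(value),
            old_basepri_hw: Cell::new(None),
        }
    }
}

/// Model of the normal (BASEPRI) path of lock.
fn lock<R>(reg: &Reg, priority: &Priority, ceiling: u8, f: impl FnOnce() -> R) -> R {
    let current = priority.current_logic.get();
    if current >= ceiling {
        return f(); // ceiling already reached: no register access
    }
    // Lazily save BASEPRI on the first lock that actually changes it
    if priority.old_basepri_hw.get().is_none() {
        priority.old_basepri_hw.set(Some(reg.read()));
    }
    priority.current_logic.set(ceiling);
    reg.write(logical2hw(ceiling, NVIC_PRIO_BITS));
    let r = f();
    if current == priority.init_logic {
        // Outermost lock: restore the saved hardware value
        reg.write(priority.old_basepri_hw.get().unwrap());
    } else {
        // Nested lock: restore the enclosing lock's priority
        reg.write(logical2hw(current, NVIC_PRIO_BITS));
    }
    priority.current_logic.set(current);
    r
}

/// Task at priority 1 taking locks with ceilings 2, then 3, then 2 again.
fn nested_lock_trace() -> Vec<String> {
    let reg = Reg::new();
    let p = Priority::new(1);
    lock(&reg, &p, 2, || {
        lock(&reg, &p, 3, || {
            lock(&reg, &p, 2, || {}) // below current priority: optimized out
        })
    });
    reg.log.into_inner()
}
```

In this model the trace is: read, write 224, write 208, write 224, write 0. There is a single read for the whole task, the innermost lock causes no access, and the last write restores the saved value rather than a computed priority.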
Safety
We can safely unwrap the Option<u8> returned by get_old_basepri_hw, as every path leading up to the unwrap either sets it to Some or finds it already Some. Updates to old_basepri_hw are monotonic: the API offers no way of turning it back into None (besides new).
Moreover, new is the only public function of Priority, thus we expose nothing dangerous to the user. (Externally changing old_basepri_hw could lead to memory unsafety, as an incorrect BASEPRI value may allow a task to start that should have been blocked, and once started, access to resources with the same ceiling (or lower) is directly granted under SRP.)
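The monotonicity argument can also be pictured in isolation. The sketch below is a hypothetical stand-alone type (SavedBasepri and get_or_record are made-up names, not part of the proposed change) whose only state transition is None to Some, so a recorded value can never be cleared or overwritten.

```rust
use std::cell::Cell;

/// Hypothetical stand-alone version of the `old_basepri_hw` field:
/// the stored value can go from `None` to `Some` once and never back.
pub struct SavedBasepri(Cell<Option<u8>>);

impl SavedBasepri {
    pub fn new() -> Self {
        SavedBasepri(Cell::new(None))
    }

    /// Returns the recorded value, calling `read` to obtain and record it
    /// only if nothing has been recorded yet.
    pub fn get_or_record(&self, read: impl FnOnce() -> u8) -> u8 {
        match self.0.get() {
            Some(v) => v,
            None => {
                let v = read();
                self.0.set(Some(v));
                v
            }
        }
    }

    /// Current state, for inspection only.
    pub fn get(&self) -> Option<u8> {
        self.0.get()
    }
}
```

A second call to get_or_record returns the first recorded value and never invokes its closure, which is the same property the unwrap in lock relies on.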
Implementation
The implementation mainly concerns two files: rtfm/src/export.rs (discussed above) and macros/src/codegen/hardware_tasks.rs. For the latter, the task dispatcher is updated as follows:
Basically, we create a Priority (on the stack) and use it to create a Context. The beauty is that LLVM completely optimizes out the data structure (and related code) while still taking into account its implications for control flow. Thus, the locks AND the initial reading of BASEPRI are optimized at compile time at zero cost.
Overall, with this approach we don't need a trampoline (run). We reduce the overhead by at least two machine instructions (the additional read and write of BASEPRI) for each interrupt. It also reduces register pressure (as less information needs to be stored).
Evaluation
The examples/lockopt.rs shows that locks are effectively optimized out.
Old Implementation
With lock opt. We see a 20% improvement for short/small tasks.
GPIOB/C are sharing a resource (C has the higher priority). Notice that for GPIOC there is no BASEPRI manipulation at all.
For GPIOB, there is a single read of BASEPRI (stored in old_basepri_hw) and just two writes: one for entering the critical section and one for exiting. On exit we detect that we are indeed at the initial priority of the task, thus we restore old_basepri_hw instead of a logic priority.
Limitations and Drawbacks
None spotted so far.
Observations
Neither gives an assembly dump with symbols (very annoying to have to rely on arm-none-eabi-objdump for proper objdumps); maybe just an option is missing?