-
-
Notifications
You must be signed in to change notification settings - Fork 75
Conversation
src/canary.rs
Outdated
@@ -186,3 +185,88 @@ impl Canary { | |||
} | |||
} | |||
} | |||
|
|||
/// Write [`CANARY_VALUE`] to the stack. | |||
fn paint_stack(core: &mut Core, start: u32, mut end: u32) -> Result<(), probe_rs::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a note here to observe that start
should be the numerically lower address, and end
should be the numerically higher address: that is start < end
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also panic if that is not upheld. Otherwise the subtraction at line 197 will underflow and panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does start
also have to be aligned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a note here to observe that
start
should be the numerically lower address, andend
should be the numerically higher address: that isstart < end
.
Good point, I am asserting this in line 250 in fn subroutine
, but it is probably better to already check it before.
Does
start
also have to be aligned?
I think so, but I was not sure how to go about it if it is not. Probably subtracting the modulo, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I am asserting this in line 250 in fn subroutine, but it is probably better to already check it before.
I'd apply the comment to whichever functions take start
and end
as arguments :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is, since the initial stack pointer points to the address directly above the stack, it itself doesn't belong to the stack anymore. Can anything bad happen when we are writing to this address? (I don't think so because static variables and so on will only be created later, but I still want to double check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the stack is at the top of the RAM, the initial stack pointer value will be an invalid address.
From memory.x
in cortex-m-rt
:
_stack_start = ORIGIN(CCRAM) + LENGTH(CCRAM);
The problem, I think, is a range doing -1
when we should be doing -4
, because we work in 4-byte units. Or perhaps we shouldn't hold start and end and instead hold start and length-in-32-bit-words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so it turns out you can't see my comments unless I actually press "Comment" 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem, I think, is a range doing
-1
when we should be doing-4
, because we work in 4-byte units. Or perhaps we shouldn't hold start and end and instead hold start and length-in-32-bit-words.
So changing it to initial_stack_pointer - 4
should be fine?
I'd like to avoid bigger refactorings of other parts of the code as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 67c1576.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some minor suggestions for improving comments (specifically around document pre-conditions like address alignment). But otherwise this is brilliant! Great work :) 👍
The error messages can probably also be improved, but I'd like to do this in a follow-up PR, when reworking the canary (see "Additionally the stack canary can be simplified quite a lot. [...]". |
bors r+ |
bors cancel |
302: Make stack painting fast again! 🇪🇺 r=Urhengulas a=Urhengulas This PR implements the first one of improvements outlined in #258. Fixes #258. ## But what is "stack painting" anyways? The idea is to write a specific byte pattern to (part of) the stack before the program is getting executed. After the program finished, either because it is done with its task, or because there was an error, we read out the previously painted area and check how much of it is still intact. If the pattern is still the same, we can be rather certain that the program didn't write to this part of the stack. This information helps to either know if there was a stack overflow, or just to measure how much of the stack was used. So far both reading and writing of the memory was done via the probe. While this works it is also rather slow, because the host and probe communicate via USB which takes time. The new approach is writing a subroutine to the MCU, which will paint the memory from within. ## Mesurements In following table you can see the measurement how much time the old and new approach take for memory from 8 to 256KiB. ![data](https://user-images.githubusercontent.com/37087391/154973187-c17e66f7-cb22-4e56-8dff-a9798ab3a39a.png) The results are pretty impressive. The new approach is about 170 times faster! ## Further work - A similar approach can also be applied to reading out the stack after the program finished. - Additionally the stack canary can be simplified quite a lot. So far we are not painting the whole stack, except the user asks for it, because this _was_ slow. Because it is fast now we can always paint all of it, which simplifies the code and removes the need for the `--measure-stack` flag. Co-authored-by: Johann Hemmann <[email protected]>
Canceled. |
bors r=jonathanpallant |
Build succeeded: |
327: Optimize stack usage measuring r=jonathanpallant a=Urhengulas This PR optimizes the stack usage measuring by not using the probe but a subroutine to search through the memory. Fixes #258. ## Measurements The speedup is similarly impressive as in #302: | canary size | main (bcaf997) | this PR (3ad3f86) | | :---: | :---: | :---: | | 1024 B | 0.007s | 0.014s | | 261060 B | 1.912s | 0.028s | It makes sense that the time actually gets worse for a small canary size of 1024 bytes, since we are doing a lot of setup work (flash subroutine, set registers, set and reset program counter etc.). But we see that this totally pays off, since for a rather big canary of 256KiB we are almost 70 times faster! ## Further work This PR enables us to drastically simplify the canary logic, because since both painting and measuring are pretty fast now, we can always paint the full stack. Co-authored-by: Urhengulas <[email protected]> Co-authored-by: Johann Hemmann <[email protected]>
This PR implements the first one of improvements outlined in #258.
Fixes #258.
But what is "stack painting" anyways?
The idea is to write a specific byte pattern to (part of) the stack before the program is getting executed. After the program finished, either because it is done with its task, or because there was an error, we read out the previously painted area and check how much of it is still intact. If the pattern is still the same, we can be rather certain that the program didn't write to this part of the stack. This information helps to either know if there was a stack overflow, or just to measure how much of the stack was used.
So far both reading and writing of the memory was done via the probe. While this works it is also rather slow, because the host and probe communicate via USB which takes time.
The new approach is writing a subroutine to the MCU, which will paint the memory from within.
Mesurements
In following table you can see the measurement how much time the old and new approach take for memory from 8 to 256KiB.
The results are pretty impressive. The new approach is about 170 times faster!
Further work
--measure-stack
flag.