Make stack painting fast again! 🇪🇺 #302

Urhengulas · 2022-02-21T14:29:50Z

This PR implements the first of the improvements outlined in #258.

Fixes #258.

But what is "stack painting" anyways?

The idea is to write a specific byte pattern to (part of) the stack before the program is executed. After the program has finished, either because it is done with its task or because there was an error, we read out the previously painted area and check how much of it is still intact. Where the pattern is unchanged, we can be rather certain that the program didn't write to that part of the stack. This information helps either to detect a stack overflow or just to measure how much of the stack was used.
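The idea can be sketched on the host with a plain byte array standing in for the stack. This is only an illustration: `PATTERN`, `paint` and `untouched_bytes` are hypothetical names, and the real tool paints target memory with its own `CANARY_VALUE`, not a local array.

```rust
/// Hypothetical fill pattern; the real tool defines its own `CANARY_VALUE`.
const PATTERN: u8 = 0xAA;

/// Paint the whole region with the pattern.
fn paint(region: &mut [u8]) {
    region.fill(PATTERN);
}

/// Count how many bytes at the low end of the region still hold the pattern.
/// The stack grows downwards, so the untouched part starts at index 0.
fn untouched_bytes(region: &[u8]) -> usize {
    region.iter().take_while(|&&b| b == PATTERN).count()
}

fn main() {
    // A 64-byte array stands in for the stack; index 63 is the top.
    let mut stack = [0u8; 64];
    paint(&mut stack);

    // Simulate a program that used the top 24 bytes of the stack.
    for byte in &mut stack[40..] {
        *byte = 0x11;
    }

    let untouched = untouched_bytes(&stack);
    let used = stack.len() - untouched;
    println!("used {} of {} bytes", used, stack.len());
}
```

If the untouched count drops to zero, the program wrote all the way down to the lowest painted address, which hints at a possible overflow. A program that happens to write the pattern value itself goes unnoticed, which is why the result is only "rather certain".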

So far, both reading and writing the memory were done via the probe. While this works, it is rather slow, because the host and probe communicate via USB, which takes time.

The new approach writes a subroutine to the MCU, which then paints the memory from within.

Measurements

The following table shows how long the old and the new approach take for memory sizes from 8 to 256 KiB.

![data](https://user-images.githubusercontent.com/37087391/154973187-c17e66f7-cb22-4e56-8dff-a9798ab3a39a.png)

The results are pretty impressive. The new approach is about 170 times faster!

Further work

A similar approach can also be applied to reading out the stack after the program has finished.
Additionally, the stack canary can be simplified quite a lot. So far we have not painted the whole stack unless the user asked for it, because this was slow. Because it is fast now, we can always paint all of it, which simplifies the code and removes the need for the --measure-stack flag.

jonathanpallant · 2022-02-22T09:18:08Z

src/canary.rs

@@ -186,3 +185,88 @@ impl Canary {
        }
    }
 }
+
+/// Write [`CANARY_VALUE`] to the stack.
+fn paint_stack(core: &mut Core, start: u32, mut end: u32) -> Result<(), probe_rs::Error> {


I'd add a note here to observe that start should be the numerically lower address, and end should be the numerically higher address: that is start < end.

Maybe also panic if that is not upheld. Otherwise the subtraction at line 197 will underflow and panic.

Does start also have to be aligned?

> I'd add a note here to observe that start should be the numerically lower address, and end should be the numerically higher address: that is start < end.

Good point, I am asserting this in line 250 in fn subroutine, but it is probably better to already check it before.

> Does start also have to be aligned?

I think so, but I was not sure how to go about it if it is not. Probably subtracting the modulo, right?

> Good point, I am asserting this in line 250 in fn subroutine, but it is probably better to already check it before.

I'd apply the comment to whichever functions take start and end as arguments :)
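A minimal sketch of the two preconditions discussed here (start < end, and word alignment), assuming a half-open range of 32-bit words. `aligned_range` and `WORD` are hypothetical names, not the code in this PR.

```rust
/// Width of one painted unit in bytes (assumption: 32-bit words).
const WORD: u32 = 4;

/// Hypothetical helper: check `start < end` and shrink the half-open range
/// `start..end` to word alignment.
///
/// `start` is rounded up and `end` is rounded down, so the result never
/// covers bytes outside the range the caller asked for.
fn aligned_range(start: u32, end: u32) -> Option<(u32, u32)> {
    if start >= end {
        return None;
    }
    // Round `start` up to the next multiple of WORD; `None` on overflow.
    let start = start.checked_add(WORD - 1)? / WORD * WORD;
    // Round `end` down by subtracting the remainder.
    let end = end - end % WORD;
    if start < end {
        Some((start, end))
    } else {
        None
    }
}

fn main() {
    println!("{:x?}", aligned_range(0x2000_0001, 0x2000_0010));
    println!("{:x?}", aligned_range(0x2000_0010, 0x2000_0000));
}
```

Rounding start down instead ("subtracting the modulo") would also produce an aligned address, but it would extend the range below what the caller asked for. Which direction is right depends on what lives below the stack.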

My only concern is, since the initial stack pointer points to the address directly above the stack, it itself doesn't belong to the stack anymore. Can anything bad happen when we are writing to this address? (I don't think so because static variables and so on will only be created later, but I still want to double check).

If the stack is at the top of the RAM, the initial stack pointer value will be an invalid address.

From memory.x in cortex-m-rt:

_stack_start = ORIGIN(CCRAM) + LENGTH(CCRAM);

The problem, I think, is a range doing -1 when we should be doing -4, because we work in 4-byte units. Or perhaps we shouldn't hold start and end and instead hold start and length-in-32-bit-words.

OK so it turns out you can't see my comments unless I actually press "Comment" 🤣

> The problem, I think, is a range doing -1 when we should be doing -4, because we work in 4-byte units. Or perhaps we shouldn't hold start and end and instead hold start and length-in-32-bit-words.

So changing it to initial_stack_pointer - 4 should be fine?
I'd like to avoid bigger refactorings of other parts of the code as part of this PR.

See 67c1576.
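The -4 versus -1 point can be illustrated with a small sketch. It assumes word-aligned addresses and an inclusive range of 32-bit words; `last_stack_word`, `word_count` and the memory layout in `main` are hypothetical, chosen only for the example.

```rust
/// Width of one painted unit in bytes (assumption: 32-bit words).
const WORD: u32 = 4;

/// Address of the highest word that still belongs to the stack.
/// The initial stack pointer points one past the top of the stack.
fn last_stack_word(initial_sp: u32) -> Option<u32> {
    initial_sp.checked_sub(WORD)
}

/// Number of 32-bit words in the inclusive range `start..=last`.
/// Returns `None` if the range is reversed or either bound is unaligned.
fn word_count(start: u32, last: u32) -> Option<u32> {
    if start > last || start % WORD != 0 || last % WORD != 0 {
        return None;
    }
    Some((last - start) / WORD + 1)
}

fn main() {
    // Hypothetical layout: 64 KiB of RAM at 0x2000_0000, stack at the top.
    let ram_origin: u32 = 0x2000_0000;
    let ram_length: u32 = 0x1_0000;
    // Mirrors `_stack_start = ORIGIN(..) + LENGTH(..)` from memory.x.
    let initial_sp = ram_origin + ram_length;

    let last = last_stack_word(initial_sp).unwrap();
    println!("last stack word: {:#x}", last);
    println!("words to paint: {:?}", word_count(0x2000_F000, last));

    // Using `initial_sp - 1` instead gives an unaligned upper bound.
    println!("with -1: {:?}", word_count(0x2000_F000, initial_sp - 1));
}
```

Holding a start address plus a length in words, as suggested above, would avoid the inclusive-bound arithmetic entirely, at the cost of the larger refactoring this PR wants to avoid.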

src/canary.rs

jonathanpallant

I had some minor suggestions for improving comments (specifically around documenting pre-conditions like address alignment). But otherwise this is brilliant! Great work :) 👍

Urhengulas · 2022-02-22T11:26:15Z

The error messages can probably also be improved, but I'd like to do this in a follow-up PR, when reworking the canary (see "Additionally the stack canary can be simplified quite a lot. [...]").

Urhengulas · 2022-02-25T09:48:20Z

bors r+

Urhengulas · 2022-02-25T09:48:35Z

bors cancel

302: Make stack painting fast again! 🇪🇺 r=Urhengulas a=Urhengulas

This PR implements the first of the improvements outlined in #258.

Fixes #258.

## But what is "stack painting" anyways?

The idea is to write a specific byte pattern to (part of) the stack before the program is executed. After the program has finished, either because it is done with its task or because there was an error, we read out the previously painted area and check how much of it is still intact. Where the pattern is unchanged, we can be rather certain that the program didn't write to that part of the stack. This information helps either to detect a stack overflow or just to measure how much of the stack was used.

So far, both reading and writing the memory were done via the probe. While this works, it is rather slow, because the host and probe communicate via USB, which takes time.

The new approach writes a subroutine to the MCU, which then paints the memory from within.

## Measurements

The following table shows how long the old and the new approach take for memory sizes from 8 to 256 KiB.

![data](https://user-images.githubusercontent.com/37087391/154973187-c17e66f7-cb22-4e56-8dff-a9798ab3a39a.png)

The results are pretty impressive. The new approach is about 170 times faster!

## Further work

- A similar approach can also be applied to reading out the stack after the program has finished.
- Additionally, the stack canary can be simplified quite a lot. So far we have not painted the whole stack unless the user asked for it, because this _was_ slow. Because it is fast now, we can always paint all of it, which simplifies the code and removes the need for the `--measure-stack` flag.

Co-authored-by: Johann Hemmann <[email protected]>

bors · 2022-02-25T09:48:38Z

Canceled.

Urhengulas · 2022-02-25T09:49:03Z

bors r=jonathanpallant

bors · 2022-02-25T09:55:52Z

Build succeeded:

ci

327: Optimize stack usage measuring r=jonathanpallant a=Urhengulas

This PR optimizes the stack usage measuring by not using the probe but a subroutine to search through the memory.

Fixes #258.

## Measurements

The speedup is similarly impressive as in #302:

| canary size | main (bcaf997) | this PR (3ad3f86) |
| :---: | :---: | :---: |
| 1024 B | 0.007s | 0.014s |
| 261060 B | 1.912s | 0.028s |

It makes sense that the time actually gets worse for a small canary size of 1024 bytes, since we are doing a lot of setup work (flash subroutine, set registers, set and reset program counter etc.). But we see that this totally pays off, since for a rather big canary of 256KiB we are almost 70 times faster!

## Further work

This PR enables us to drastically simplify the canary logic, because since both painting and measuring are pretty fast now, we can always paint the full stack.

Co-authored-by: Urhengulas <[email protected]>
Co-authored-by: Johann Hemmann <[email protected]>
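The speedup quoted for #327 follows directly from the two rows of timings. A small sketch of the arithmetic, using only the numbers from the table (`speedup` is a hypothetical helper):

```rust
/// Ratio of old to new run time; values above 1.0 mean the new approach is faster.
fn speedup(old_secs: f64, new_secs: f64) -> f64 {
    old_secs / new_secs
}

fn main() {
    // (canary size in bytes, main (bcaf997), this PR (3ad3f86)), taken from the table.
    let rows = [(1024u32, 0.007, 0.014), (261_060, 1.912, 0.028)];
    for (size, old, new) in rows {
        println!("{} B: {:.1}x", size, speedup(old, new));
    }
}
```

The small canary comes out at a factor below 1, which reflects the fixed setup cost, while the large one lands just under 70.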

Urhengulas added 2 commits February 21, 2022 15:07

Make stack-painting faster!

e21a6c3

Avoid allocating vector for subroutine

73040b3

Urhengulas requested a review from japaric February 21, 2022 14:29

Urhengulas assigned japaric Feb 21, 2022

jonathanpallant reviewed Feb 22, 2022

jonathanpallant reviewed Feb 22, 2022

jonathanpallant approved these changes Feb 22, 2022

Urhengulas added 4 commits February 22, 2022 10:56

Assert start < end earlier

6c1aa1d

Extend comments inside fn subroutine

73f8111

Minor fixes

cea80d3

Solve alignment issues

4331ea1

Urhengulas requested a review from jonathanpallant February 22, 2022 15:22

Urhengulas assigned jonathanpallant and unassigned japaric Feb 22, 2022

Urhengulas removed the request for review from japaric February 22, 2022 15:23

Stack end 4 bytes below initial SP, not 1

67c1576

Urhengulas requested review from jonathanpallant and removed request for jonathanpallant February 23, 2022 13:37

jonathanpallant approved these changes Feb 23, 2022

bors bot merged commit 12228a0 into main Feb 25, 2022

bors bot deleted the stack-painting branch February 25, 2022 09:55

Urhengulas mentioned this pull request Jul 22, 2022

Optimize stack usage measuring #327

Merged

Urhengulas mentioned this pull request Feb 5, 2023

Failure to detect heap #360

Open

Urhengulas mentioned this pull request Jun 13, 2023

Simplify canary #410

Merged
