RFC: CFI Improvements with PAuth and BTI
Improve control flow integrity for compiled WebAssembly code
by utilizing two technologies from the Arm instruction set
architecture - Pointer Authentication and Branch Target
Identification.

Copyright (c) 2021, Arm Limited.
akirilov-arm committed Oct 21, 2021
1 parent 2821d03 commit 749d230
142 changes: 142 additions & 0 deletions accepted/cfi-improvements-with-pauth-and-bti.md
# Summary
[summary]: #summary

This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two
technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target
Identification.

# Motivation
[motivation]: #motivation

The [security model of WebAssembly](https://webassembly.org/docs/security/) ensures that Wasm
modules execute in a sandboxed environment isolated from the host runtime. One aspect of that model
is that it provides implicit control flow integrity (CFI) by forcing all function call targets to
specify a valid entry in the function index space, by using a protected call stack that is not
affected by buffer overflows in the module heap, and so on. As a result, in some Wasm applications
the runtime is able to execute untrusted code safely. However, that places much of the burden of
upholding those security properties on the compiler.

On the other hand, a further aspect of the WebAssembly design is efficient execution (close to
native speed), which leads to a natural tendency towards sophisticated optimizing compilers.
Unfortunately, the additional complexity increases the risk of implementation problems and in
particular compromises of the security properties. For example, Cranelift has been affected by
issues such as [CVE-2021-32629][cve] that could make it possible to access the protected call stack
or memory that is private to the host runtime.

We are trying to tackle the challenge of ensuring compiler correctness with initiatives such as
expanding fuzzing and making it possible to apply formal verification to at least some parts of the
compilation process. However, it is also reasonable to consider a defense in depth strategy and to
evaluate mitigations for potential future issues.

Finally, Wasmtime can be used as a library and in particular embedded into an application that is
implemented in languages such as C and C++ that lack some of the hardening provided by Rust. In that
case the compiled WebAssembly code could provide convenient instruction sequences for attacks that
subvert normal control flow and that originate from the embedder's code, even if Cranelift and
Wasmtime themselves lack any defects.

[cve]: https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-hpqh-2wqx-7qp5

# Proposal
[proposal]: #proposal

Currently this proposal focuses on the AArch64 execution environment.

## Background

The Pointer Authentication (PAuth) extension to the Arm architecture protects function returns, i.e.
provides back-edge CFI. It is described in section D5.1.5 of
[the Arm Architecture Reference Manual][arm-arm]. Some of the PAuth operations act as `NOP`
instructions when executed by a processor that does not support the extension.

The Branch Target Identification (BTI) extension protects other kinds of indirect branches, i.e. it
provides forward-edge CFI. It is described in section D5.4.4 of the same manual. A processor
implementation with BTI would support PAuth as well, but not necessarily vice versa. Whether BTI
applies to an executable memory page or not is controlled by a dedicated page attribute. Note that
the `BTI` "landing pad" for indirect branches acts as a `NOP` instruction when the extension is not
active (e.g. for processors that do not support BTI).
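
The `NOP` behaviour described above follows from where these operations sit in the A64 encoding
space: `PACIASP`, `AUTIASP`, and `BTI` are allocated in the `HINT` space, whereas `RETAA` is not.
The following is a minimal sketch of that relationship; the function and constant names are
illustrative only and are not part of Cranelift.

```rust
/// Encoding of the A64 `NOP` instruction, i.e. `HINT #0`.
const NOP: u32 = 0xD503_201F;
/// `RETAA` is not a hint, so it is undefined on processors without PAuth.
const RETAA: u32 = 0xD65F_0BFF;

/// Encodes `HINT #imm`; the 7-bit immediate occupies bits [11:5].
fn hint(imm: u32) -> u32 {
    assert!(imm < 128, "HINT takes a 7-bit immediate");
    NOP | (imm << 5)
}

/// `PACIASP` is `HINT #25`.
fn paciasp() -> u32 {
    hint(25)
}

/// `AUTIASP` is `HINT #29`.
fn autiasp() -> u32 {
    hint(29)
}

/// `BTI c` is `HINT #34`.
fn bti_c() -> u32 {
    hint(34)
}

/// `BTI j` is `HINT #36`.
fn bti_j() -> u32 {
    hint(36)
}

/// Returns true if `insn` lies in the `HINT` space, i.e. it matches the `NOP`
/// encoding in every bit outside the immediate field.
fn is_hint(insn: u32) -> bool {
    (insn & !(0x7F << 5)) == NOP
}
```

Under this view `is_hint(paciasp())` and `is_hint(bti_j())` hold, while `is_hint(RETAA)` does not,
which is why `RETAA` can only be emitted when the target is known to support PAuth.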

Both extensions are applicable only to the AArch64 execution state and are optional, so each CFI
technique would be employed only if the target environment provides the necessary ISA support.
Wasmtime embedders need to consider a subtlety: if they cache the result of that check, the cached
value may be located in memory that is accessible to an attacker, who could then overwrite it to
disable the use of PAuth and BTI in subsequent code generation. Mitigating this issue is outside
the scope of this proposal.
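
As a rough illustration of the per-technique gating, the sketch below queries the host with the
standard library's `is_aarch64_feature_detected!` macro and combines the answer with what an
embedder asked for. `CfiSupport`, `detect_host`, and `cfi_plan` are hypothetical names, not Wasmtime
API, and the sketch does not address the caching subtlety (the standard library may itself cache the
detection result).

```rust
/// CFI techniques that are available or requested (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CfiSupport {
    pauth: bool,
    bti: bool,
}

/// Asks the host which extensions it implements.
#[cfg(target_arch = "aarch64")]
fn detect_host() -> CfiSupport {
    CfiSupport {
        // "paca" covers address authentication, which `PACIASP`/`RETAA` rely on.
        pauth: std::arch::is_aarch64_feature_detected!("paca"),
        bti: std::arch::is_aarch64_feature_detected!("bti"),
    }
}

/// On other architectures neither extension exists.
#[cfg(not(target_arch = "aarch64"))]
fn detect_host() -> CfiSupport {
    CfiSupport { pauth: false, bti: false }
}

/// Enables each technique only if it is both requested and supported.
fn cfi_plan(host: CfiSupport, requested: CfiSupport) -> CfiSupport {
    CfiSupport {
        pauth: host.pauth && requested.pauth,
        bti: host.bti && requested.bti,
    }
}
```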

The article [*Code reuse attacks: The compiler story*][code-reuse-attacks] provides an introduction
to the technologies.

[arm-arm]: https://developer.arm.com/documentation/ddi0487/gb/?lang=en
[code-reuse-attacks]: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story

## Improved back-edge CFI with PAuth

The proposed implementation will add the `PACIASP` instruction to the beginning of every function
compiled by Cranelift and will replace the final return with the `RETAA` instruction.

In environments that use the DWARF format for unwinding the implementation would be modified to
apply the `DW_CFA_AARCH64_negate_ra_state` operation immediately after the `PACIASP` instruction.

These steps can be skipped for simple leaf functions that do not construct frame records on the
stack.
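
The selection described in this section could be sketched as follows. This is only an illustration
under simplifying assumptions: `FunctionInfo` and `return_address_protection` are hypothetical and
do not correspond to existing Cranelift types, and instructions are represented by their mnemonics.

```rust
/// Facts about one compiled function (hypothetical; not a Cranelift type).
#[derive(Clone, Copy)]
struct FunctionInfo {
    /// True if the function constructs a frame record (FP and LR) on the stack.
    has_frame_record: bool,
}

/// Returns the optional instruction that opens the prologue and the
/// instruction that performs the final return.
fn return_address_protection(
    func: FunctionInfo,
    pauth_enabled: bool,
) -> (Option<&'static str>, &'static str) {
    if pauth_enabled && func.has_frame_record {
        // Sign LR with key A, using SP as the modifier, before it is stored;
        // `retaa` authenticates LR and returns in a single instruction.
        (Some("paciasp"), "retaa")
    } else {
        // A simple leaf function never stores LR on the stack, so the
        // signing and authentication steps are skipped.
        (None, "ret")
    }
}
```

Because `RETAA` takes the place of the ordinary return, a protected function grows by a single
instruction.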

## Enhanced forward-edge CFI with BTI

The proposed implementation will add the `BTI j` instruction to the beginning of every basic block
that is the target of an indirect branch and that is not a function prologue. Note that in the
AArch64 backend generated function calls always target function prologues, and indirect branches
that do not act like function calls appear only in the implementation of the `br_table` IR
operation. Function prologues would be covered by the pointer authentication instructions, which
also act as landing pads; as discussed before, BTI support implies PAuth.

During development one simple way to create a working prototype is to add the landing pads to the
beginning of every basic block, irrespective of whether it is the target of an indirect branch or
not. This makes it possible to check whether BTI causes any issues with the rest of the runtime.
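
The non-prototype placement rule can be sketched as a small set computation. The representation
below (blocks as indices, one target list per `br_table`) and the name `blocks_needing_bti_j` are
assumptions made for illustration, not the data structures used by the AArch64 backend.

```rust
use std::collections::BTreeSet;

/// Index of a basic block within a function (hypothetical representation).
type Block = usize;

/// Returns the blocks that need a `bti j` landing pad.
///
/// `jump_tables` lists, for each `br_table`, the blocks that its indirect
/// branch can reach.
fn blocks_needing_bti_j(entry: Block, jump_tables: &[Vec<Block>]) -> BTreeSet<Block> {
    // Every block reachable through an indirect branch is a candidate.
    let mut pads: BTreeSet<Block> = jump_tables.iter().flatten().copied().collect();
    // The function prologue is covered by the pointer authentication
    // instruction, which also acts as a landing pad.
    pads.remove(&entry);
    pads
}
```

For example, with entry block `0` and two jump tables targeting `[2, 3, 2]` and `[0, 5]`, the
landing pads would go to blocks 2, 3, and 5.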

## CFI improvements to assembly, C, C++, and Rust code

Improving CFI for compiled C, C++, and Rust code with the same technologies is outside the scope of
this proposal, but in general it should be achievable by passing the appropriate parameters to the
respective compiler.

Functions implemented in assembly will be treated similarly to generated code, i.e. they will
start with the `PACIASP` instruction. However, the regular return will be preserved and will be
preceded by the `AUTIASP` instruction instead of being replaced by `RETAA`. The reason is that both
`AUTIASP` and `PACIASP` act as `NOP` instructions when executed by a processor that does not
support PAuth, thus making the assembly code generic.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Since the existing implementation already uses the standard back-edge CFI techniques that are
preferred in the absence of special hardware support (i.e. a separate protected stack that is not
used for buffers that could be accessed out of bounds), the only alternative is not to implement
the proposal. The rationale is therefore based mainly on the overhead being insignificant: in terms
of code size, the back-edge CFI improvements cost one additional instruction per function, or 2 for
functions implemented in assembly.

The [Clang CFI design][clang-cfi-design] provides an idea for an alternative implementation of the
forward-edge CFI mechanism that is enabled by BTI. It involves instrumenting every indirect branch
to check if its destination is permitted. While the overhead of this approach can be reduced by
using efficient data structures for the destination address lookup and optionally limiting the
checks only to indirect function calls, it is still significantly larger than the worst-case BTI
overhead of one instruction per basic block per function. On the other hand, it does not require any
special hardware support, so it could be applied to all supported platforms.
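
To give a feel for the software-only alternative, the sketch below shows the kind of check that
instrumentation would perform before each indirect branch. It is not the scheme Clang uses; a sorted
vector with binary search merely stands in for the "efficient data structures" mentioned above, and
`AllowedTargets` is a hypothetical type.

```rust
/// Addresses that indirect branches are permitted to target (hypothetical).
struct AllowedTargets {
    /// Kept sorted and free of duplicates so that lookups can use binary search.
    sorted: Vec<u64>,
}

impl AllowedTargets {
    /// Builds the lookup structure from an arbitrary list of addresses.
    fn new(mut targets: Vec<u64>) -> Self {
        targets.sort_unstable();
        targets.dedup();
        Self { sorted: targets }
    }

    /// The check that would run before every instrumented indirect branch.
    fn permits(&self, destination: u64) -> bool {
        self.sorted.binary_search(&destination).is_ok()
    }
}
```

Even in this simplified form each indirect branch pays for a lookup at run time, whereas a BTI
landing pad is a single instruction at the destination.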

[clang-cfi-design]: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html

# Open questions
[open-questions]: #open-questions

- What is the performance overhead of the proposal?
- What technologies are available in other instruction set architectures to achieve the same goals?
- What hardening approaches are applicable to the fiber implementation? The fiber switching code
saves the values of all callee-saved registers on the stack, i.e. memory that is potentially
accessible to an attacker. Some of those values could be code addresses that would be used by
indirect branches, so should we devise a scheme to authenticate them? While the regular pointer
authentication instructions assume that they are operating on valid virtual addresses (which implies
that the most significant bits are redundant and could be repurposed), PAuth provides operations to
authenticate arbitrary data, which could be used in this case.
- Should we generate the operations that act as `NOP` instructions unconditionally instead (while
still choosing the shorter alternative sequences if the target supports them)? That would
especially help the ahead of time compilation use case, and could arguably reduce the amount of
testing, i.e. no need to check both with and without CFI enhancements.
