Introduce `pte_flags` crate with support for `aarch64` and `x86_64` #696

NathanRoyer · 2022-11-17T14:39:33Z

This introduces a replacement for entryflags_x86_64 which supports aarch64 and x86_64; support for other architectures can be added easily.

Currently the only documentation is on following methods of MappedPageAttributes:

valid
writeable
executable
cacheable
global
exclusive

That documentation describes the semantics of the corresponding flags. Feel free to suggest other things to document.

…4` & `x86_64`

kevinaboos

Thanks for this PR!

There are a few design issues with this:

The main point of this redesign is for you (and me jointly) to determine which flags are shared across the two architectures, and also how to express x86-specific terminology/concepts like "global", "cacheable", and "writeback/write-through" flags in aarch64-specific terms (which may also depend on how we intend to use them in Theseus). For example, x86 has a single concept of "global" or not, while aarch64 has a different concept of "shareable". The purpose of this is to determine how they are similar or different and to have a single
The existing conversion routine from "software" flags to hardware flags is quite slow and overly complex, making it hard to understand what the similarities and differences are across the architectures.
There aren't dedicated structs that define the arch-specific PTE flags, making it hard to understand the actual physical layout of PTE flags on each architecture. The main idea was to have 3 structs: one "software" (arch-generic), one specific to aarch64, and one specific to x86_64. If aarch64 and x86_64 were similar enough, the top-level arch-generic "software" PTE flags could be theoretically removed entirely, though it'd be nice to have that top-level flag struct to make future architectures easier to integrate. Then, the idea would be to implement conversion routines to and from those structs.
- This is also required in case we need to set a arch-specific flag bit that doesn't have a corresponding representation in the "software" flags or on another arch.
Converting from hardware flags to "software" flags isn't supported, but that's a fundamental requirement since we sometimes need to read and interpret the actual flag values from the hardware PTEs. This is also the reason for the above point about needing actual concrete struct definitions for the arch-specific PteFlags
Please take my descriptive comments from the existing EntryFlags and use that as a model for these. The aarch64 PteFlags should include equally detailed comments
(minor) these flags are used in MappedPages, yes, but they don't strictly have anything to do with that, so the name shouldn't reflect that. PteFlags or PagingFlags is better.

What i had in mind after our chat was a significantly different design (described above); I can either mock it up now or wait for you to take a second crack at it first, but be sure to deeply consider the above points and also how others would use your new types.

NathanRoyer · 2022-11-18T13:09:21Z

The main point of this redesign is for you (and me jointly) to determine which flags are shared across the two architectures, and also how to express x86-specific terminology/concepts like "global", "cacheable", and "writeback/write-through" flags in aarch64-specific terms (which may also depend on how we intend to use them in Theseus). For example, x86 has a single concept of "global" or not, while aarch64 has a different concept of "shareable". The purpose of this is to determine how they are similar or different and to have a single

I can share with you my understanding of these in aarch64:

If I understood correctly, the "inner" domain refers to L1 & L2 caches while the "outer" domain refers to the L3 cache (although apparently SOC manufacturers sometimes put L2 in outer), and this "shareability" attribute helps the CPU decide which cache domains to update on writes: for an inner shareable mapping, only the inner domain caches will be updated immediately on writes, while for an outer shareable mapping, both cache domains will be updated immediately; it's explained in a very abstract way in B2.7 of DDI0487l.A.
To express cacheability details like write-back/write-through, we use slots in the MAIR register; we must configure the slot for a particular shareability domain. The transient mapping hint indicates that the mapping will not necessarily be used much and it shouldn't replace long-lived cache entries. There are also self-explanatory cache entry allocation hints based on the data access type (r/w).
You can see how I initially tried to understand the differences between the flags of both architectures here on discord.

I don't think these shareability/cacheability attributes map directly to the GLOBAL bit on x86_64.
I only distinguished two types of memory for simplicity, and because I think Theseus doesn't differentiates more than that:

Normal memory, which was inner shareable, cacheable and uses MAIR slot 0
Device memory, which is non-shareable, non-cacheable and uses MAIR slot 1

After having approximatively understood that shareability mechanism, I wonder why aarch64-paging was setting normal memory to be inner shareable. In hindsight, I now set it as outer shareable.

The existing conversion routine from "software" flags to hardware flags is quite slow and overly complex, making it hard to understand what the similarities and differences are across the architectures.

I must say I don't know for sure what's slow and what's fast when dealing with bit flags in Rust; I had assumed the ternary match I used would be the fastest, sorry if I'm wrong. How can I improve this?

There aren't dedicated structs that define the arch-specific PTE flags, making it hard to understand the actual physical layout of PTE flags on each architecture. The main idea was to have 3 structs: one "software" (arch-generic), one specific to aarch64, and one specific to x86_64.

No problem, will do.

Converting from hardware flags to "software" flags isn't supported, but that's a fundamental requirement since we sometimes need to read and interpret the actual flag values from the hardware PTEs. This is also the reason for the above point about needing actual concrete struct definitions for the arch-specific PteFlags

Yes

Please take my descriptive comments from the existing EntryFlags and use that as a model for these. The aarch64 PteFlags should include equally detailed comments

OK

What i had in mind after our chat was a significantly different design (described above); I can either mock it up now or wait for you to take a second crack at it first, but be sure to deeply consider the above points and also how others would use your new types.

I will try to solve the problems you mentioned (but I don't know how to improve the speed of the conversion) and improve the crate.

…ation, conversion routines between arch-specific flags and software ones.

kevinaboos · 2022-11-18T20:58:32Z

Okay great, this is a huge improvement over the first iteration! There's still lots of room for improvement, but now we have a good base from which to start.

I don't know how to improve the speed of the conversion

Well, for most things, it's best to actually profile (benchmark) the routines you're writing to see if they're actually slow. Alternatively, take a look at the generated assembly to see how well the compiler was able to optimize your code.
However, my primary concern is that your conversion routines are just too complex and hard to understand, e.g., the cond() fn is tough to follow (there's already a bitflags-provided function for that), you construct many different PteFlags objects (one per flag) and then bitwise-OR them together, etc.

The main way to make things fast is ... do less work. I know it sounds simple, but here's a concrete example. In your design above, you have multiple conditional branches for each and every bit in PteFlags ... that's a lot of conditionals, which are slow. Instead, one could design the struct to maximize the similarities between the two architectures, i.e., to fully take advantage of the bits that have the same bit position and meaning in both x86_64 and aarch64.

For the sake of learning, here is my thought process when undertaking this design:

I observe that bits [9:11] and [52:62] are free for custom OS usage on x86_64, but on aarch 64 only bits [55:58] are available. Thus, the overlap between them is bits [55:58], so those 4 bits should be our first choice for custom OS flags. Currently, we only need one: EXCLUSIVE, so we move it to bit 55. That way, no work needs to be done during the conversion.
I observe that bit 0 is VALID/PRESENT, so no conversion work needs to be done there, we just need to agree on a name (I like VALID).
It appears that the PAGE_L3_DESCRIPTOR and BLOCK_DESCRIPTOR_Z on aarch64 may be equivalent to HUGE_PAGE on x86_64 (if i understand them correctly -- let's discuss), but are just the inverse of each other. So for that conversion, we can pick either bit position... but see below for a better idea.
In general, we could actually change the entire definition of the "software"-level PteFlags based on the target_arch, which will allow us to do little to no work in the conversion methods.
- For example, the high-level bit position of DIRTY and ACCESSED (and others) could change since they have the same semantics but are in different bit positions. On x86, those would be bits 6 and 5 respectively, but on aarch64 they would be bits 51 and bit 10.
- Similarly, the NO_EXEC const definition would be 1 << 63 on x86, and 1 << 53 | 1 << 54 on aarch64.
- For the GLOBAL and WRITABLE flags, they are both in different bit positions and have inverse meanings on x86_64 and aarch64. So all we need to do is invert and bitshift them.

Another (potential) issue is with how you defined the bitflags constants; that crate doesn't really expect 0-value bit consts to be defined and it's various accessor functions may not work properly with those present... see the bitflags docs for more details and gotchas about how that works.
Likewise, I don't think its a good idea for reserved bits to have a definition, since a user of that code could misinterpet it and "disable" that bit (setting it to 1), which would be invalid.

I am working on a redesign which I will push later this weekend. Then, we can discuss it to ensure our understanding of the various PTE flags on aarch64 is correct.

kevinaboos · 2022-11-19T03:48:11Z

See #699 for my attempt at this design. Hopefully that will serve as a model for future strategies on implementing things for aarch64 and x86_64 in a simple, compatible, and efficient manner.

Feel free to leave comments on that PR; we can also discuss both PRs comparatively on discord or in person.

NathanRoyer · 2022-11-23T13:41:48Z

Can I close this, since #699 replaces it?

kevinaboos · 2022-11-23T22:13:39Z

Closing in favor of #699 now that we have discussed things in depth.

The pte_flags crate replaces entryflags and supports both `aarch6…

309471f

…4` & `x86_64`

kevinaboos changed the title ~~The pte_flags crate replaces entryflags_x86_64 and supports both aarch64 & x86_64~~ Introduce pte_flags crate with support for aarch64 and x86_64 Nov 17, 2022

kevinaboos requested changes Nov 17, 2022

View reviewed changes

NathanRoyer added 3 commits November 18, 2022 15:17

Added architecture-specific flag abstractions, detailed flag document…

02435c8

…ation, conversion routines between arch-specific flags and software ones.

Small documentation additions

9fbfe96

Small documentation additions

0e049c4

kevinaboos mentioned this pull request Nov 19, 2022

Architecture-dependent Page Table Entry (PTE) flags for x86_64 and aarch64 #699

Merged

kevinaboos closed this Nov 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce `pte_flags` crate with support for `aarch64` and `x86_64` #696

Introduce `pte_flags` crate with support for `aarch64` and `x86_64` #696

NathanRoyer commented Nov 17, 2022

kevinaboos left a comment

NathanRoyer commented Nov 18, 2022 •

edited

Loading

kevinaboos commented Nov 18, 2022

kevinaboos commented Nov 19, 2022

NathanRoyer commented Nov 23, 2022

kevinaboos commented Nov 23, 2022

Introduce pte_flags crate with support for aarch64 and x86_64 #696

Introduce pte_flags crate with support for aarch64 and x86_64 #696

Conversation

NathanRoyer commented Nov 17, 2022

kevinaboos left a comment

Choose a reason for hiding this comment

NathanRoyer commented Nov 18, 2022 • edited Loading

kevinaboos commented Nov 18, 2022

kevinaboos commented Nov 19, 2022

NathanRoyer commented Nov 23, 2022

kevinaboos commented Nov 23, 2022

Introduce `pte_flags` crate with support for `aarch64` and `x86_64` #696

Introduce `pte_flags` crate with support for `aarch64` and `x86_64` #696

NathanRoyer commented Nov 18, 2022 •

edited

Loading