Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce pte_flags crate with support for aarch64 and x86_64 #696

Closed
wants to merge 4 commits into from

Conversation

NathanRoyer
Copy link
Member

This introduces a replacement for entryflags_x86_64 which supports aarch64 and x86_64; support for other architectures can be added easily.

Currently the only documentation is on following methods of MappedPageAttributes:

  • valid
  • writeable
  • executable
  • cacheable
  • global
  • exclusive

That documentation describes the semantics of the corresponding flags. Feel free to suggest other things to document.

@kevinaboos kevinaboos changed the title The pte_flags crate replaces entryflags_x86_64 and supports both aarch64 & x86_64 Introduce pte_flags crate with support for aarch64 and x86_64 Nov 17, 2022
Copy link
Member

@kevinaboos kevinaboos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR!

There are a few design issues with this:

  • The main point of this redesign is for you (and me jointly) to determine which flags are shared across the two architectures, and also how to express x86-specific terminology/concepts like "global", "cacheable", and "writeback/write-through" flags in aarch64-specific terms (which may also depend on how we intend to use them in Theseus). For example, x86 has a single concept of "global" or not, while aarch64 has a different concept of "shareable". The purpose of this is to determine how they are similar or different and to have a single
  • The existing conversion routine from "software" flags to hardware flags is quite slow and overly complex, making it hard to understand what the similarities and differences are across the architectures.
  • There aren't dedicated structs that define the arch-specific PTE flags, making it hard to understand the actual physical layout of PTE flags on each architecture. The main idea was to have 3 structs: one "software" (arch-generic), one specific to aarch64, and one specific to x86_64. If aarch64 and x86_64 were similar enough, the top-level arch-generic "software" PTE flags could be theoretically removed entirely, though it'd be nice to have that top-level flag struct to make future architectures easier to integrate. Then, the idea would be to implement conversion routines to and from those structs.
    • This is also required in case we need to set a arch-specific flag bit that doesn't have a corresponding representation in the "software" flags or on another arch.
  • Converting from hardware flags to "software" flags isn't supported, but that's a fundamental requirement since we sometimes need to read and interpret the actual flag values from the hardware PTEs. This is also the reason for the above point about needing actual concrete struct definitions for the arch-specific PteFlags
  • Please take my descriptive comments from the existing EntryFlags and use that as a model for these. The aarch64 PteFlags should include equally detailed comments
  • (minor) these flags are used in MappedPages, yes, but they don't strictly have anything to do with that, so the name shouldn't reflect that. PteFlags or PagingFlags is better.

What i had in mind after our chat was a significantly different design (described above); I can either mock it up now or wait for you to take a second crack at it first, but be sure to deeply consider the above points and also how others would use your new types.

@NathanRoyer
Copy link
Member Author

NathanRoyer commented Nov 18, 2022

The main point of this redesign is for you (and me jointly) to determine which flags are shared across the two architectures, and also how to express x86-specific terminology/concepts like "global", "cacheable", and "writeback/write-through" flags in aarch64-specific terms (which may also depend on how we intend to use them in Theseus). For example, x86 has a single concept of "global" or not, while aarch64 has a different concept of "shareable". The purpose of this is to determine how they are similar or different and to have a single

I can share with you my understanding of these in aarch64:

  • If I understood correctly, the "inner" domain refers to L1 & L2 caches while the "outer" domain refers to the L3 cache (although apparently SOC manufacturers sometimes put L2 in outer), and this "shareability" attribute helps the CPU decide which cache domains to update on writes: for an inner shareable mapping, only the inner domain caches will be updated immediately on writes, while for an outer shareable mapping, both cache domains will be updated immediately; it's explained in a very abstract way in B2.7 of DDI0487l.A.
  • To express cacheability details like write-back/write-through, we use slots in the MAIR register; we must configure the slot for a particular shareability domain. The transient mapping hint indicates that the mapping will not necessarily be used much and it shouldn't replace long-lived cache entries. There are also self-explanatory cache entry allocation hints based on the data access type (r/w).
  • You can see how I initially tried to understand the differences between the flags of both architectures here on discord.

I don't think these shareability/cacheability attributes map directly to the GLOBAL bit on x86_64.
I only distinguished two types of memory for simplicity, and because I think Theseus doesn't differentiates more than that:

  • Normal memory, which was inner shareable, cacheable and uses MAIR slot 0
  • Device memory, which is non-shareable, non-cacheable and uses MAIR slot 1

After having approximatively understood that shareability mechanism, I wonder why aarch64-paging was setting normal memory to be inner shareable. In hindsight, I now set it as outer shareable.

The existing conversion routine from "software" flags to hardware flags is quite slow and overly complex, making it hard to understand what the similarities and differences are across the architectures.

I must say I don't know for sure what's slow and what's fast when dealing with bit flags in Rust; I had assumed the ternary match I used would be the fastest, sorry if I'm wrong. How can I improve this?

There aren't dedicated structs that define the arch-specific PTE flags, making it hard to understand the actual physical layout of PTE flags on each architecture. The main idea was to have 3 structs: one "software" (arch-generic), one specific to aarch64, and one specific to x86_64.

No problem, will do.

Converting from hardware flags to "software" flags isn't supported, but that's a fundamental requirement since we sometimes need to read and interpret the actual flag values from the hardware PTEs. This is also the reason for the above point about needing actual concrete struct definitions for the arch-specific PteFlags

Yes

Please take my descriptive comments from the existing EntryFlags and use that as a model for these. The aarch64 PteFlags should include equally detailed comments

OK

What i had in mind after our chat was a significantly different design (described above); I can either mock it up now or wait for you to take a second crack at it first, but be sure to deeply consider the above points and also how others would use your new types.

I will try to solve the problems you mentioned (but I don't know how to improve the speed of the conversion) and improve the crate.

@kevinaboos
Copy link
Member

Okay great, this is a huge improvement over the first iteration! There's still lots of room for improvement, but now we have a good base from which to start.

I don't know how to improve the speed of the conversion

Well, for most things, it's best to actually profile (benchmark) the routines you're writing to see if they're actually slow. Alternatively, take a look at the generated assembly to see how well the compiler was able to optimize your code.
However, my primary concern is that your conversion routines are just too complex and hard to understand, e.g., the cond() fn is tough to follow (there's already a bitflags-provided function for that), you construct many different PteFlags objects (one per flag) and then bitwise-OR them together, etc.

The main way to make things fast is ... do less work. I know it sounds simple, but here's a concrete example. In your design above, you have multiple conditional branches for each and every bit in PteFlags ... that's a lot of conditionals, which are slow. Instead, one could design the struct to maximize the similarities between the two architectures, i.e., to fully take advantage of the bits that have the same bit position and meaning in both x86_64 and aarch64.

For the sake of learning, here is my thought process when undertaking this design:

  • I observe that bits [9:11] and [52:62] are free for custom OS usage on x86_64, but on aarch 64 only bits [55:58] are available. Thus, the overlap between them is bits [55:58], so those 4 bits should be our first choice for custom OS flags. Currently, we only need one: EXCLUSIVE, so we move it to bit 55. That way, no work needs to be done during the conversion.
  • I observe that bit 0 is VALID/PRESENT, so no conversion work needs to be done there, we just need to agree on a name (I like VALID).
  • It appears that the PAGE_L3_DESCRIPTOR and BLOCK_DESCRIPTOR_Z on aarch64 may be equivalent to HUGE_PAGE on x86_64 (if i understand them correctly -- let's discuss), but are just the inverse of each other. So for that conversion, we can pick either bit position... but see below for a better idea.
  • In general, we could actually change the entire definition of the "software"-level PteFlags based on the target_arch, which will allow us to do little to no work in the conversion methods.
    • For example, the high-level bit position of DIRTY and ACCESSED (and others) could change since they have the same semantics but are in different bit positions. On x86, those would be bits 6 and 5 respectively, but on aarch64 they would be bits 51 and bit 10.
    • Similarly, the NO_EXEC const definition would be 1 << 63 on x86, and 1 << 53 | 1 << 54 on aarch64.
    • For the GLOBAL and WRITABLE flags, they are both in different bit positions and have inverse meanings on x86_64 and aarch64. So all we need to do is invert and bitshift them.

Another (potential) issue is with how you defined the bitflags constants; that crate doesn't really expect 0-value bit consts to be defined and it's various accessor functions may not work properly with those present... see the bitflags docs for more details and gotchas about how that works.
Likewise, I don't think its a good idea for reserved bits to have a definition, since a user of that code could misinterpet it and "disable" that bit (setting it to 1), which would be invalid.

I am working on a redesign which I will push later this weekend. Then, we can discuss it to ensure our understanding of the various PTE flags on aarch64 is correct.

@kevinaboos
Copy link
Member

See #699 for my attempt at this design. Hopefully that will serve as a model for future strategies on implementing things for aarch64 and x86_64 in a simple, compatible, and efficient manner.

Feel free to leave comments on that PR; we can also discuss both PRs comparatively on discord or in person.

@NathanRoyer
Copy link
Member Author

Can I close this, since #699 replaces it?

@kevinaboos
Copy link
Member

Closing in favor of #699 now that we have discussed things in depth.

@kevinaboos kevinaboos closed this Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants