Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misaligned load/store on page crossing doesn't tablewalk for second page #49

Open
scottj97 opened this issue May 8, 2020 · 31 comments
Open
Labels
bug Something isn't working

Comments

@scottj97
Copy link
Contributor

scottj97 commented May 8, 2020

An 8-byte store to address 0xffc, for example, should store 4 bytes to one page and 4 to the next page. When paging is enabled, two separate translations are required.

The Sail model is only translating the first page, then storing 8 bytes to contiguous physical addresses.

  • No tablewalk occurs for the second page
  • No exceptions are taken if the second page has unacceptable permissions
  • The physical address for the second half of the store is incorrect

I created a repo with a test to recreate the issue. Instruction 128 shows the problem. There is one tablewalk, then 8 bytes loaded from 0x80003FFC.

This should take an exception (with mtval 0x0000003a177df000) because the second page's access bit is 0. The handler will repair the access bit, then re-execute the instruction. The second attempt should succeed, loading the second half of the access from a different physical page.

(Fetches across pages seem to work correctly.)

@pmundkur
Copy link
Contributor

this is a great catch and test case. i'm pondering the cleanest way to fix this.

@rmn30
Copy link
Collaborator

rmn30 commented Jun 28, 2021

Unfortunately looks like we still need a fix for this. How about adding a width parameter to translateAddr and adding a TR_Straddling case to the TR_Result union? Will need to handle this in quite a lot of places.

@PeterSewell
Copy link
Collaborator

PeterSewell commented Jun 28, 2021 via email

@rmn30
Copy link
Collaborator

rmn30 commented Jun 28, 2021

Yes, I'm proposing that translateAddr would do two translations if necessary and then the caller would have to deal with an additional case for page-straddling by performing multiple accesses and merging the result. Possibly the single-page case would go away and become a degenerate case.

@PeterSewell
Copy link
Collaborator

PeterSewell commented Jun 28, 2021 via email

@rmn30
Copy link
Collaborator

rmn30 commented Jun 28, 2021

I haven't checked the spec but I imagine you'd want to do both of the translations first to check for faults before doing the accesses otherwise you could end up doing half a write then realising you have to take a fault. The sail model does the translation first then the memory access uses the resulting physical address (unless there is a fault) e.g. https://github.com/rems-project/sail-riscv/blob/master/model/riscv_insts_base.sail#L326 The higher up place you posit doesn't currently exist although it might be useful to introduce a translate-and-handle-misalignment layer.

@PeterSewell
Copy link
Collaborator

PeterSewell commented Jun 28, 2021 via email

@rmn30
Copy link
Collaborator

rmn30 commented Jun 28, 2021

I can't see anywhere in the RISC-V specs. that directly addresses this case although it is permitted to emulate unaligned accesses in software implying that partial writes are possible at least in that case.

@scottj97
Copy link
Contributor Author

Partial writes are allowed by the spec. Spike does partial writes in this case.

However, some implementations will not, and I'd like Sail to be able to model that implementation choice without requiring extensive rework of this logic.

@allenjbaum
Copy link
Collaborator

This is a bug that will interfere with a fair number of tests - has there been any progress?
Does that mean that we need to configure not only whether Sail supports unaligned accesses, but that if it does, it checks both halves before writing anything (or reading)?
E.g. translate, write, translate, write vs Translate, Translate, Write, Write to ensure we can configure it like a real implementation?

@scottj97
Copy link
Contributor Author

scottj97 commented Oct 1, 2021

Does that mean that we need to configure not only whether Sail supports unaligned accesses, but that if it does, it checks both halves before writing anything (or reading)? E.g. translate, write, translate, write vs Translate, Translate, Write, Write to ensure we can configure it like a real implementation?

I'm not opposed to making source code edits to change this behavior if required, but I hope it can be a small change and not a complete rewrite of the logic.

There are many implementation-dependent things that require source code edits today.

@allenjbaum
Copy link
Collaborator

I'm not sure what you mean by source code edits - as opposed to what?

My concern here is that we are comparing the behavior of a Sail model with that of some vendor's legal RISC-V implementation. If more than one result is legally possible, I want to be have the vendor indicate which behavior they implement (and I'm going to make the assumption in this case it is one order or the other), and be able to configure sail to mimic that behavior.

It's a little worse in this case, because we already know that
"doesn't support unaligned data access" means an implementation will always trap, while
"does support unaligned data access" means that sometimes it will just perform the access,
and sometimes it will trap (based on non-architectural state values such as TLB state).

So results can be deterministic (that is, based on known architectural state or architectural options that are communicated via a YAML configuration file), and coild be non-deterministic (non-architectural state dependent).
If the number of legal non-deterministic results is small (e.g. 2, possibly 3), we intend to configure Sail to use one behavior, and then rerun with the other behavior, and allow matches on either behavior to be considered a match.

If I understand this case correctly, there are 3 legal results:

  • return or write the value,
  • access nothing and trap,
  • access something and trap,
    I think can be reduced to 2 sets of two reaulta, which the pair is selected via a YAML configuration option.
    But, this requires that all 3 behaviors can be configured in the Sail model.

It's not the only option that doesn't have a CSR bit or some other means of determining what it does, but this particular example is the most challenging - and one of the more important ones to implement.

@scottj97
Copy link
Contributor Author

scottj97 commented Oct 2, 2021

Editing the source *.sail files, as opposed to editing a config setting in a YAML file. Since no such config system exists today, I've made dozens of source edits to match my team's implementation choices. If you intend to support all legal implementation choices via the YAML then yes you'd need a setting for this.

In this particular test case, we're doing an 8-byte store across two pages, the second of which does not have its PTE's dirty bit set. There are three legal results, depending on implementation choices. There is no nondeterministic behavior here.

  • write the second PTE's dirty bit to 1, write 4 bytes to the first page, write 4 bytes to the second page (not necessarily in that order)
  • write 4 bytes to the first page, then trap because the second page's dirty bit is 0
  • write nothing; trap because the second page's dirty bit is 0

Regarding nondeterminism, I'll send you email about that, since it's getting off-topic for this issue report.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Oct 2, 2021 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Mar 11, 2024

I am going to fix this.

For memory reads I think it isn't too bad to solve. I will just change the code to a single mem_read() that takes a virtual address, and does virtual address translation and the memory read. For reads I can't see any reason to split up address translation and physical memory access.

Putting everything in one function means it is very easy to handle split accesses, and also means the virtual address is readily accessible for logging which is something we require for our verif system.

For writes, that doesn't quite work because of the requirement for the write value to be calculated after mem_write_ea() but before mem_write_value(). In most languages you could pass a get_value() callback that would calculate the write value, but Sail doesn't support first class functions so that isn't an option.

My proposal is to do the same as read - have a single mem_write() that takes a virtual address. However split it into two functions with an opaque type passed between them:

let opaque_data = mem_write_prepare(vaddr, width, etc.);
let write_value = X(rs1);
mem_write_commit(opaque_data, write_value);

I think this will work and be fairly elegant. It also allows you to fairly easily modify the code that it does partial writes. I will implement the case that does both translations up front though, since that's what the chip I'm working on does and I think it is the simplest & most likely case in most implementations.

If anyone thinks this is a bad idea let me know!

@allenjbaum
Copy link
Collaborator

allenjbaum commented Mar 12, 2024 via email

@PeterSewell
Copy link
Collaborator

PeterSewell commented Mar 12, 2024 via email

@allenjbaum
Copy link
Collaborator

allenjbaum commented Mar 12, 2024 via email

@PeterSewell
Copy link
Collaborator

PeterSewell commented Mar 12, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Mar 12, 2024

Note that the spec doesn't require that the write to the 2 halves appear in a particular order, nor that an exception caused by one or the other, prevents the write of the other half...

Yep, and consolidating all of this behaviour into one read and two write functions will make it much easier to support all of those behaviours, and make them configurable eventually (though I don't plan to do that at this point).

a local PAR construct in Sail

Interesting, so you'll do something like:

let paddr0, paddr1 = par(
   translateAddr(...),
   translateAddr(...),
);

and it will do them sequentially in the C/OCaml versions, but in "any order" in the formal version?

@PeterSewell
Copy link
Collaborator

PeterSewell commented Mar 12, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Mar 25, 2024

One other question @PeterSewell - why does mem_write_ea() do an alignment check? I still don't really understand the exact conditions under which it is expected to be called.

Btw - an extra complication with this that we discovered - if your two physical memory accesses happen to be in two different PMP regions then it must cause an access fault. It unfortunately means you can't treat the split access as two independent normal accesses.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Mar 25, 2024 via email

@rsnikhil
Copy link
Collaborator

For writes, that doesn't quite work because of the requirement for the write value to be calculated after mem_write_ea() but before mem_write_value().

Why? Can’t the write-value calculation (just a register read) be done concurrently, or even before ea calculation?

@Timmmm
Copy link
Collaborator

Timmmm commented Mar 25, 2024

Why can't you treat it as two independent accesses?

Probably isn't clear without seeing my code but the PMP checks were being done within the functions that do the access. It's effectively like this (pseudocode):

function vmem_read(vaddr : xlenbits, ...) = {
   ...
   let (paddr0, paddr1) = translate(vaddr);
   let part0 = mem_read(paddr0, width0);
   let part1 = mem_read(paddr1, width1);
   part0 @ part1
}

function mem_read(paddr : xlenbits, width : ...) = {
   let pmp_exc = pmp_check(paddr, width);
   ...
}

That's wrong because the PMP check can pass for each part separately, but if they happen to be accesses to different PMP regions then the vmem_read() should fail with an access fault. Even if the PMP checks for each part are ok.

Why? Can’t the write-value calculation (just a register read) be done concurrently, or even before ea calculation?

For the sequential model it makes no difference at all since mem_write_ea() doesn't actually do anything. It's some requirement for the concurrency model which I don't fully understand (see this discussion). I think it would be really worthwhile to get some detailed documentation around that because otherwise it's going to be constantly broken by people editing the code who are only using the emulator, which I think is most of us.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Mar 25, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Mar 25, 2024

I think we're talking about different things. Implementations are allowed to do the actual memory writes (after PMP checks) separately, but they can't do independent PMP checks for each part as if they were two separate store instructions.

Here's the relevant bit of the spec:

The lowest-numbered PMP entry that matches any byte of an access determines whether that access succeeds or fails. The matching PMP entry must match all bytes of an access, or the access fails, irrespective of the L, R, W, and X bits.

Not unsolvable but I'm going to have to do even more refactoring... :-/

@scottj97
Copy link
Contributor Author

I think we're talking about different things. Implementations are allowed to do the actual memory writes (after PMP checks) separately, but they can't do independent PMP checks for each part as if they were two separate store instructions.

Here's the relevant bit of the spec:

The lowest-numbered PMP entry that matches any byte of an access determines whether that access succeeds or fails. The matching PMP entry must match all bytes of an access, or the access fails, irrespective of the L, R, W, and X bits.

Not unsolvable but I'm going to have to do even more refactoring... :-/

Allen is correct. I forget where I learned this but it’s acceptable in RISC-V for an implementation to split up a single load/store into multiple accesses.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Mar 25, 2024 via email

@Timmmm
Copy link
Collaborator

Timmmm commented Mar 26, 2024

Oh yeah you're right... I should have read slightly further!

Note that a single instruction may generate multiple accesses, which may not be mutually atomic. An access-fault
exception is generated if at least one access generated by an instruction fails, though other accesses generated by that instruction may succeed with visible side effects. Notably, instructions that reference virtual memory are decomposed into multiple accesses

That's a relief!

@Timmmm Timmmm added the bug Something isn't working label May 7, 2024
@Alasdair
Copy link
Collaborator

I had a go at refactoring the LOAD instruction:

function clause execute(LOAD(imm, rs1, rd, is_unsigned, width, aq, rl)) = {
  let offset : xlenbits = sign_extend(imm);
  /* Get the address, X(rs1) + offset.
     Some extensions perform additional checks on address validity. */
  match ext_data_get_addr(rs1, offset, Read(Data), width) {
    Ext_DataAddr_Error(e)  => {
      ext_handle_data_check_error(e);
      RETIRE_FAIL
    },

    Ext_DataAddr_OK(vaddr) => {
      if check_misaligned(vaddr, width) then {
         handle_mem_exception(vaddr, E_Load_Addr_Align());
         return RETIRE_FAIL
      };

      let bytes = width_bytes(width);
      if bytes > sizeof(xlen_bytes) then {
         report_invalid_width(__FILE__, __LINE__, width, "load")
      } else {
         let (n, bytes) = misalignment(vaddr, bytes);
         var data = zeros(8 * n * bytes);

         foreach (i from 0 to (n - 1)) {
           let vaddr = vaddr + (i * bytes);
           match translateAddr(vaddr, Read(Data)) {
             TR_Failure(e, _) => {
               handle_mem_exception(vaddr, e);
               return RETIRE_FAIL
             },

             TR_Address(paddr, _) => match mem_read(Read(Data), paddr, bytes, aq, rl, false) {
               MemValue(v) => {
                 data[(8 * (i + 1) * bytes) - 1 .. 8 * i * bytes] = v
               },
               MemException(e) => {
                 handle_mem_exception(vaddr, e);
                 return RETIRE_FAIL
               }
             }
           }
         };

         X(rd) = if is_unsigned then zero_extend(data) else sign_extend(data);

         RETIRE_SUCCESS
      }
    }
  }
}

the line:

let (n, X) = misalignment(vaddr, Y);

splits a X bytes load into n times Y bytes load (if vaddr is aligned, n is always 1). This behaviour could be configurable for implementations that always want to split misaligned accesses into byte accesses, but I just made it split into the largest aligned width.

For the linked test the offending instruction now gives:

[128] [S]: 0x0000000080002096 (0x0002B303) ld t1, 0x0(t0)
mem[R,0x0000000081000740] -> 0x0000000020400C01
mem[R,0x00000000810035D8] -> 0x0000000020401001
mem[R,0x0000000081004EF0] -> 0x0000000020000CCF
mem[R,0x0000000080003FFC] -> 0x80670052
mem[R,0x0000000081000740] -> 0x0000000020400C01
mem[R,0x00000000810035D8] -> 0x0000000020401001
mem[R,0x0000000081004EF8] -> 0x000000002000140F
trapping from S to M to handle load-page-fault
handling exc#0x0D at priv M with tval 0x0000003A177DF000
CSR mstatus <- 0x0000000A00000800
mem[X,0x0000000080002104] -> 0x1173
mem[X,0x0000000080002106] -> 0x3401

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

8 participants