-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorizer: xmm register can hold ocaml values #3455
base: main
Are you sure you want to change the base?
Conversation
For runtime4, xmm register are below [gc_regs], use negative offsets.
663c3e6
to
d53bf46
Compare
Rebasing it on top of #3453 so the vectorizer test in the CI doesn't segfault. |
Could you confirm an example code sequence where this is necessary, given @stedolan 's comments that it seems like the stores would have to be via |
Creating a new record, for example:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confident this doesn't change behavior for the existing types, but I think scanning vectorized values will need some more testing.
backend/amd64/emit.mlp
Outdated
live_offset := encode n :: encode (n + 1) :: !live_offset | ||
| {typ = Valx2; loc = Stack s} as reg -> | ||
let n = slot_offset s (stack_slot_class reg.typ) in | ||
live_offset := n :: n + 1 :: !live_offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slot_offset
for stack slots is in bytes, so this should be n+8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!! clearly we need more tests, I'll try to come up with a test.
backend/amd64/emit.mlp
Outdated
@@ -801,15 +812,15 @@ let move (src : Reg.t) (dst : Reg.t) = | |||
begin match src.typ, src.loc, dst.typ, dst.loc with | |||
| Float, Reg _, Float, Reg _ | |||
| Float32, Reg _, Float32, Reg _ | |||
| Vec128, _, Vec128, _ (* Vec128 stack slots are always aligned. *) -> | |||
| (Vec128 | Valx2), _, (Vec128 | Valx2), _ (* Vec128 stack slots are always aligned. *) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will never generate Vec128 <-> Valx2
movs, so these should be separate cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure about this one, because we don't have separate cases for
| (Int | Val | Addr), _, (Int | Val | Addr), _ ->
The register allocator can generate moves between Int
and Val
because they are in the same register class, right? This is related to liveness information not being "right" after register allocation, except for registers that are live across instructions that may trigger a GC.
backend/amd64/emit.mlp
Outdated
@@ -2242,6 +2253,7 @@ let emit_probe_handler_wrapper p = | |||
(match r.typ with | |||
| Val -> k::acc | |||
| Int | Float | Vec128 | Float32 -> acc | |||
| Valx2 -> k::k+1::acc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is also in bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -212,13 +212,18 @@ let oper_result_type = function | |||
dependencies, because it uses [Arch]. *) | |||
let size_component : machtype_component -> int = function | |||
| Val | Addr -> Arch.size_addr | |||
| Int -> Arch.size_int | |||
| Int -> | |||
assert (Int.equal Arch.size_int Arch.size_addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add the assert here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, this came from the other copy of the same function, as part of the refactoring #3441. We don't need to assert this every time size_component
is called, obviously, but I don't think it matters for performances, as these are statically known and hopefully get eliminated in an optimized build of the compiler.
Printreg.reglist pack; | ||
match hd.typ, List.length pack with | ||
| (Int | Addr | Float | Float32), _ -> | ||
(* allows subregs, width should be correct by construction of [Group]. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May as well check the length is 2 (or 4 for float32) since the val case does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
val vectorize_machtypes : Reg.t list -> Cmm.machtype_component | ||
(* CR-someday gyorsh: [vectorizable_machtypes] should take a [pack] instead of a | ||
pair, to handle longer vectors, and to present a uniform interface with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is taking a list sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to generalizing vectorizable_machtypes r1 r2
to vectorizable_machtypes reg_list
. Why would it not be enough?
This PR adds a new variant
Valx2
toCmm.machtype_component
to represent a vector register that holds ocaml values and needs to be scanned by the GC, as opposed to the currentVec128
machtype_component that is not scanned.Currently, the vectorizer rejects candidates that would result in "mixed vectors". If we want to allow it, we can replace
Valx2
andVec128
withVec128 of mask
where the mask indicates which parts are values.