Vectorizer: xmm register can hold ocaml values #3455

gretay-js · 2025-01-09T13:33:50Z

This PR adds a new variant Valx2 to Cmm.machtype_component to represent a vector register that holds ocaml values and needs to be scanned by the GC, as opposed to the current Vec128 machtype_component that is not scanned.

The first commit mechanically propagates the new variant through the backend.
The interesting part of recording these registers in the frame is in a separate commit. It requires Runtime: make types explicit when reading [gc_regs] #3453 to work correctly.
Finally, a separate commit fixes the vectorizer to correctly set the type of vector registers it creates.

Currently, the vectorizer rejects candidates that would result in "mixed vectors". If we want to allow it, we can replace Valx2 and Vec128 with Vec128 of mask where the mask indicates which parts are values.

For runtime4, xmm register are below [gc_regs], use negative offsets.

gretay-js · 2025-01-09T14:11:25Z

Rebasing it on top of #3453 so the vectorizer test in the CI doesn't segfault.

mshinwell · 2025-01-09T14:49:34Z

Could you confirm an example code sequence where this is necessary, given @stedolan 's comments that it seems like the stores would have to be via caml_modify?

gretay-js · 2025-01-10T15:02:40Z

Could you confirm an example code sequence where this is necessary, given @stedolan 's comments that it seems like the stores would have to be via caml_modify?

Creating a new record, for example:

type t1 = { mutable d0 : int64; mutable d1: int64; }
let copy_fresh (a : t1) : t1 = { d0 = a.d0; d1 = a.d1; }
let copy_mutable (a : t1) (b: t1) : unit = b.d0 <- a.d0;  b.d1 <- a.d1

copy_mutable has caml_modifys and won't be vectorized.
copy_fresh is vectorized and the vector instruction contains ocaml values that may be live across an allocation (of the record itself or an unrelated instruction that may trigger a GC).

TheNumbat

I'm confident this doesn't change behavior for the existing types, but I think scanning vectorized values will need some more testing.

TheNumbat · 2025-01-10T18:29:06Z

backend/amd64/emit.mlp

+          live_offset := encode n :: encode (n + 1) :: !live_offset
+      | {typ = Valx2; loc = Stack s} as reg ->
+          let n = slot_offset s (stack_slot_class reg.typ)  in
+          live_offset := n :: n + 1 :: !live_offset


slot_offset for stack slots is in bytes, so this should be n+8

Good catch!! clearly we need more tests, I'll try to come up with a test.

TheNumbat · 2025-01-10T18:31:08Z

backend/amd64/emit.mlp

@@ -801,15 +812,15 @@ let move (src : Reg.t) (dst : Reg.t) =
  begin match src.typ, src.loc, dst.typ, dst.loc with
  | Float, Reg _, Float, Reg _
  | Float32, Reg _, Float32, Reg _
-  | Vec128, _, Vec128, _ (* Vec128 stack slots are always aligned. *) ->
+  | (Vec128 | Valx2), _, (Vec128 | Valx2), _ (* Vec128 stack slots are always aligned. *) ->


We will never generate Vec128 <-> Valx2 movs, so these should be separate cases

I was unsure about this one, because we don't have separate cases for

| (Int | Val | Addr), _, (Int | Val | Addr), _ ->

The register allocator can generate moves between Int and Val because they are in the same register class, right? This is related to liveness information not being "right" after register allocation, except for registers that are live across instructions that may trigger a GC.

TheNumbat · 2025-01-10T18:33:22Z

backend/amd64/emit.mlp

@@ -2242,6 +2253,7 @@ let emit_probe_handler_wrapper p =
        (match r.typ with
        | Val -> k::acc
        | Int | Float | Vec128 | Float32 -> acc
+        | Valx2 -> k::k+1::acc


I think this is also in bytes

TheNumbat · 2025-01-10T18:45:35Z

backend/select_utils.ml

@@ -212,13 +212,18 @@ let oper_result_type = function
   dependencies, because it uses [Arch]. *)
 let size_component : machtype_component -> int = function
  | Val | Addr -> Arch.size_addr
-  | Int -> Arch.size_int
+  | Int ->
+    assert (Int.equal Arch.size_int Arch.size_addr);


Why add the assert here?

ah, this came from the other copy of the same function, as part of the refactoring #3441. We don't need to assert this every time size_component is called, obviously, but I don't think it matters for performances, as these are statically known and hopefully get eliminated in an optimized build of the compiler.

TheNumbat · 2025-01-10T18:48:02Z

backend/vectorize_utils.ml

+        Printreg.reglist pack;
+    match hd.typ, List.length pack with
+    | (Int | Addr | Float | Float32), _ ->
+      (* allows subregs, width should be correct by construction of [Group]. *)


May as well check the length is 2 (or 4 for float32) since the val case does

TheNumbat · 2025-01-10T18:48:56Z

backend/vectorize_utils.mli

+
+val vectorize_machtypes : Reg.t list -> Cmm.machtype_component
+(* CR-someday gyorsh: [vectorizable_machtypes] should take a [pack] instead of a
+   pair, to handle longer vectors, and to present a uniform interface with


Is taking a list sufficient?

I was referring to generalizing vectorizable_machtypes r1 r2 to vectorizable_machtypes reg_list. Why would it not be enough?

gretay-js added bug Something isn't working vectorizer labels Jan 9, 2025

gretay-js requested review from xclerc and TheNumbat January 9, 2025 13:33

gretay-js mentioned this pull request Jan 9, 2025

Vectorizer: add tests #3456

Draft

gretay-js added 4 commits January 9, 2025 14:09

Runtime: make types explicit when reading [gc_regs].

e6b11ac

Add [Valx2] to [Cmm.machtype_component]

565233c

Vectorizer generates [Valx2]

72a17cf

Record live offsets of [Valx2] in the frametable

d53bf46

For runtime4, xmm register are below [gc_regs], use negative offsets.

gretay-js force-pushed the vectorizer_add_valx2 branch from 663c3e6 to d53bf46 Compare January 9, 2025 14:11

TheNumbat reviewed Jan 10, 2025

View reviewed changes

gretay-js added 3 commits January 13, 2025 10:57

Fix live offset for Valx2, it's in bytes

8538aaf

Check size of register packs

47f867b

Not sure about this one

e10f873

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorizer: xmm register can hold ocaml values #3455

Vectorizer: xmm register can hold ocaml values #3455

gretay-js commented Jan 9, 2025

gretay-js commented Jan 9, 2025

mshinwell commented Jan 9, 2025

gretay-js commented Jan 10, 2025

TheNumbat left a comment

TheNumbat Jan 10, 2025

gretay-js Jan 13, 2025

TheNumbat Jan 10, 2025

gretay-js Jan 13, 2025

TheNumbat Jan 10, 2025

gretay-js Jan 13, 2025

TheNumbat Jan 10, 2025

gretay-js Jan 13, 2025

TheNumbat Jan 10, 2025

gretay-js Jan 13, 2025

TheNumbat Jan 10, 2025

gretay-js Jan 13, 2025

Vectorizer: xmm register can hold ocaml values #3455

Are you sure you want to change the base?

Vectorizer: xmm register can hold ocaml values #3455

Conversation

gretay-js commented Jan 9, 2025

gretay-js commented Jan 9, 2025

mshinwell commented Jan 9, 2025

gretay-js commented Jan 10, 2025

TheNumbat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment