Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorizer: xmm register can hold ocaml values #3455

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

gretay-js
Copy link
Contributor

This PR adds a new variant Valx2 to Cmm.machtype_component to represent a vector register that holds ocaml values and needs to be scanned by the GC, as opposed to the current Vec128 machtype_component that is not scanned.

  • The first commit mechanically propagates the new variant through the backend.
  • The interesting part of recording these registers in the frame is in a separate commit. It requires Runtime: make types explicit when reading [gc_regs] #3453 to work correctly.
  • Finally, a separate commit fixes the vectorizer to correctly set the type of vector registers it creates.

Currently, the vectorizer rejects candidates that would result in "mixed vectors". If we want to allow it, we can replace Valx2 and Vec128 with Vec128 of mask where the mask indicates which parts are values.

@gretay-js gretay-js added bug Something isn't working vectorizer labels Jan 9, 2025
@gretay-js gretay-js requested review from xclerc and TheNumbat January 9, 2025 13:33
@gretay-js gretay-js mentioned this pull request Jan 9, 2025
@gretay-js gretay-js force-pushed the vectorizer_add_valx2 branch from 663c3e6 to d53bf46 Compare January 9, 2025 14:11
@gretay-js
Copy link
Contributor Author

Rebasing it on top of #3453 so the vectorizer test in the CI doesn't segfault.

@mshinwell
Copy link
Collaborator

Could you confirm an example code sequence where this is necessary, given @stedolan 's comments that it seems like the stores would have to be via caml_modify?

@gretay-js
Copy link
Contributor Author

Could you confirm an example code sequence where this is necessary, given @stedolan 's comments that it seems like the stores would have to be via caml_modify?

Creating a new record, for example:

type t1 = { mutable d0 : int64; mutable d1: int64; }
let copy_fresh (a : t1) : t1 = { d0 = a.d0; d1 = a.d1; }
let copy_mutable (a : t1) (b: t1) : unit = b.d0 <- a.d0;  b.d1 <- a.d1 
  • copy_mutable has caml_modifys and won't be vectorized.
  • copy_fresh is vectorized and the vector instruction contains ocaml values that may be live across an allocation (of the record itself or an unrelated instruction that may trigger a GC).

Copy link
Contributor

@TheNumbat TheNumbat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confident this doesn't change behavior for the existing types, but I think scanning vectorized values will need some more testing.

live_offset := encode n :: encode (n + 1) :: !live_offset
| {typ = Valx2; loc = Stack s} as reg ->
let n = slot_offset s (stack_slot_class reg.typ) in
live_offset := n :: n + 1 :: !live_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slot_offset for stack slots is in bytes, so this should be n+8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!! clearly we need more tests, I'll try to come up with a test.

@@ -801,15 +812,15 @@ let move (src : Reg.t) (dst : Reg.t) =
begin match src.typ, src.loc, dst.typ, dst.loc with
| Float, Reg _, Float, Reg _
| Float32, Reg _, Float32, Reg _
| Vec128, _, Vec128, _ (* Vec128 stack slots are always aligned. *) ->
| (Vec128 | Valx2), _, (Vec128 | Valx2), _ (* Vec128 stack slots are always aligned. *) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will never generate Vec128 <-> Valx2 movs, so these should be separate cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure about this one, because we don't have separate cases for

| (Int | Val | Addr), _, (Int | Val | Addr), _ ->

The register allocator can generate moves between Int and Val because they are in the same register class, right? This is related to liveness information not being "right" after register allocation, except for registers that are live across instructions that may trigger a GC.

@@ -2242,6 +2253,7 @@ let emit_probe_handler_wrapper p =
(match r.typ with
| Val -> k::acc
| Int | Float | Vec128 | Float32 -> acc
| Valx2 -> k::k+1::acc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also in bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -212,13 +212,18 @@ let oper_result_type = function
dependencies, because it uses [Arch]. *)
let size_component : machtype_component -> int = function
| Val | Addr -> Arch.size_addr
| Int -> Arch.size_int
| Int ->
assert (Int.equal Arch.size_int Arch.size_addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add the assert here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this came from the other copy of the same function, as part of the refactoring #3441. We don't need to assert this every time size_component is called, obviously, but I don't think it matters for performances, as these are statically known and hopefully get eliminated in an optimized build of the compiler.

Printreg.reglist pack;
match hd.typ, List.length pack with
| (Int | Addr | Float | Float32), _ ->
(* allows subregs, width should be correct by construction of [Group]. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well check the length is 2 (or 4 for float32) since the val case does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


val vectorize_machtypes : Reg.t list -> Cmm.machtype_component
(* CR-someday gyorsh: [vectorizable_machtypes] should take a [pack] instead of a
pair, to handle longer vectors, and to present a uniform interface with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is taking a list sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to generalizing vectorizable_machtypes r1 r2 to vectorizable_machtypes reg_list. Why would it not be enough?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working vectorizer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants