WIP: Allow generated functions to return a CodeInstance #56650

Closed
wants to merge 1 commit into from


Keno
Member

@Keno Keno commented Nov 22, 2024

This PR allows generated functions to return a `CodeInstance` containing optimized IR, allowing them to bypass inference and directly add inferred code into the ordinary course of execution. This is an enabling capability for various external compiler implementations that may want to provide compilation results to the Julia runtime.

As a minimal demonstrator of this capability, this adds a Cassette-like `with_new_compiler` higher-order function, which will compile/execute its arguments with the currently loaded `Compiler` package. Unlike `@activate Compiler[:codegen]`, this change is not global and the cache is fully partitioned. This by itself is a very useful feature when developing Compiler code to be able to test the full end-to-end codegen behavior before the changes are capable of fully self-hosting.

A key enabler for this was the recent merging of #54899. This PR includes a hacky version of the second TODO left at the end of that PR, just to make everything work end-to-end.

This PR is working end-to-end, but all three parts of it (the `CodeInstance` return from generated functions, the `with_new_compiler` feature, and the interpreter integration) need some additional cleanup. This PR is mostly intended as a discussion point for what that additional work needs to be.
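
For contrast with the proposal, here is a minimal sketch of how a generated function works without this PR: the body runs on argument types and returns an expression, which then goes through inference and optimization like any other code. The function name `unrolled_sum` is made up for illustration. The `CodeInstance`-returning form proposed here is not shown, since its exact interface is still under discussion.

```julia
# An ordinary generated function. The body sees only the argument *types*
# (here the static parameters N and T) and returns an expression.
# That expression is then inferred and optimized by the compiler as usual.
@generated function unrolled_sum(t::NTuple{N,T}) where {N,T}
    ex = :(zero(T))
    for i in 1:N
        # Build zero(T) + t[1] + t[2] + ... with no runtime loop.
        ex = :($ex + t[$i])
    end
    return ex
end

unrolled_sum((1, 2, 3))  # 6
```

Under the proposal, the generator could hand back already-inferred, optimized code instead of an expression, skipping the inference step that the example above still goes through.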

@topolarity
Member

> This by itself is a very useful feature when developing Compiler code to be able to test the full end-to-end codegen behavior before the changes are capable of fully self-hosting.

This sounds awesome - Does that include LLVM codegen, or just Julia IR?

@vchuravy
Member

The ability to provide a CodeInstance out of a generated function is cool, but would not be sufficient for something like Enzyme.

In #52964 I provided an intrinsic invoke_within that performed the "Compiler plugin" switching.
I added support for abstract interpretation so that such calls would be well-inferred,
but I intentionally did not allow for inlining, since I was envisioning compiler plugins with some LLVM integration eventually. For compiler plugins that do not require that, one could enable inlining naturally without having to rely on a generated function to execute the compiler therein.

Of course, there are two different "interfaces" here; my notion of a compiler plugin was based on the abstract interpreter interface and less on the ability to load a second copy of the compiler.

I think what is dissatisfying for me with this approach is that we can't execute CodeInstances from a different owner, and instead have to transform it to owner->nothing.

The key idea in #52964 is to not have to use a Cassette-like transform for propagation of the compiler and instead handle generic function calls and tasks consistently. I can rebase #52964, but without any feedback, I didn't want to spend more energy down a path that has little chance of being adopted.

@Keno
Member Author

Keno commented Nov 22, 2024

> This sounds awesome - Does that include LLVM codegen, or just Julia IR?

The idea is that you can return a code instance either with inferred Julia IR set, in which case the runtime will compile it for you, or with the full set of invoke pointers set, in which case it'll just become active immediately. The latter case is for people who are writing their own compilers entirely in Julia and just need the entry point. That said, there are additional semantics that need to be made to work in both cases, and while I would like to support the second case, I'm not planning to actually put together anything end-to-end there for the time being.

@Keno
Member Author

Keno commented Nov 22, 2024

> The ability to provide a CodeInstance out of a generated function is cool, but would not be sufficient for something like Enzyme.

Why not?

> I think what is dissatisfying for me with this approach is that we can't execute CodeInstances from a different owner, and instead have to transform it to owner->nothing.

We can, see the implementation.

#52964

The key advantage of #52964 over this approach is that it does not force the existence of a concrete signature for the entry dispatcher and that it fully participates in the ordinary compiler cycle detection. However, there's a disadvantage as well, in that the interface is much broader, because compiler data structures become part of the ABI. There's still value to something like #52964 - it's just harder to know what it should look like, since it's a more complicated interface. Extending generated functions is quite natural, since we already know the semantics. That said, I think getting this PR fully working would actually make implementing #52964 properly easier, since it could then be implemented as an optimization over the semantics from this PR. To be concrete, rather than make the invoke_within from #52964 a builtin, make it a compiler generic generated function like return_type, whose runtime behavior uses the mechanism from this PR, but which gets recognized at compile time (if it's the same copy of Compiler) to have the wider ABI from #52964.

@Keno
Member Author

Keno commented Nov 22, 2024

(To preempt the complaint that with_new_compiler doesn't recurse - that's easily fixed by a cache transformation pass that prepends the context. It's part of the two or three more changes to this code that still need to be done, along with the recursion protection).

@vchuravy
Member

> that's easily fixed by a cache transformation pass that prepends the context.

Yeah, I don't like that additional work xD. I think that's what leaves me a bit unsatisfied, as greedy as I am.

We already have a tagged CodeInstance in the system (tagged on SplitCache here), which we untag by copying it into a new CodeInstance, and then we additionally need to run a (linear) cache transformation to modify the IR.
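
To make the tagging idea concrete, here is a toy model of a code cache partitioned by owner. It is only an illustration under stated assumptions: `ToyCache`, `store!`, and `lookup` are hypothetical names, the strings stand in for compiled code, and none of this reflects the runtime's actual data structures.

```julia
# Toy model only: entries are keyed by (owner, signature), so results
# produced by different compilers for the same signature never collide.
struct ToyCache
    entries::Dict{Tuple{Any,Type},String}
end
ToyCache() = ToyCache(Dict{Tuple{Any,Type},String}())

# Record a (stand-in) compilation result under a given owner tag.
store!(c::ToyCache, owner, sig::Type, code::String) = (c.entries[(owner, sig)] = code; c)

# Look up a result; returns `nothing` if this owner has no entry.
lookup(c::ToyCache, owner, sig::Type) = get(c.entries, (owner, sig), nothing)

cache = ToyCache()
sig = Tuple{typeof(sin),Float64}
store!(cache, nothing, sig, "native result")      # owner == nothing: the default compiler
store!(cache, :SplitCache, sig, "plugin result")  # a tagged (custom-owner) entry

lookup(cache, nothing, sig)      # "native result"
lookup(cache, :SplitCache, sig)  # "plugin result"
lookup(cache, :Other, sig)       # nothing
```

In this picture, "untagging" corresponds to copying the `:SplitCache` entry under the `nothing` key, which is the extra step being objected to above.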

> > The ability to provide a CodeInstance out of a generated function is cool, but would not be sufficient for something like Enzyme.
>
> Why not?

Once the code hits the compiler, we lose track of the source and everything gets treated as one.
I was thinking about two things, first a custom "codegen" extension to support experiments like my old Tapir work,
and secondly an interface for LLVM passes that is scoped to a compiler plugin, but for that one would need to maintain
provenance of where the CodeInstance came from.

Of course, we could go with unconditional LLVM plugins and pseudo-intrinsics like the Clang plugin interface.

Keno added a commit that referenced this pull request Nov 23, 2024
This is an alternative mechanism to #56650 that largely achieves
the same result, but by hooking into `invoke` rather than a generated
function. They are orthogonal mechanisms, and it's possible we want both.
However, in #56650, both Jameson and Valentin were skeptical of the
generated function signature bottleneck. This PR is sort of a hybrid
of the mechanism in #52964 and what I proposed in #56650 (comment).

In particular, this PR:

1. Extends `invoke` to support a CodeInstance in place of its usual
   `types` argument.

2. Adds a new `typeinf` optimized generic. The semantics of this optimized
   generic allow the compiler to instead call a companion `typeinf_edge`
   function, allowing a mid-inference interpreter switch (like #52964),
   without being forced through a concrete signature bottleneck. However,
   if calling `typeinf_edge` does not work (e.g. because the compiler
   version is mismatched), this still has well-defined semantics; you
   just don't get inference support.
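
For reference, item 1 builds on the existing form of `invoke`, whose second argument is a tuple type selecting which method to call. The sketch below shows only that existing behavior; the function `describe` is made up for illustration, and the `CodeInstance` form described in this commit message is not shown because its details are specific to that PR.

```julia
describe(x::Integer) = "integer"
describe(x::Int) = "Int"

# Ordinary dispatch picks the most specific method.
describe(1)                          # "Int"

# `invoke` calls the method matching the given `types` argument instead.
invoke(describe, Tuple{Integer}, 1)  # "integer"
```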

The additional benefit of the `typeinf` optimized generic is that it lets
custom cache owners tell the runtime how to "cure" code instances that
have lost their native code. Currently the runtime only knows how to
do that for `owner == nothing` CodeInstances (by re-running inference).
This extension is not implemented, but the idea is that the runtime would
be permitted to call the `typeinf` optimized generic on the dead
CodeInstance's `owner` and `def` fields to obtain a cured CodeInstance (or
a user-actionable error from the plugin).

This PR includes an implementation of `with_new_compiler` from #56650.
This PR includes just enough compiler support to make the compiler
optimize this to the same code that #56650 produced:

```
julia> @code_typed with_new_compiler(sin, 1.0)
CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_get_tls_world_age), UInt64, svec(), 0, :(:ccall)))::UInt64
│   %2 =   builtin Core.getfield(args, 1)::Float64
│   %3 =    invoke sin(%2::Float64)::Float64
└──      return %3
) => Float64
```

However, the implementation here is extremely incomplete. I'm putting
it up only as a directional sketch to see if people prefer it over #56650.
If so, I would prepare a cleaned up version of this PR that has the
optimized generics as well as the curing support, but not the full
inference integration (which needs a fair bit more work).
@MasonProtter
Contributor

Something I'm a little confused about: with this PR, would the ban on querying type inference from within a generated function body be lifted?

@Keno
Member Author

Keno commented Nov 24, 2024

This PR does not technically call inference within a generated function - it calls a second copy of the same code, but regardless. A lot of the issues with generated functions and calling into inference have been solved over the years by letting generated functions be passed the world age as an argument and by allowing them to declare world bounds and edges. The primary problem remaining with recursing into inference in generated functions is that you could end up in a loop. For this PR, because the with_new_compiler implementation has only one entry point, it could recognize that situation in the custom interpreter. #56660 has a more general recursion handling mechanism (as long as both interpreters use the same Compiler copy).
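
For readers less familiar with the world age mentioned above, here is a small sketch of the concept. It assumes the unexported `Base.get_world_counter` function and is meant to be run at top level (REPL or script). It does not show how a generated function receives the world age, which uses internal machinery.

```julia
# `Base.get_world_counter` is unexported; it returns the current global world age.
before = Base.get_world_counter()

# Defining a new method advances the world counter.
freshly_defined(x) = x + 1

after = Base.get_world_counter()

# Code compiled for world `before` cannot see `freshly_defined`; this is why a
# generator needs to know which world it is being asked to generate code for.
after > before
```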

@Keno
Member Author

Keno commented Nov 26, 2024

Closing in favor of #56660

@Keno Keno closed this Nov 26, 2024
Keno added a commit that referenced this pull request Nov 29, 2024
This is an alternative mechanism to #56650 that largely achieves
the same result, but by hooking into `invoke` rather than a generated
function. They are orthogonal mechanisms, and it's possible we want both.
However, in #56650, both Jameson and Valentin were skeptical of the
generated function signature bottleneck. This PR is sort of a hybrid
of the mechanism in #52964 and what I proposed in #56650 (comment).

In particular, this PR:

1. Extends `invoke` to support a CodeInstance in place of its usual
   `types` argument.

2. Adds a new `typeinf` optimized generic. The semantics of this optimized
   generic allow the compiler to instead call a companion `typeinf_edge`
   function, allowing a mid-inference interpreter switch (like #52964),
   without being forced through a concrete signature bottleneck. However,
   if calling `typeinf_edge` does not work (e.g. because the compiler
   version is mismatched), this still has well-defined semantics; you
   just don't get inference support.

The additional benefit of the `typeinf` optimized generic is that it lets
custom cache owners tell the runtime how to "cure" code instances that
have lost their native code. Currently the runtime only knows how to
do that for `owner == nothing` CodeInstances (by re-running inference).
This extension is not implemented, but the idea is that the runtime would
be permitted to call the `typeinf` optimized generic on the dead
CodeInstance's `owner` and `def` fields to obtain a cured CodeInstance (or
a user-actionable error from the plugin).

This PR includes an implementation of `with_new_compiler` from #56650.

That said, this PR does not yet include the compiler optimization that
implements the semantics of the optimized generic, which will be in a
follow up PR.
Keno added a commit that referenced this pull request Dec 3, 2024
stevengj pushed a commit that referenced this pull request Jan 2, 2025
@DilumAluthge DilumAluthge deleted the kf/minidaecompiler branch January 12, 2025 19:12
4 participants