[GR-58575] SubstrateVM PLT/GOT Feature #9883

Merged (1 commit) on Oct 25, 2024

graalvmbot
Collaborator

Introduces an additional level of indirection for calls, where the GOT (Global Offset Table) is an array of method pointers and the PLT (Procedure Linkage Table) is a collection of small stubs. With this feature enabled, direct calls are emitted as indirect calls through the GOT, and the virtual tables are filled with PLT stubs instead of method pointers.
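A minimal Java sketch of this indirection, assuming (as in ELF) that each PLT stub simply jumps through its method's GOT slot. It is a toy model, not SubstrateVM code: `GotPltSketch`, `pltStub`, `callDirect` and `callVirtual` are hypothetical names, and lambdas stand in for machine-code entry points.

```java
import java.util.function.IntUnaryOperator;

/** Toy model of GOT/PLT call indirection (hypothetical, not SubstrateVM code). */
public class GotPltSketch {
    // GOT: one slot per method, holding that method's current entry point.
    static final IntUnaryOperator[] GOT = new IntUnaryOperator[2];
    static final int SQUARE = 0, NEGATE = 1;

    // PLT stub (assumption): a tiny trampoline that jumps through one GOT slot.
    static IntUnaryOperator pltStub(int slot) {
        return x -> GOT[slot].applyAsInt(x);
    }

    // The vtable holds PLT stubs instead of direct method pointers.
    static final IntUnaryOperator[] VTABLE = { pltStub(SQUARE), pltStub(NEGATE) };

    // A "direct" call site, emitted as an indirect call through the GOT.
    static int callDirect(int slot, int arg) {
        IntUnaryOperator target = GOT[slot]; // mov rax, GOT[slot]
        return target.applyAsInt(arg);       // call rax
    }

    // A virtual call site is unchanged: it still dispatches through the vtable.
    static int callVirtual(int vtableIndex, int arg) {
        return VTABLE[vtableIndex].applyAsInt(arg);
    }

    public static void main(String[] args) {
        GOT[SQUARE] = x -> x * x;
        GOT[NEGATE] = x -> -x;
        System.out.println(callDirect(SQUARE, 7));  // 49
        System.out.println(callVirtual(NEGATE, 7)); // -7
    }
}
```

Because both direct calls and vtable entries end up reading the same GOT slot, a single GOT write is enough to change where every call to that method lands.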

While inspired by ELF, no ELF mechanisms are used in the implementation.

Example use case: hijacking code execution at call boundaries to divert execution from AOT code to an interpreter.
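To illustrate this use case, here is a toy Java model (hypothetical names, not SubstrateVM code) in which every call goes through a GOT slot. Swapping that single entry diverts all subsequent calls from the "AOT" implementation to an "interpreter" stand-in without rewriting any call site.

```java
import java.util.function.IntUnaryOperator;

/** Toy model: diverting calls from "AOT code" to an "interpreter" by patching a GOT slot. */
public class GotHijackSketch {
    static final IntUnaryOperator[] GOT = new IntUnaryOperator[1];
    static final int FIB = 0;
    static int interpretedCalls = 0;

    // Stand-in for compiled (AOT) code; recursive calls also go through the GOT.
    static int fibAot(int n) {
        return n < 2 ? n : fibViaGot(n - 1) + fibViaGot(n - 2);
    }

    // Stand-in for an interpreter: it only counts invocations and runs the same logic.
    static int fibInterpreted(int n) {
        interpretedCalls++;
        return n < 2 ? n : fibViaGot(n - 1) + fibViaGot(n - 2);
    }

    // Every call site loads its target from the GOT, so the slot decides where execution goes.
    static int fibViaGot(int n) {
        return GOT[FIB].applyAsInt(n);
    }

    public static void main(String[] args) {
        GOT[FIB] = GotHijackSketch::fibAot;
        System.out.println(fibViaGot(10) + " interpreted=" + interpretedCalls); // 55 interpreted=0
        // Divert at the call boundary: only the GOT entry changes.
        GOT[FIB] = GotHijackSketch::fibInterpreted;
        System.out.println(fibViaGot(10) + " interpreted=" + interpretedCalls); // 55 interpreted=177
    }
}
```

A real interpreter obviously behaves very differently from the AOT code; the sketch only shows that the redirection happens at the call boundary.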

Contributors:

  • Aleksandar Gradinac: Initial implementation on linux-amd64.
  • Marko Spasic: Miscellaneous improvements.
  • Bernhard Urban-Forster: Support for linux-aarch64, darwin-aarch64 and darwin-amd64.
  • Alfonso² Peterssen: Support for windows-amd64.

Co-authored-by: Aleksandar Gradinac [email protected]
Co-authored-by: Marko Spasic [email protected]
Co-authored-by: Alfonso² Peterssen [email protected]

The oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on Oct 15, 2024.
@zakkak
Collaborator

zakkak commented Oct 18, 2024

Hello, I have some questions regarding this feature.

  1. How does this added indirection impact the performance of direct calls? Do you have any evaluation results you could share?
  2. Does it apply unconditionally to all calls?

FYI @galderz @franz1981

@zakkak
Collaborator

zakkak commented Oct 24, 2024

@mukel this is a gentle ping for the question above ^^

@mukel
Member

mukel commented Oct 24, 2024

The PLT/GOT is disabled by default and currently only enabled for the JDWP debugger. It may be used in the future for other purposes, e.g. compressing cold code in the image.

There's a single-digit percentage performance impact, depending on the benchmark... @mspasic-oracle can you provide more data here?

All calls should go through it, with some exceptions. For the JDWP debugger, it is only applied to methods that can be diverted to the interpreter; it doesn't affect inlined methods, nor methods marked with @Uninterruptible ...

graalvmbot closed this on Oct 25, 2024
graalvmbot deleted the buf/pltgot-ce branch on October 25, 2024 12:59
graalvmbot merged commit ed935b7 into master on Oct 25, 2024 (13 checks passed)
@spaske00
Contributor

spaske00 commented Oct 28, 2024

@zakkak Hi, @mspasic-oracle here.

How does this added indirection impact the performance of direct calls? Do you have any evaluation results you could share?

  1. I reran the benchmarks so we get the most recent results. I'll share them here.
     In more detail:

Default:
    call f

PLT/GOT:
    mov rax, GOT_TABLE[gotTableOffsetFor(f)]   ; the GOT slot holds the address of f
    call rax

So, once the method has been resolved, every subsequent call pays for one extra mov instruction.
The first call to a method directed through PLT/GOT triggers a MethodAddressResolver. The job of the MethodAddressResolver is to do whatever is necessary to provide the method's address and write it into the GOT table. Every subsequent call to that method reads the method's address from the GOT table and calls it through a register. You can find more details in the documentation for PLTGOTFeature.java and AMD64PLTStubGenerator.java.
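A toy Java model of this lazy resolution, under stated assumptions: every GOT slot initially points at a resolver stub, and "resolution" is just a lookup in a method table followed by a GOT write. `LazyGotSketch`, `METHOD_TABLE` and `resolveAndCall` are hypothetical names, not the MethodAddressResolver API, and the register saving/restoring a real resolver performs is omitted.

```java
import java.util.function.IntUnaryOperator;

/** Toy model of lazy GOT resolution (hypothetical, not SubstrateVM code). */
public class LazyGotSketch {
    // "Method table": the authoritative entry point of every method (assumed layout).
    static final IntUnaryOperator[] METHOD_TABLE = { x -> x + 1, x -> x * 2 };
    static final IntUnaryOperator[] GOT = new IntUnaryOperator[METHOD_TABLE.length];
    static int resolutions = 0;

    static {
        // Initially every GOT slot points at a per-slot resolver stub.
        for (int i = 0; i < GOT.length; i++) {
            final int slot = i;
            GOT[slot] = x -> resolveAndCall(slot, x);
        }
    }

    // Resolver stand-in: look up the address, patch the GOT, then forward the original call.
    static int resolveAndCall(int slot, int arg) {
        resolutions++;
        GOT[slot] = METHOD_TABLE[slot];
        return GOT[slot].applyAsInt(arg);
    }

    // Call site: mov rax, GOT[slot]; call rax
    static int call(int slot, int arg) {
        return GOT[slot].applyAsInt(arg);
    }

    public static void main(String[] args) {
        System.out.println(call(0, 41)); // 42 (first call resolves slot 0)
        System.out.println(call(0, 1));  // 2 (goes straight to the method)
        System.out.println(call(1, 21)); // 42 (first call resolves slot 1)
        System.out.println("resolutions=" + resolutions); // resolutions=2
    }
}
```

The resolver runs at most once per slot; after that, the only cost left at a call site is the extra load from the GOT.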

Does it apply unconditionally to all calls?

  1. That depends on the use case of the PLT/GOT feature, and it is user-specified. The user (programmer) can decide which methods are directed through the PLT/GOT feature. The option -H:+PrintGOT will dump the methods called through the PLT/GOT.

If there’s anything more I can help with, please feel free to reach out on the public graalvm Slack channel.

@zakkak
Collaborator

zakkak commented Oct 29, 2024

Thank you @mukel and @spaske00 that definitely clears things up.

I reran the benchmarks so we get the most recent results. I'll share them here.

Please do so by posting a new comment (instead of editing the existing one) so that we will get notified. Thanks again.

@spaske00
Contributor

Hi @zakkak,

I reran the benchmarks so we get the most recent results. I'll share them here.

There was no statistically significant change in the DaCapo and Renaissance benchmarks compared to the main branch. All benchmarks run with PLT/GOT enabled, redirecting almost all methods through the PLT/GOT (except for some external C calls and @Uninterruptible methods), are within the statistical variability.

Side note: The main overhead in MethodAddressResolver comes from saving the caller’s context (specific registers) onto the stack and restoring them after the address resolution is complete. However, the resolver’s job in the base PLT/GOT setup is simply to read the method address from the method table and write it to the GOT table. Once a method’s address is resolved, every subsequent call to that method incurs the overhead of an additional mov instruction (example from my previous comment).

@zakkak
Collaborator

zakkak commented Oct 29, 2024

Thanks for the quick reply @spaske00

Once a method’s address is resolved, every subsequent call to that method incurs the overhead of an additional mov instruction (example from my previous comment).

Out of curiosity, does this mean that PLT/GOT never interferes with direct calls? What about monomorphic call sites where we can use a direct call?

@spaske00
Contributor

@zakkak

Out of curiosity

That's how I learn the most! :D

Out of curiosity, does this mean that PLT/GOT never interferes with direct calls? What about monomorphic call sites where we can use a direct call?

The MethodAddressResolutionSupport#shouldCallViaPLTGOT(SharedMethod caller, SharedMethod callee) method determines whether call sites in caller that target callee are directed through the PLT/GOT.

The compiler decides whether a call is direct or indirect. The PLT/GOT feature then takes the direct calls for which shouldCallViaPLTGOT is true and routes them through the PLT/GOT, using the instruction sequence from the example above.

The call sites of virtual calls that go through PLT/GOT remain unchanged; we just mark appropriate vtable entries as relocations to the appropriate PLT stubs.
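A hypothetical sketch of that decision as a pass over call sites. `Method`, `CallSite`, `rewrite` and this `shouldCallViaPLTGOT` are illustrative stand-ins (the real method takes `SharedMethod` arguments), and excluding @Uninterruptible callers and callees is an assumed policy loosely based on the exceptions mentioned in this thread. Inlined methods need no special case: once a callee is inlined, there is no call site left to rewrite.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the call-site rewrite (hypothetical, not SubstrateVM code). */
public class PltGotCallSitePass {
    enum Kind { DIRECT, VIRTUAL, GOT_INDIRECT }

    // Hypothetical method descriptor; "uninterruptible" mirrors @Uninterruptible.
    record Method(String name, boolean uninterruptible) {}

    record CallSite(Method caller, Method callee, Kind kind) {}

    // Assumed policy: skip calls from or to @Uninterruptible methods.
    static boolean shouldCallViaPLTGOT(Method caller, Method callee) {
        return !caller.uninterruptible() && !callee.uninterruptible();
    }

    // The compiler already chose DIRECT vs. VIRTUAL; only eligible direct calls change.
    static List<CallSite> rewrite(List<CallSite> sites) {
        List<CallSite> out = new ArrayList<>();
        for (CallSite s : sites) {
            if (s.kind() == Kind.DIRECT && shouldCallViaPLTGOT(s.caller(), s.callee())) {
                // call f  ==>  mov rax, GOT[f]; call rax
                out.add(new CallSite(s.caller(), s.callee(), Kind.GOT_INDIRECT));
            } else {
                // Virtual calls keep vtable dispatch (the vtable entry points at a PLT stub);
                // excluded direct calls stay direct.
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Method entry = new Method("entry", false);
        Method helper = new Method("helper", false);
        Method lowLevel = new Method("lowLevel", true);
        List<CallSite> result = rewrite(List.of(
                new CallSite(entry, helper, Kind.DIRECT),
                new CallSite(entry, lowLevel, Kind.DIRECT),
                new CallSite(entry, helper, Kind.VIRTUAL)));
        // Prints: helper: GOT_INDIRECT, lowLevel: DIRECT, helper: VIRTUAL (one per line)
        result.forEach(s -> System.out.println(s.callee().name() + ": " + s.kind()));
    }
}
```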

Is that what you were asking or did I misunderstand?

@zakkak
Collaborator

zakkak commented Oct 31, 2024

Is that what you were asking or did I misunderstand?

Yes. So in some cases PLT/GOT will end up replacing direct calls with indirect ones (which should have a bigger impact than just an additional mov instruction). It just seems that this doesn't happen often enough to have a measurable impact.

Thanks again
