Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large penalty for frequent side-trace exit #114

Closed
lukego opened this issue Nov 29, 2015 · 3 comments
Closed

Large penalty for frequent side-trace exit #114

lukego opened this issue Nov 29, 2015 · 3 comments

Comments

@lukego
Copy link

lukego commented Nov 29, 2015

The overhead of a frequently taken branch to a side-trace can be dramatic. See lukego/blog#8 for a discussion. This is presumably the root of the optimization advice to "avoid unbiased branches" and "prefer branch-free algorithms".

I decided to file this issue with a bounty (see below) after a tweet from @mraleph raised the very exciting possibility that there may be a simple (partial) solution.

This is a problem for common applications. For example inner loops in networking code will naturally include some unbiased branches such as:

  • Is this a TCP or UDP packet?
  • Is this an IPv4 or IPv6 packet?
  • Does this packet belong to a known session?

and to hoist such branches outside of inner loops (hot traces) can require elaborate programming practices, for example to sort packets into separate categories (taking the branch penalties) before processing them (doing the "heavy lifting" separate branch-free traces). This is unfortunate because it substantially "raises the bar" on how much expertise is required to write reasonably efficient code.

It would be wonderful if the JIT could somehow be enhanced such that a modest number of unbiased branches, such as the examples listed above, could be taken without a major performance penalty. This would bring efficient programming within the reach of more people and especially people without a strong mental model of trace-based just-in-time compilation (a large and interesting group!)

Here are some "in the wild" examples where a lot of programmer-time is being spent dealing with this issue:

and previous discussion in the LuaJIT community:

@lukego
Copy link
Author

lukego commented Nov 29, 2015

There is a BountySource bounty available to anybody who improves LuaJIT such that a modest number of unbiased branches (e.g. the three listed in the example above) could be included in a hot trace with only modest overhead. The example code on lukego/blog#8 could be used as an initial/partial test case.

@DemiMarie
Copy link

I have thought about how to implement this. This plan seems like it might work. It is based on LuaJIT using a 2-pass optimization scheme: one set of forward optimizations and one set of backwards optimizations.

  • Create IR that is the same as the IR used by the main trace, but with the trace-exiting guard reversed.
  • Run all forward optimization passes. Assert that the IR prior to the LOOP instruction is the same as the IR for the original trace.
  • We can't run all backward passes straightforwardly, because they may not be valid with the different control flow. Therefore, they must be run with the branching guard as a barrier, to insure that the pre-guard IR matches.

This has some problems, such as a potentially exponential trace blow-up, but should still be an improvement over the current situation.

@MikePall
Copy link
Member

MikePall commented May 6, 2016

There's no easy fix and no general solution with the current compiler backend.

Closing, since there's already #37 for the actual solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants