Large penalty for frequent side-trace exit #114

lukego · 2015-11-29T15:20:38Z

The overhead of a frequently taken branch to a side-trace can be dramatic. See lukego/blog#8 for a discussion. This is presumably the root of the optimization advice to "avoid unbiased branches" and "prefer branch-free algorithms".

I decided to file this issue with a bounty (see below) after a tweet from @mraleph raised the very exciting possibility that there may be a simple (partial) solution.

This is a problem for common applications. For example, inner loops in networking code will naturally include some unbiased branches such as:

Is this a TCP or UDP packet?
Is this an IPv4 or IPv6 packet?
Does this packet belong to a known session?

Hoisting such branches outside of inner loops (hot traces) can require elaborate programming practices, for example sorting packets into separate categories (taking the branch penalties there) before processing them (doing the "heavy lifting" in separate branch-free traces). This is unfortunate because it substantially "raises the bar" on how much expertise is required to write reasonably efficient code.
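The "sort first, then process" workaround described above can be sketched roughly as follows. This is a language-neutral illustration written in Python, which has no tracing JIT, so it shows only the shape of the restructuring and says nothing about its performance. The packet format (a dict with `proto` and `len` keys) and the `handle_tcp`/`handle_udp` handlers are hypothetical.

```python
# Illustration only: Python has no tracing JIT, so this shows the *shape* of
# the "classify first, then process" workaround, not its performance.
# The packet format and handler names below are hypothetical.

def handle_tcp(pkt):
    return ("tcp", pkt["len"])

def handle_udp(pkt):
    return ("udp", pkt["len"])

def process_interleaved(packets):
    """Single loop: the protocol test is an unbiased branch on every iteration.
    In LuaJIT, one side of it would typically end up in a side trace."""
    results = []
    for pkt in packets:
        if pkt["proto"] == "tcp":
            results.append(handle_tcp(pkt))
        else:
            results.append(handle_udp(pkt))
    return results

def process_sorted(packets):
    """Two passes: classify once, then run one branch-free loop per category."""
    tcp, udp = [], []
    for pkt in packets:  # the unbiased branch is confined to this cheap loop
        (tcp if pkt["proto"] == "tcp" else udp).append(pkt)
    results = [handle_tcp(p) for p in tcp]  # "heavy lifting", no protocol test
    results += [handle_udp(p) for p in udp]
    return results

pkts = [{"proto": "tcp", "len": 60},
        {"proto": "udp", "len": 40},
        {"proto": "tcp", "len": 1500}]
print(process_sorted(pkts))  # [('tcp', 60), ('tcp', 1500), ('udp', 40)]
```

Note that the sorted version returns results grouped by category rather than in arrival order, which is one of the costs that this kind of restructuring can impose on real applications.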

It would be wonderful if the JIT could somehow be enhanced so that a modest number of unbiased branches, such as the examples listed above, could be taken without a major performance penalty. This would bring efficient programming within reach of more people, especially people without a strong mental model of trace-based just-in-time compilation (a large and interesting group!).

Here are some "in the wild" examples where a lot of programmer time is being spent dealing with this issue:

lukego/blog#8 (Tracing JITs and modern CPUs part 3: A bad case): synthetic example.
snabbco/snabb#612 (Avoid ctype diversity for registers in intel10g): side trace caused by FFI type guards.
snabbco/snabb#623 (traceprof: a new trace-oriented profiler for LuaJIT): the traceprof profiler, written to make this issue more transparent.
snabbco/snabb#638 (New implementation of the learning bridge app): code rewritten to avoid side-trace problems; inner loops were either ported to C to hide them from the JIT or manually hoisted outside of the trace that contains the branches (classification of unicast/multicast/discard packets).
Igalia/snabb#97 (Add PodHashMap): PodHashMap hashtable written in Lua, with much hand-wringing about the potential non-local impact of side traces.

and previous discussion in the LuaJIT community:

#37 (Add Hyperblock Scheduling): hyperblock scheduling, a potential solution that may be prohibitively difficult to implement.

lukego · 2015-11-29T15:27:52Z

There is a BountySource bounty available to anybody who improves LuaJIT such that a modest number of unbiased branches (e.g. the three listed in the example above) could be included in a hot trace with only modest overhead. The example code on lukego/blog#8 could be used as an initial/partial test case.

DemiMarie · 2016-04-06T19:36:17Z

I have thought about how to implement this, and here is a plan that seems like it might work. It relies on LuaJIT's two-pass optimization scheme: one set of forward optimizations and one set of backward optimizations.

Create IR that is the same as the IR used by the main trace, but with the trace-exiting guard reversed.
Run all forward optimization passes. Assert that the IR prior to the LOOP instruction is the same as the IR for the original trace.
We can't run all backward passes straightforwardly, because they may not be valid with the different control flow. Therefore, they must be run with the branching guard as a barrier, to ensure that the pre-guard IR matches.

This has some problems, such as a potentially exponential trace blow-up, but should still be an improvement over the current situation.
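To get a feel for the "potentially exponential trace blow-up" mentioned above, here is a small counting sketch. It assumes a simplified model that is not taken from LuaJIT itself: each branching guard is compiled both ways, the guards are independent, and every copy is specialized again at each later guard. The guard names are hypothetical stand-ins for the three example branches from the original issue.

```python
from itertools import product

def guard_variants(guards):
    """Enumerate trace copies under a hypothetical model where every guard is
    compiled both ways: each variant fixes one outcome (True/False) per guard."""
    return [dict(zip(guards, outcome))
            for outcome in product((True, False), repeat=len(guards))]

# Hypothetical names for the three unbiased branches in the original issue.
guards = ["is_tcp", "is_ipv4", "known_session"]
print(len(guard_variants(guards)))  # 8
```

Under this model, n unbiased guards yield 2**n variants, so three guards already need eight copies of the trace body; this is why such a scheme would need some limit on how many guards are inlined.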

MikePall · 2016-05-06T11:08:04Z

There's no easy fix and no general solution with the current compiler backend.

Closing, since there's already #37 for the actual solution.


lukego mentioned this issue on Feb 10, 2016: snabbco/snabb#734, Create new "snabb" module to define our API (Open)
DemiMarie mentioned this issue on Apr 26, 2016: #26, Add ARM64 JIT compiler backend (Closed)
MikePall added the duplicate label on May 6, 2016
MikePall closed this as completed on May 6, 2016
lukego mentioned this issue on Jul 31, 2016: #208, How to control the scope of traces? (Closed)
lukego mentioned this issue on Oct 5, 2016: #218, Side-trace Russian Roulette (Closed)
akopytov pushed a commit to akopytov/LuaJIT that referenced this issue on Oct 15, 2016: 5f57c32, Merge pull request LuaJIT#114 from cbaylis/asm_prof (Implement asm_prof.)
wingo mentioned this issue on Mar 15, 2019: Igalia/snabb#1216, Determine whether or not to flush JIT when packets are lost (Open)
