Fix DEBUG/non-DEBUG asm diffs due to instruction group size estimation (dotnet#49947)

* Fix DEBUG/non-DEBUG asm diffs due to instruction group size estimation

This change fixes a long-standing difference between DEBUG and non-DEBUG codegen
that recently manifested as asm diffs in instruction encoding in a Checked/Release
SuperPMI run, mostly in the tests\JIT\Directed\arglist\vararg.cs test, specifically:

```
NativeVarargTest.VarArg:TestPassingManyFloats(System.Double[]):bool
NativeVarargTest.VarArg:TestPassingManyDoublesManaged(System.Double[]):bool
NativeVarargTest.VarArg:TestPassingManyDoubles(System.Double[]):bool
NativeVarargTest.VarArg:TestPassingManyFloatsManaged(System.Single[]):bool
double1:ProcessJagged3DArray(System.Double[][])
```
Example diff:
```
215c7758454: 76 6a                      jbe     106 // checked
215c77b8ab4: 0f 86 6a 00 00 00          jbe     106 // release
```

Note that the offset ends up exactly the same, but the encoding is different.

Some context on the emitter:

The emitter collects generated machine instructions as a set of variable-sized
`instrDesc` structures in instruction groups (IG, or `insGroup`). An IG is filled
with a variable number of instrDescs into a fixed-sized buffer, which, when full
or a new label (e.g., branch target) is encountered, is copied to dynamically
allocated precisely-sized memory for the set of instructions collected. If a BasicBlock
requires more instructions than fit in a single IG, multiple IGs are generated, and the
ones after the first are marked with the IGF_EXTEND flag; these are called "emitAdd"
or overflow IGs. After codegen, all the IGs are filled, and we have an estimated
size for each instruction, and each IG has an estimated code offset from the
beginning of the function. The estimated sizes could be wrong if we incorrectly
estimated the size of an instruction (which in most cases could be considered a
bug, but is particularly common with our Intel vector instructions currently) or
the offset required for a branch (especially a forward branch). The JIT only does
a single pass of branch tightening, so there can be cases of non-optimal estimates.
Note that estimates may only err on the large side; it is fatal for an estimate to
be too small. During emission, we walk the IG list and instructions, encoding them
into the memory buffer provided by the VM. At this point, we might shrink an instruction
encoding, for example because we can use a smaller branch encoding than expected.

Notably, the size of instrDescs and insGroups varies between DEBUG and non-DEBUG builds.
That means that the number of instrDescs that fit in an IG differs. This should have
no effect on generated code, as long as the code properly handles overflow IGs and
doesn't make codegen decisions based on IG size or number of contained instructions.

The problem:

During emission, we calculate an "offset adjustment" (in `emitOffsAdj`) that is used to
estimate the distance from a forward jump instruction to its target. This determines the
size of jump instruction encodings that we emit. For example, on x86/x64, a relative
conditional branch can have a 1 byte or a 4 byte target offset. A 1-byte offset can be
used if the target instruction is 127 bytes or fewer away. So, we estimate the distance
using the jump instruction's known offset and the estimated offset of the target, which
is the estimated offset of the instruction group (the label) calculated before emission
minus the "shrinkage" of instruction sizes that we've seen so far. This "shrinkage" is
due to over-estimation of instruction encoding sizes and branch estimated sizes.

The shrinkage in `emitOffsAdj` is only computed at the beginning of an IG. As additional
instructions in an IG shrink, we don't take that into account when we reach a forward jump
instruction and try to determine if a "small" branch encoding can be used. Thus, it's
possible for a branch to be estimated to require a large 4-byte offset instead of a 1-byte
offset because the forward branch is barely over the limit, but wouldn't be if we considered
the already-shrunk instructions in the current IG. In fact, the more instructions that
shrink in an IG, the larger our estimated branch distance becomes, because the source
instruction address is precise (we know exactly how many bytes of instructions we've
emitted), while the target offset estimate remains fixed.

Because an IG holds a different number of instructions in DEBUG and non-DEBUG builds,
the target-offset estimate for any particular branch can differ between the two flavors,
given enough instruction shrinkage and enough overflow IGs (such that the number of IGs
itself differs between DEBUG and non-DEBUG). This leads to different instruction
encodings between the flavors.

The solution:

The solution is simple: track instruction shrinkage as we emit instructions, and use the
latest, most accurate shrinkage adjustment when estimating branches. This gives no difference
between DEBUG and non-DEBUG because it removes the indirect dependence on IG instruction
count in recomputing `emitOffsAdj`.

There are asm diffs (with just a Checked x64 JIT compared to baseline Checked JIT) due to some
branches now being SHORT where previously they were LONG.

* Add assert
BruceForstall authored Mar 22, 2021
1 parent 143103b commit 7012cd7
Showing 2 changed files with 44 additions and 28 deletions.
40 changes: 35 additions & 5 deletions src/coreclr/jit/emit.cpp
@@ -3747,6 +3747,14 @@ size_t emitter::emitIssue1Instr(insGroup* ig, instrDesc* id, BYTE** dp)
}
#endif // DEBUG_EMIT

// Add the shrinkage to the ongoing offset adjustment. This needs to happen during the
// processing of an instruction group, and not only at the beginning of an instruction
// group, or else the difference of IG sizes between debug and release builds can cause
// debug/non-debug asm diffs.
int offsShrinkage = estimatedSize - actualSize;
JITDUMP("Increasing size adj %d by %d => %d\n", emitOffsAdj, offsShrinkage, emitOffsAdj + offsShrinkage);
emitOffsAdj += offsShrinkage;

/* The instruction size estimate wasn't accurate; remember this */

ig->igFlags |= IGF_UPD_ISZ;
@@ -5273,6 +5281,8 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
emitCodeBlock = nullptr;
emitConsBlock = nullptr;

emitOffsAdj = 0;

/* Tell everyone whether we have fully interruptible code or not */

emitFullyInt = fullyInt;
@@ -5733,17 +5743,37 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,

/* Record the actual offset of the block, noting the difference */

emitOffsAdj = ig->igOffs - emitCurCodeOffs(cp);
assert(emitOffsAdj >= 0);
int newOffsAdj = ig->igOffs - emitCurCodeOffs(cp);

#if DEBUG_EMIT
if ((emitOffsAdj != 0) && emitComp->verbose)
#ifdef DEBUG
// Under DEBUG, only output under verbose flag.
if (emitComp->verbose)
#endif // DEBUG
{
printf("Block predicted offs = %08X, actual = %08X -> size adj = %d\n", ig->igOffs, emitCurCodeOffs(cp),
emitOffsAdj);
if (newOffsAdj != 0)
{
printf("Block predicted offs = %08X, actual = %08X -> size adj = %d\n", ig->igOffs, emitCurCodeOffs(cp),
newOffsAdj);
}
if (emitOffsAdj != newOffsAdj)
{
printf("Block expected size adj %d not equal to actual size adj %d (probably some instruction size was "
"underestimated but not included in the running `emitOffsAdj` count)\n",
emitOffsAdj, newOffsAdj);
}
}
// Make it noisy in DEBUG if these don't match. In release, the noway_assert below checks the
// fatal condition.
assert(emitOffsAdj == newOffsAdj);
#endif // DEBUG_EMIT

// We can't have over-estimated the adjustment, or we might have underestimated a jump distance.
noway_assert(emitOffsAdj <= newOffsAdj);

emitOffsAdj = newOffsAdj;
assert(emitOffsAdj >= 0);

ig->igOffs = emitCurCodeOffs(cp);
assert(IsCodeAligned(ig->igOffs));

32 changes: 9 additions & 23 deletions src/coreclr/jit/emitxarch.cpp
@@ -12532,13 +12532,12 @@ BYTE* emitter::emitOutputLJ(insGroup* ig, BYTE* dst, instrDesc* i)

if (id->idCodeSize() != JMP_SIZE_SMALL)
{
emitOffsAdj += id->idCodeSize() - JMP_SIZE_SMALL;

#ifdef DEBUG
if (emitComp->verbose)
#if DEBUG_EMIT || defined(DEBUG)
int offsShrinkage = id->idCodeSize() - JMP_SIZE_SMALL;
if (INDEBUG(emitComp->verbose ||)(id->idDebugOnlyInfo()->idNum == (unsigned)INTERESTING_JUMP_NUM ||
INTERESTING_JUMP_NUM == 0))
{
printf("; NOTE: size of jump [%08X] mis-predicted by %d bytes\n", emitComp->dspPtr(id),
(id->idCodeSize() - JMP_SIZE_SMALL));
printf("; NOTE: size of jump [%08p] mis-predicted by %d bytes\n", dspPtr(id), offsShrinkage);
}
#endif
}
@@ -12699,11 +12698,10 @@ size_t emitter::emitOutputInstr(insGroup* ig, instrDesc* id, BYTE** dp)
{
assert(emitIssuing);

BYTE* dst = *dp;
size_t sz = sizeof(instrDesc);
instruction ins = id->idIns();
unsigned char callInstrSize = 0;
int emitOffsAdjBefore = emitOffsAdj;
BYTE* dst = *dp;
size_t sz = sizeof(instrDesc);
instruction ins = id->idIns();
unsigned char callInstrSize = 0;

#ifdef DEBUG
bool dspOffs = emitComp->opts.dspGCtbls;
@@ -13882,18 +13880,6 @@ size_t emitter::emitOutputInstr(insGroup* ig, instrDesc* id, BYTE** dp)
#endif

dst = emitOutputNOP(dst, diff);

// since we compensated the over-estimation, revert the offsAdj that
// might have happened in the jump
if (emitOffsAdjBefore != emitOffsAdj)
{
#ifdef DEBUG
insFormat format = id->idInsFmt();
assert((format == IF_LABEL) || (format == IF_RWR_LABEL) || (format == IF_SWR_LABEL));
assert(diff == (emitOffsAdj - emitOffsAdjBefore));
#endif
emitOffsAdj -= diff;
}
}
assert((id->idCodeSize() - ((UNATIVE_OFFSET)(dst - *dp))) == 0);
}
