Fix DEBUG/non-DEBUG asm diffs due to instruction group size estimation (dotnet#49947)

* Fix DEBUG/non-DEBUG asm diffs due to instruction group size estimation

This change fixes a long-standing difference between DEBUG and non-DEBUG codegen
that recently manifested as asm diffs in instruction encoding in a Checked/Release
SuperPMI run, mostly in the tests\JIT\Directed\arglist\vararg.cs test, specifically:

```
NativeVarargTest.VarArg:TestPassingManyFloats(System.Double[]):bool
NativeVarargTest.VarArg:TestPassingManyDoublesManaged(System.Double[]):bool
NativeVarargTest.VarArg:TestPassingManyDoubles(System.Double[]):bool
NativeVarargTest.VarArg:TestPassingManyFloatsManaged(System.Single[]):bool
double1:ProcessJagged3DArray(System.Double[][])
```
Example diff:
```
215c7758454: 76 6a                      jbe     106 // checked
215c77b8ab4: 0f 86 6a 00 00 00          jbe     106 // release
```

Note that the offset ends up exactly the same, but the encoding is different.

Some context on the emitter:

The emitter collects generated machine instructions as a set of variable-sized
`instrDesc` structures in instruction groups (IG, or `insGroup`). An IG is filled
with a variable number of instrDescs into a fixed-sized buffer, which, when full
or a new label (e.g., branch target) is encountered, is copied to dynamically
allocated precisely-sized memory for the set of instructions collected. If a BasicBlock
requires more instructions than fit in a single IG, multiple IGs are generated, and the
ones after the first are marked with the IGF_EXTEND flag; these are called "emitAdd"
or overflow IGs. After codegen, all the IGs are filled, and we have an estimated
size for each instruction, and each IG has an estimated code offset from the
beginning of the function. The estimated sizes could be wrong if we incorrectly
estimated the size of an instruction (which in most cases could be considered a
bug, but is particularly common with our Intel vector instructions currently) or
the offset required for a branch (especially a forward branch). The JIT only does
a single pass of branch tightening, so there can be cases of non-optimal estimates.
Note that estimates may only err on the large side; it is fatal for an estimate to
be too small. During emission, we walk the IG list and instructions, encoding them
into the memory buffer provided by the VM. At this point, we might shrink an instruction
encoding, for example because we can use a smaller branch encoding than expected.

Notably, the size of instrDescs and insGroups varies between DEBUG and non-DEBUG builds.
That means that the number of instrDescs that fit in an IG differs. This should have
no effect on generated code, as long as the code properly handles overflow IGs and
doesn't make codegen decisions based on IG size or number of contained instructions.

The problem:

During emission, we calculate an "offset adjustment" (in `emitOffsAdj`) that is used to
estimate the distance from a forward jump instruction to its target. This determines the
size of jump instruction encodings that we emit. For example, on x86/x64, a relative
conditional branch can have a 1 byte or a 4 byte target offset. A 1-byte offset can be
used if the target instruction is 127 bytes or fewer away. So, we estimate the distance
using the jump instruction's known offset and the estimated offset of the target, which
is the estimated offset of the instruction group (the label) calculated before emission
minus the "shrinkage" of instruction sizes that we've seen so far. This "shrinkage" is
due to over-estimation of instruction encoding sizes and branch estimated sizes.

The shrinkage in `emitOffsAdj` is only computed at the beginning of an IG. As additional
instructions in an IG shrink, we don't take that into account when we reach a forward jump
instruction and try to determine if a "small" branch encoding can be used. Thus, it's
possible for a branch to be estimated to require a large 4-byte offset instead of a 1-byte
offset because the forward branch is barely over the limit, but wouldn't be if we considered
the already-shrunk instructions in the current IG. In fact, the more instructions that
shrink in an IG, the larger our estimated branch distance becomes, because the source
instruction address is precise (we know exactly how many bytes of instructions we've
emitted), while the target offset estimate remains fixed.

Because an IG holds a different number of instructions in DEBUG and non-DEBUG builds,
the target-offset estimate for any particular branch can differ between the two flavors,
given enough instruction shrinkage and enough overflow IGs (such that the number of IGs
itself differs between DEBUG and non-DEBUG). This leads to different instruction
encodings between the flavors.

The solution:

The solution is simple: track instruction shrinkage as we emit instructions, and use the
latest, most accurate shrinkage adjustment when estimating branches. This gives no difference
between DEBUG and non-DEBUG because it removes the indirect dependence on IG instruction
count in recomputing `emitOffsAdj`.

There are asm diffs (with just a Checked x64 JIT compared to baseline Checked JIT) due to some
branches now being SHORT where previously they were LONG.

* Add assert
BruceForstall authored Mar 22, 2021
1 parent 143103b commit 7012cd7
Showing 2 changed files with 44 additions and 28 deletions.
40 changes: 35 additions & 5 deletions src/coreclr/jit/emit.cpp
@@ -3747,6 +3747,14 @@ size_t emitter::emitIssue1Instr(insGroup* ig, instrDesc* id, BYTE** dp)
}
#endif // DEBUG_EMIT

// Add the shrinkage to the ongoing offset adjustment. This needs to happen during the
// processing of an instruction group, and not only at the beginning of an instruction
// group, or else the difference of IG sizes between debug and release builds can cause
// debug/non-debug asm diffs.
int offsShrinkage = estimatedSize - actualSize;
JITDUMP("Increasing size adj %d by %d => %d\n", emitOffsAdj, offsShrinkage, emitOffsAdj + offsShrinkage);
emitOffsAdj += offsShrinkage;

/* The instruction size estimate wasn't accurate; remember this */

ig->igFlags |= IGF_UPD_ISZ;
@@ -5273,6 +5281,8 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
emitCodeBlock = nullptr;
emitConsBlock = nullptr;

emitOffsAdj = 0;

/* Tell everyone whether we have fully interruptible code or not */

emitFullyInt = fullyInt;
@@ -5733,17 +5743,37 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,

/* Record the actual offset of the block, noting the difference */

emitOffsAdj = ig->igOffs - emitCurCodeOffs(cp);
assert(emitOffsAdj >= 0);
int newOffsAdj = ig->igOffs - emitCurCodeOffs(cp);

#if DEBUG_EMIT
if ((emitOffsAdj != 0) && emitComp->verbose)
#ifdef DEBUG
// Under DEBUG, only output under verbose flag.
if (emitComp->verbose)
#endif // DEBUG
{
printf("Block predicted offs = %08X, actual = %08X -> size adj = %d\n", ig->igOffs, emitCurCodeOffs(cp),
emitOffsAdj);
if (newOffsAdj != 0)
{
printf("Block predicted offs = %08X, actual = %08X -> size adj = %d\n", ig->igOffs, emitCurCodeOffs(cp),
newOffsAdj);
}
if (emitOffsAdj != newOffsAdj)
{
printf("Block expected size adj %d not equal to actual size adj %d (probably some instruction size was "
"underestimated but not included in the running `emitOffsAdj` count)\n",
emitOffsAdj, newOffsAdj);
}
}
// Make it noisy in DEBUG if these don't match. In release, the noway_assert below checks the
// fatal condition.
assert(emitOffsAdj == newOffsAdj);
#endif // DEBUG_EMIT

// We can't have over-estimated the adjustment, or we might have underestimated a jump distance.
noway_assert(emitOffsAdj <= newOffsAdj);

emitOffsAdj = newOffsAdj;
assert(emitOffsAdj >= 0);

ig->igOffs = emitCurCodeOffs(cp);
assert(IsCodeAligned(ig->igOffs));

32 changes: 9 additions & 23 deletions src/coreclr/jit/emitxarch.cpp
@@ -12532,13 +12532,12 @@ BYTE* emitter::emitOutputLJ(insGroup* ig, BYTE* dst, instrDesc* i)

if (id->idCodeSize() != JMP_SIZE_SMALL)
{
emitOffsAdj += id->idCodeSize() - JMP_SIZE_SMALL;

#ifdef DEBUG
if (emitComp->verbose)
#if DEBUG_EMIT || defined(DEBUG)
int offsShrinkage = id->idCodeSize() - JMP_SIZE_SMALL;
if (INDEBUG(emitComp->verbose ||)(id->idDebugOnlyInfo()->idNum == (unsigned)INTERESTING_JUMP_NUM ||
INTERESTING_JUMP_NUM == 0))
{
printf("; NOTE: size of jump [%08X] mis-predicted by %d bytes\n", emitComp->dspPtr(id),
(id->idCodeSize() - JMP_SIZE_SMALL));
printf("; NOTE: size of jump [%08p] mis-predicted by %d bytes\n", dspPtr(id), offsShrinkage);
}
#endif
}
@@ -12699,11 +12698,10 @@ size_t emitter::emitOutputInstr(insGroup* ig, instrDesc* id, BYTE** dp)
{
assert(emitIssuing);

BYTE* dst = *dp;
size_t sz = sizeof(instrDesc);
instruction ins = id->idIns();
unsigned char callInstrSize = 0;
int emitOffsAdjBefore = emitOffsAdj;
BYTE* dst = *dp;
size_t sz = sizeof(instrDesc);
instruction ins = id->idIns();
unsigned char callInstrSize = 0;

#ifdef DEBUG
bool dspOffs = emitComp->opts.dspGCtbls;
@@ -13882,18 +13880,6 @@ size_t emitter::emitOutputInstr(insGroup* ig, instrDesc* id, BYTE** dp)
#endif

dst = emitOutputNOP(dst, diff);

// since we compensated the over-estimation, revert the offsAdj that
// might have happened in the jump
if (emitOffsAdjBefore != emitOffsAdj)
{
#ifdef DEBUG
insFormat format = id->idInsFmt();
assert((format == IF_LABEL) || (format == IF_RWR_LABEL) || (format == IF_SWR_LABEL));
assert(diff == (emitOffsAdj - emitOffsAdjBefore));
#endif
emitOffsAdj -= diff;
}
}
assert((id->idCodeSize() - ((UNATIVE_OFFSET)(dst - *dp))) == 0);
}
