Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix building with LLVM 5 and 6 so that the stack allocation workaround is no longer needed. #2371

Closed
chalcolith opened this issue Nov 24, 2017 · 15 comments
Assignees
Labels
triggers release Major issue that when fixed, results in an "emergency" release

Comments

@chalcolith
Copy link
Member

Code generated by a ponyc built with LLVM 4 and up will often segfault on *nix (see #2303, #2061, #1592). The change in LLVM that broke things is believed to be 290738:

Index: LICM.cpp
===================================================================
--- LICM.cpp    (revision 290737)
+++ LICM.cpp    (revision 290738)
@@ -1034,7 +1034,8 @@
   if (!SafeToInsertStore) {
     Value *Object = GetUnderlyingObject(SomePtr, MDL);
     SafeToInsertStore =
-        isAllocLikeFn(Object, TLI) && !PointerMayBeCaptured(Object, true, true);
+        (isAllocLikeFn(Object, TLI) || isa<AllocaInst>(Object)) &&
+        !PointerMayBeCaptured(Object, true, true);
   }

  // If we've still failed to prove we can sink the store, give up.

#2303 implements a workaround (in the canStackAlloc() function in genopt.cc) where in our HeapToStack optimization pass we do not stack allocate values that are loaded in a function. But this incurs a performance hit, so we need to figure out the real problem.

@jemc jemc added bug: 3 - ready for work triggers release Major issue that when fixed, results in an "emergency" release labels Nov 25, 2017
@jemc
Copy link
Member

jemc commented Nov 25, 2017

When this ticket is resolved, we can mark LLVM >= 4 support as stable rather than experimental, and we can remove official support for all but the three most recent "patch levels" of the three most recent "minor versions" of LLVM.

chalcolith added a commit that referenced this issue Apr 11, 2018
This change adds experimental support for LLVM 6.0.0 and removes support for 4.0.1.

The change still includes the workaround in the `Heap2Stack` optimization pass that can cause performance degradation (see #2371).
@chalcolith chalcolith changed the title Fix building with LLVM 4 and 5 so that the stack allocation workaround is no longer needed. Fix building with LLVM 5 and 6 so that the stack allocation workaround is no longer needed. Apr 11, 2018
dipinhora pushed a commit to dipinhora/ponyc that referenced this issue Jun 5, 2018
This change adds experimental support for LLVM 6.0.0 and removes support for 4.0.1.

The change still includes the workaround in the `Heap2Stack` optimization pass that can cause performance degradation (see ponylang#2371).
@malthe
Copy link
Contributor

malthe commented Sep 26, 2018

I can't reproduce this on LLVM 5.0.0. That is, removing the workaround does not cause segfaults.

This comment suggests that the issue may have been fixed in Clang (not LLVM):

emscripten-core/emscripten#5362 (comment)

@chalcolith
Copy link
Member Author

On first glance that issue doesn't seem related, but we did have to update TBAA for LLVM 4, so who knows?

FYI I have been doing some work on using LLVM 7 (https://github.com/kulibali/ponyc/tree/llvm700), but it's not done yet. I could use some help with the JIT implementation.

@jemc
Copy link
Member

jemc commented Jan 21, 2019

To be clear, this is the workaround that we're talking about removing:

// This is a workaround for a problem with LLVM 4 & 5 on *nix when
// hoisting loads (see #2303, #2061, #1592).
// TODO: figure out the real reason LLVM 4 and 5 produce bad code
// when hoisting stack allocated loads.
#if PONY_LLVM >= 400 && !defined(_MSC_VER)
// fall through
#else
break;
#endif

When removed, even on LLVM 7.0.1 we still segfault when running standard library tests that were compiled in release mode (that is, when ponyc was run with LLVM optimizations enabled).


I've spent quite a bit of time troubleshooting this one. This is indeed an issue with the LoopInvariantCodeMotion pass in LICM.cpp but the given diff is not the cause of the problem (that is, I tried reversing the effects of that diff and recompiling LLVM 7, but it does not solve the problem).

Here is a minimal program that demonstrates the issue, based on stripping away unnecessary pieces of a segfaulting persistent.Map test in the standard library test suite:

class val _Field
  let _inner: U32
  new trn create(inner: U32 = 0) => _inner = inner
  fun val clone(): _Field trn^ => create(_inner)

class val Counter
  let _value: USize
  let _field: _Field
  
  new val create(v: USize = 0, f: _Field = _Field) =>
    _value = v
    _field = f
  
  fun val inc(v: USize = 1): Counter =>
    create(_value + v, _field.clone())
  
  fun val value(): USize =>
    _value

actor Main
  new create(env: Env) =>
    segfault(env)
  
  fun tag segfault(env: Env) =>
    var counter = Counter
    
    var n: USize = 0
    while (n = n + 1) < 300 do
      counter = counter.inc()
      @printf[I32]("%zu\n".cpointer(), counter.value())
    end

This program should print the numbers 1 through 300 in sequence, but instead prints the number 1 each time the loop is run. Some other very similar variants of this minimal program will segfault.

When compiled, the pretty much all of the functions called from within the segfault function get inlined into it, and our HeapToStack pass will convert the Counter heap allocations (calls to pony_alloc_small) into stack allocations (the LLVM alloca instruction). This is working as intended, with the stack allocations being an optimization over those heap allocations.

The job of the LoopInvariantCodeMotion pass is to notice code that can be moved out of a loop without changing the effects/behaviour of the loop, and move that code to either the pre-loop or post-loop area.

However, when the LoopInvariantCodeMotion pass runs on this code, it doesn't notice that the counter.value field reassignments in the loop (nested inside the inlined function counter.inc() are reading from and writing to the memory of the alloca, and it thinks that it is safe to move the assignment (an LLVM store instruction) to the post-loop area. The printf is optimized to print n + 1 instead of actually printing the counter.value(), and it doesn't see the connection between the read/load operation in inc and the write/store operation in inc, so it moves the store to be in the post-loop area.

Here is the final erroneous code LLVM IR that gets produced after the LICM pass runs:

define private fastcc nonnull %None* @Main_tag_segfault_oo(%Main* nocapture readnone dereferenceable(256) %this, %Env* noalias nocapture readonly dereferenceable(64) %env) unnamed_addr !dbg !150 !pony.abi !4 {
entry:
  %0 = alloca [32 x i8], align 8
  %1 = alloca [32 x i8], align 8
  %.sub12 = getelementptr inbounds [32 x i8], [32 x i8]* %1, i64 0, i64 0
  %.sub = getelementptr inbounds [32 x i8], [32 x i8]* %0, i64 0, i64 0
  %2 = tail call i8* @pony_ctx()
  %3 = tail call i8* @pony_alloc_small(i8* %2, i32 0)
  %4 = bitcast i8* %3 to %_Field_Desc**
  store %_Field_Desc* @_Field_Desc, %_Field_Desc** %4, align 8, !tbaa !160
  %5 = getelementptr inbounds i8, i8* %3, i64 8, !dbg !176
  %6 = bitcast i8* %5 to i32*, !dbg !176
  store i32 0, i32* %6, align 4, !dbg !176, !tbaa !177
  %7 = bitcast [32 x i8]* %1 to %Counter_Desc**
  store %Counter_Desc* @Counter_Desc, %Counter_Desc** %7, align 8, !tbaa !160
  %8 = getelementptr inbounds [32 x i8], [32 x i8]* %1, i64 0, i64 8, !dbg !196
  %9 = bitcast i8* %8 to i64*, !dbg !196
  store i64 0, i64* %9, align 8, !dbg !196, !tbaa !197, !noalias !200
  %10 = getelementptr inbounds [32 x i8], [32 x i8]* %1, i64 0, i64 16, !dbg !203
  %11 = bitcast i8* %10 to i8**, !dbg !203
  store i8* %3, i8** %11, align 8, !dbg !203, !tbaa !197, !noalias !200
  %12 = bitcast [32 x i8]* %0 to %Counter_Desc**
  %13 = getelementptr inbounds [32 x i8], [32 x i8]* %0, i64 0, i64 8
  %14 = bitcast i8* %13 to i64*
  %15 = getelementptr inbounds [32 x i8], [32 x i8]* %0, i64 0, i64 16
  %16 = bitcast i8* %15 to i8**
  br label %while_body
while_body:                                       ; preds = %while_body, %entry
  %counter.0.in = phi i8* [ %.sub12, %entry ], [ %.sub, %while_body ]
  %n.0 = phi i64 [ 1, %entry ], [ %31, %while_body ], !dbg !208
  %17 = getelementptr inbounds i8, i8* %counter.0.in, i64 8, !dbg !219
  %18 = bitcast i8* %17 to i64*, !dbg !219
  %19 = load i64, i64* %18, align 8, !dbg !219, !tbaa !220, !alias.scope !222
  %20 = add i64 %19, 1, !dbg !219
  %21 = getelementptr inbounds i8, i8* %counter.0.in, i64 16
  %22 = bitcast i8* %21 to %_Field**
  %23 = load %_Field*, %_Field** %22, align 8, !tbaa !220, !alias.scope !222
  %24 = getelementptr inbounds %_Field, %_Field* %23, i64 0, i32 1
  %25 = load i32, i32* %24, align 4, !tbaa !231, !alias.scope !233, !noalias !222
  %26 = tail call i8* @pony_alloc_small(i8* %2, i32 0) #2, !noalias !236
  %27 = bitcast i8* %26 to %_Field_Desc**
  store %_Field_Desc* @_Field_Desc, %_Field_Desc** %27, align 8, !tbaa !160, !noalias !236
  %28 = getelementptr inbounds i8, i8* %26, i64 8, !dbg !240
  %29 = bitcast i8* %28 to i32*, !dbg !240
  store i32 %25, i32* %29, align 4, !dbg !240, !tbaa !177, !noalias !236
  store i8* %26, i8** %16, align 8, !dbg !245, !tbaa !197, !noalias !246
  %30 = tail call i32 (...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @0, i64 0, i64 0), i64 %20), !dbg !273
  %31 = add nuw nsw i64 %n.0, 1, !dbg !274
  %exitcond = icmp eq i64 %31, 301
  br i1 %exitcond, label %while_post, label %while_body
while_post:                                       ; preds = %while_body
  %.lcssa = phi i64 [ %20, %while_body ]
  store %Counter_Desc* @Counter_Desc, %Counter_Desc** %12, align 1, !tbaa !160, !noalias !222
  store i64 %.lcssa, i64* %14, align 1, !dbg !275, !tbaa !197, !noalias !246
  ret %None* @None_Inst, !dbg !205
}

Note that the store %Counter_Desc* line was safe to move into the while_post block because nobody is reading/matching on that descriptor, but the store i64 line wasn't safe to move out of the loop because this is the line that writes/stores the incremented counter.value(), and we need to read/load that later on (in the next iteration of the loop) in the line labeled %19.


I spent some time trying to studying and stepping through the LICM code to see what it was missing, but couldn't really figure it out. Finally, I decided to try dumping the LLVM IR just before the LICM pass, then run it in LLVM's opt tool which can run specific optimization passes in specific orders, and which is used in their regression test suite, so I could make a minimal regression test for their suite to bisect the issue.

However, to my surprise, the same bug doesn't happen when I run the same LLVM IR through the LICM pass in the opt tool (using the invocation opt -licm --debug-pass=Executions -S /path/to/example.ll). When built with opt as a standalone pass, the right optimizations happen safely without making the unsafe optimization. After stepping through both environments in parallel, I noticed that the AliasSet information coming into the pass was different in each case.

Here it is in opt (the bug-free case):

Alias Set Tracker: 4 alias sets for 8 pointer values.
  AliasSet[0x662cc30, 6] may alias, Mod/Ref   Pointers: (i64* %18, 8), (i64* %14, 8), (%_Field** %22, 8), (i8** %16, 8), (i32* %24, 4), (%_Field_Desc** %27, 8), (i32* %29, 4)
    2 Unknown instructions:   %26 = tail call i8* @pony_alloc_small(i8* %2, i32 0) #2, !noalias !129,   %30 = tail call i32 (...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @0, i64 0, i64 0), i64 %20), !dbg !167
  AliasSet[0x65bdb00, 1] must alias, Ref        forwarding to 0x662cc30
  AliasSet[0x6608d40, 3] may alias, Mod/Ref    forwarding to 0x662cc30
  AliasSet[0x6609270, 1] must alias, Mod       Pointers: (%Counter_Desc** %12, 8)

Here it is in the buggy optimizer (embedded in ponyc):

Alias Set Tracker: 7 alias sets for 8 pointer values.
  AliasSet[0x64343b0, 5] may alias, Mod/Ref   Pointers: (i64* %14, 8), (%_Field** %18, 8), (i32* %20, 4), (%_Field_Desc** %23, 8), (i32* %25, 4)
    2 Unknown instructions:   %22 = tail call i8* @pony_alloc_small(i8* %12, i32 0) #2, !noalias !129,   %31 = tail call i32 (...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @0, i64 0, i64 0), i64 %16), !dbg !167
  AliasSet[0x6585ad0, 1] must alias, Ref        forwarding to 0x64343b0
  AliasSet[0x6434960, 1] must alias, Ref        forwarding to 0x64343b0
  AliasSet[0x657cdf0, 2] may alias, Mod/Ref    forwarding to 0x64343b0
  AliasSet[0x6589b30, 1] must alias, Mod       Pointers: (%Counter_Desc** %26, 8)
  AliasSet[0x64d2000, 1] must alias, Mod       Pointers: (i64* %28, 8)
  AliasSet[0x64d20a0, 1] must alias, Mod       Pointers: (i8** %30, 8)

Note that the latter (buggy) case has (i64* %28, 8) as a separate AliasSet by itself, while the former (correct) case has (i64* %18, 8) in the initial AliasSet that clumps together all the other pieces of the Counter object that are intermingled in memory, including the (i64* %14, 8) pointer which is in the initial AliasSet of both cases. This is why the correct case can correctly identify the memory connection and avoid the erroneous optimization.

So, something is causing the AliasSet results of the (buggy) ponyc case to be corrupted/incorrect, whereas in the opt (correct) case they are being generated from first principles just before running the pass (because the invocation just specified to run the licm pass, and it has the alias-building passes listed as dependencies of that pass).

So, something in the ponyc execution is causing the alias analysis to be corrupted without invalidating it, or otherwise not be regenerated when it should be (or perhaps it's being incorrectly generated for some other unknown reason). Perhaps one of our custom passes is breaking the alias analysis without marking it as dirty?

Once I figure out how to fix the alias analysis to run again/properly prior to the LICM pass, this should be fixed, I think.

@chalcolith
Copy link
Member Author

We updated the Type-Based Alias Analysis implementation for LLVM 4, might want to look at those changes.

@jemc
Copy link
Member

jemc commented Jan 22, 2019

Thanks for the nudge, I'll look into it. I'm guessing that probably has something to do with it, because every other idea I've had about how alias analysis may be getting messed up hasn't panned out to be true so far.

@jemc
Copy link
Member

jemc commented Jan 22, 2019

Yep, sure enough, I'm now able to reproduce the buggy behaviour when I include a Type-Based Alias Analysis pass in the opt invocation (i.e. opt -tbaa -licm --debug-pass=Executions -S /path/to/example.ll).

So yeah, that implies something is probably buggy with our TBAA metadata, leading to false negatives in the alias analysis done by LLVM (or, less likely, buggy type-based alias analysis logic in LLVM).

@jemc
Copy link
Member

jemc commented Jan 22, 2019

Here's a minimal LLVM IR repro that acts like Pony code does and shows the problem in action (I've removed most of the generated code to leave only the counter.value part of the Pony example code from above, and I've named the SSA values and commented it to show how it works):

; ModuleID = 'test.jemc'

@0 = private unnamed_addr constant [5 x i8] c"%ld\0A\00"

declare i32 @printf(...) local_unnamed_addr

define i32 @main() unnamed_addr {
entry:
  %counter_a = alloca [32 x i8], align 8
  %counter_b = alloca [32 x i8], align 8
  %counter_a.gep0 = getelementptr inbounds [32 x i8], [32 x i8]* %counter_a, i64 0, i64 0
  %counter_b.gep0 = getelementptr inbounds [32 x i8], [32 x i8]* %counter_b, i64 0, i64 0
  
  %counter_a.value = getelementptr inbounds [32 x i8], [32 x i8]* %counter_a, i64 0, i64 8
  %counter_a.value.cast = bitcast i8* %counter_a.value to i64*
  store i64 0, i64* %counter_a.value.cast, align 8, !tbaa !197
  
  %counter_b.value = getelementptr inbounds [32 x i8], [32 x i8]* %counter_b, i64 0, i64 8
  %counter_b.value.cast = bitcast i8* %counter_b.value to i64*
  
  br label %loop_body

loop_body:                                       ; preds = %loop_body, %entry
  ; In the first iteration, set 'n = 1'
  ; In subsequent iterations, retain 'n.plus1' as 'n'
  %n = phi i64 [ 1, %entry ], [ %n.plus1, %loop_body ]
  
  ; In the first iteration, set 'counter = counter_a';
  ; In subsequent iterations, retain 'counter_b' as 'counter'
  %counter = phi i8* [ %counter_a.gep0, %entry ], [ %counter_b.gep0, %loop_body ]
  
  ; Extract the current 'counter.value', add 1, and store into 'counter_b.value'
  %counter.value = getelementptr inbounds i8, i8* %counter, i64 8
  %counter.value.cast = bitcast i8* %counter.value to i64*
  %counter.value.load = load i64, i64* %counter.value.cast, align 8, !tbaa !220
  %counter.value.plus1 = add i64 %counter.value.load, 1
  store i64 %counter.value.plus1, i64* %counter_b.value.cast, align 8, !tbaa !197
  
  ; Print 'counter.value.plus1'
  %printf.retval = tail call i32 (...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @0, i64 0, i64 0), i64 %counter.value.plus1)
  
  ; Increment 'n' and exit the loop if 'n >= 5'
  %n.plus1 = add nuw nsw i64 %n, 1
  %loop.condition = icmp eq i64 %n.plus1, 5 ; stop after 4 iterations
  br i1 %loop.condition, label %after_loop, label %loop_body

after_loop:                                       ; preds = %loop_body
  ret i32 0
}

!0 = !{!"Pony TBAA"}
!197 = !{!198, !198, i32 0}
!198 = !{!"Counter_ref", !199}
!199 = !{!"Counter_box", !0}
!220 = !{!221, !221, i32 0}
!221 = !{!"Counter_val", !0}

You can run the example with an invocation like: opt -tbaa -licm -S /path/to/example.ll | tee /dev/tty | lli. You'll get output showing the resulting IR after optimization, followed by the output of running the program:

; ModuleID = 'test/Transforms/LICM/jemc.ll'
source_filename = "test/Transforms/LICM/jemc.ll"

@0 = private unnamed_addr constant [5 x i8] c"%ld\0A\00"

declare i32 @printf(...) local_unnamed_addr

define i32 @main() unnamed_addr {
entry:
  %counter_a = alloca [32 x i8], align 8
  %counter_b = alloca [32 x i8], align 8
  %counter_a.gep0 = getelementptr inbounds [32 x i8], [32 x i8]* %counter_a, i64 0, i64 0
  %counter_b.gep0 = getelementptr inbounds [32 x i8], [32 x i8]* %counter_b, i64 0, i64 0
  %counter_a.value = getelementptr inbounds [32 x i8], [32 x i8]* %counter_a, i64 0, i64 8
  %counter_a.value.cast = bitcast i8* %counter_a.value to i64*
  store i64 0, i64* %counter_a.value.cast, align 8, !tbaa !0
  %counter_b.value = getelementptr inbounds [32 x i8], [32 x i8]* %counter_b, i64 0, i64 8
  %counter_b.value.cast = bitcast i8* %counter_b.value to i64*
  br label %loop_body

loop_body:                                        ; preds = %loop_body, %entry
  %n = phi i64 [ 1, %entry ], [ %n.plus1, %loop_body ]
  %counter = phi i8* [ %counter_a.gep0, %entry ], [ %counter_b.gep0, %loop_body ]
  %counter.value = getelementptr inbounds i8, i8* %counter, i64 8
  %counter.value.cast = bitcast i8* %counter.value to i64*
  %counter.value.load = load i64, i64* %counter.value.cast, align 8, !tbaa !4
  %counter.value.plus1 = add i64 %counter.value.load, 1
  %printf.retval = tail call i32 (...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @0, i64 0, i64 0), i64 %counter.value.plus1)
  %n.plus1 = add nuw nsw i64 %n, 1
  %loop.condition = icmp eq i64 %n.plus1, 5
  br i1 %loop.condition, label %after_loop, label %loop_body

after_loop:                                       ; preds = %loop_body
  %counter.value.plus1.lcssa = phi i64 [ %counter.value.plus1, %loop_body ]
  store i64 %counter.value.plus1.lcssa, i64* %counter_b.value.cast, align 1, !tbaa !0
  ret i32 0
}

!0 = !{!1, !1, i32 0}
!1 = !{!"Counter_ref", !2}
!2 = !{!"Counter_box", !3}
!3 = !{!"Pony TBAA"}
!4 = !{!5, !5, i32 0}
!5 = !{!"Counter_val", !3}
1
94112801004065
94112801004065
94112801004065

Note that when run with lli it shows garbage/undefined values after the first 1 instead of continuing to print 1. Also, I reduced the loop iteration count to just 4 iterations, since that still shows the buggy behaviour just fine - no need to flood the screen with 300 iterations.

@jemc
Copy link
Member

jemc commented Jan 22, 2019

Interestingly... running opt from LLVM 3.9 with the exact same LLVM IR input as we passed to the opt from LLVM 7 gives a program that works properly, without the erroneous optimization.

So the same TBAA format we use now for LLVM >= 4 worked fine in LLVM 3. So either:

  • we need to update our TBAA format for LLVM >= 4 for new semantics
  • there is a bug in TBAA analysis for LLVM >= 4
  • our TBAA format has always been broken, but LLVM >= 4 is revealing issues that were not revealed in LLVM 3, either due to them now being more aggressive with optimizations, or due to them having corrected a prior bug.

@jemc
Copy link
Member

jemc commented Jan 22, 2019

I'm leaning toward the "our TBAA format has always been broken" option - it seems to me that including the rcap in the TBAA is a mistake that will cause problems in the presence of recovery, since caps may be lifted and in so doing break the subtype aliasing rules that TBAA depends on, right? In this case it's not recover per se causing the issue, but a kind of pseudo-recovery based on the fact that this is a ref even inside of a new val constructor.

Note that this example would work fine if all three TBAA tree nodes (Counter_ref, Counter_box, and Counter_val) were collapsed into a single Counter TBAA type.

Also, there is an error in the current TBAA tree structure - Counter_box should be a subtype of both Counter_ref and Counter_val, but Counter_val is currently just shown to be a supertype of the bottom TBAA type. But I don't want to bother fixing this problem if we can agree that rcaps shouldn't be included in the TBAA metadata at all.

Marking as "needs discussion during sync" so we can make a decision about whether to move forward with my proposed fix or take some other path.

@jemc
Copy link
Member

jemc commented Jan 23, 2019

I tried the approach outlined above (removing caps from TBAA types), and it fixes the minimal LLVM IR repro I outlined above, but when I compile the (less minimal) Pony repro it still has a related issue where the alias isn't getting detected and the store is getting moved out of the loop.

Frustratingly, when I copy the full LLVM IR for that Pony repro out, and feed it into opt, I can't reproduce the issue. So something is still different between the two analysis pipelines. The next step is probably to do a full comparison of all the passes in the ponyc invocation of the optimizer, and try to duplicate that pass order in the opt invocation (unless anyone has any bright ideas like @kulibali's TBAA hint above).

I'm not going to do this right now. I'll wait til the LLVM 7 PR is merged, then do a separate PR to stop using the legacy pass builder, then come back to this issue if it's still a problem.

@jemc
Copy link
Member

jemc commented Jan 25, 2019

@sylvanc confirmed via Slack that he agrees that this seems to be unsound. It can be removed as part of the fix for this issue, though as I mentioned in my comment above, there is more to be done as well.

@jemc jemc self-assigned this Jan 25, 2019
jemc added a commit that referenced this issue Feb 11, 2019
Using the new PassBuilder is impractical for LLVM 3.9,
but it will pave the way to fixing some other problems
with LLVM > 4, including #2371.
@jemc
Copy link
Member

jemc commented Feb 26, 2019

To recap, the current state of this is:

  • Removing caps from TBAA is necessary, but not sufficient to make the heap to stack pass safe. That is, I tried it a few comments back, and there were still issues that I was having a hard time debugging.
  • I thought that using the new PassBuilder would help me debug the issues, but it only proved more troublesome, so I am abandoning that plan.
  • What remains here is to keep troubleshooting against current state. For example, maybe removing TBAA entirely (rather than just removing caps from it) will work where the other approach failed. Or maybe there is another issue to troubleshoot and figure out.

@jemc
Copy link
Member

jemc commented Feb 27, 2019

New development in the saga - the optimized LLVM attributes for val parameters introduced in #1046 were also unsound, for much the same reasons as I outlined for TBAA. That is, the rationale described in #1046 here is flawed:

  • Add noalias to val parameters in addition to readonly. Only read-only pointers to val objects can exist and read-only pointers never alias between each other in LLVM.

It's flawed because LLVM can optimize across time in cases such as the LICM pass.

@SeanTAllen
Copy link
Member

Nice find

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triggers release Major issue that when fixed, results in an "emergency" release
Projects
None yet
Development

No branches or pull requests

5 participants