GVN transformation regarding lifetime.start #433

aqjune · 2020-06-21T09:55:09Z

define i8 @src() {
entry:
    %p = alloca i8
    call void @llvm.lifetime.start.p0i8(i64 1, i8* %p)
    store i8 1, i8* %p
    call void @llvm.lifetime.start.p0i8(i64 1, i8* %p)
    %v = load i8, i8* %p
    call void @llvm.lifetime.end.p0i8(i64 1, i8* %p)
    ret i8 %v
}

define i8 @tgt() {
  %p = alloca i8, align 1
  call void @llvm.lifetime.start.p0i8(i64 1, i8* %p)
  store i8 1, i8* %p, align 1
  call void @llvm.lifetime.start.p0i8(i64 1, i8* %p)
  call void @llvm.lifetime.end.p0i8(i64 1, i8* %p)
  ret i8 undef
}

LangRef says A load from the pointer that precedes this intrinsic can be replaced with 'undef'.

The text was updated successfully, but these errors were encountered:

aqjune · 2020-06-21T10:59:54Z

ping @RalfJung

RalfJung · 2020-06-21T14:19:17Z

Via email you wrote

but the problem here is that we cannot determine which lifetime.start precedes another one in compile time. :/

My personal operational interpretation is that during execution, we keep track of which locals are alive and which are not. When marking a local as alive, if the local is already live, then this is UB (or at least the value of the local gets reset to undef). Would that explain the behavior you are seeing?

aqjune · 2020-12-12T06:06:49Z

Okay, to revisit this issue:

https://github.com/llvm/llvm-project/blob/eb44682d671d66e422b02595a636050582a4d84a/llvm/lib/CodeGen/StackColoring.cpp#L163 states this:

//   L2) on a `lifetime.start`, a stack slot is marked as *in-scope* and
//   the stack slot is overwritten with `undef`. << important

A relevant thread: rust-lang/rust#42371 (comment)

I think we can encode this precise semantics by resetting the corresponding local_block_val element. @nunoplopes What do you think?

aqjune · 2020-12-12T06:17:47Z

FWIW: Rust's MIR interpreter assumes that redundant StorageLive is UB, so refinement holds.

nunoplopes · 2020-12-12T15:38:05Z

Duplicated lifetime.start are not important. They never happen.
These intrinsics are meant to be used by the inliner, and are never added by users. llc doesn't seem to care about duplicated starts: https://godbolt.org/z/qbx8qK
Starts can show up multiple times automatically, but only if preceded by a lifetime.end. This happens, for example, when inlining a function call inside a loop.
With the evidence we have so far, I see two easy semantics:

Make lifetime.start memset the respective block to poison (I think this is what you suggested)
Trigger UB on start/end mistmatches

Setting it to poison seems the easiest. so I agree with you @aqjune. Just don't touch memory.cpp this month 😀

RalfJung · 2020-12-13T11:54:02Z

Duplicated lifetime.start are not important. They never happen.

They certainly used to happen in the LLVM IR that rustc generated. Is that a bug, i.e., is it UB to mark an already live local as live?

Make lifetime.start memset the respective block to poison (I think this is what you suggested)

This would mean that the memory stays at the same location, so pointers created to this local earlier will remain valid.

nunoplopes · 2020-12-13T12:19:24Z

Duplicated lifetime.start are not important. They never happen.

They certainly used to happen in the LLVM IR that rustc generated. Is that a bug, i.e., is it UB to mark an already live local as live?

I don't know if it's UB, but it's a bug for sure. Doesn't make sense to have 2 starts given that these start/end intrinsics are used for the stack allocator. They are used to compute the lifetime of an alloca, so I can't find a meaning for starting the lifetime twice (without ending it beforehand).

Make lifetime.start memset the respective block to poison (I think this is what you suggested)

This would mean that the memory stays at the same location, so pointers created to this local earlier will remain valid.

That's ok I guess. llc doesn't seem to do anything fancy with duplicated starts.

RalfJung · 2020-12-13T17:29:28Z

I don't know if it's UB, but it's a bug for sure.

If that is the consensus, it would be good to have that documented in the LangRef, and maybe have debug assertions catching this. For the people writing the Rust side of this, it was far from clear that this is considered buggy by LLVM.

aqjune · 2020-12-13T22:03:42Z

It seems a transformation may introduce unpaired lifetime.start and such one should be fixed as well.
This link (link) has:

; 'p = alloca' omitted
lifetime.start(p)
....
lifetime.end(p)

  => (optzns like loop versioning)

if (cond) {
  // optimized code, has no lifetime.start
} else {
  lifetime.start(p)
  ....
}
lifetime.end(p)

nunoplopes · 2020-12-13T22:22:08Z

The link you posted is about Polly. That isn't included in LLVM mainline. The replies in that thread also go in the direction of requiring paired lifetime start/end intrinsics.
So the previous conclusion still holds: triggering UB or memset poison are fine semantics. But memset poison seems easier, so I think we can go with that one.

RalfJung · 2020-12-14T09:53:20Z

Btw, rustc still emits redundant StorageEnd which translate to lifetime.end. I have not investigated the details as I cannot imagine this causing problems. What are your thoughts on that @nunoplopes?

nunoplopes · 2020-12-14T10:59:33Z

Btw, rustc still emits redundant StorageEnd which translate to lifetime.end. I have not investigated the details as I cannot imagine this causing problems. What are your thoughts on that @nunoplopes?

That should be fine. The first one wins. As long as there's no use after any StorageEnd.

aqjune · 2020-12-14T13:05:27Z

Okay.. I wrote a specification of lifetime.start/end. What do you think Nuno and Ralf?

for each lifetime.start S of alloca A, there should exist lifetime.end E of A s.t.

S dominates E /\ E postdominates S (there is no path that can visit only S or E)
Lifetimes of A cannot overlap: for any path from S to E, lifetime.start and lifetime.end of A other than S and E shouldn't exist.

Then, alloca A is live between the lifetime.start/end pairs. Accessing the alloca out of the lifetime is UB. lifetime.start initializes the bytes as undef.
If there is no lifetime.start/end of A in the function, A's lifetime starts from its declaration and ends when the function returns.

One additional concern is that it might be a good idea to syntactically restrict the pointer argument given to lifetime.start/end. It won't be meaningful if it was loaded from the memory, such as %p = load i8** %q; lifetime.start(%p).

nunoplopes · 2020-12-14T15:23:46Z

Two comments:

I would replace undef with poison.
Another way of ending lifetime is with function termination, e.g., a noreturn call, return, etc. There's no need for a lifetime.end beforehand (didn't check if the inliner inserts lifetime.end before all noreturn calls, but likely not)

Then the question is about multiple syntactic start/end pairs for the same alloca. I assume pointers may become invalid between these as in theory the allocator can change the alloca'ed object to a different address. Should we use lifetime.start as an allocation site and lifetime.end as free?
If start/end show up inside a loop and are executed multiple times, they are guaranteed to go to the same location I guess as memory allocation is done statically, so we may want to optimize that case.

The syntactic check makes sense. I hope LLVM's verifier is enforcing that. 😅

aqjune · 2020-12-14T16:08:56Z

Another way of ending lifetime is with function termination, e.g., a noreturn call, return, etc. There's no need for a lifetime.end beforehand (didn't check if the inliner inserts lifetime.end before all noreturn calls, but likely not)

Oh, I've never thought about this. I'll experiment with a few cases.

Should we use lifetime.start as an allocation site and lifetime.end as free?

I think this is tricky. We'd like to assume that ptrtoint can be freely hoisted/sinked across lifetime intrinsic (or change LLVM to forbid that?)

I would replace undef with poison.

I'd like to do so, but would people like this change to be accompanied with lifetime? It is about the value of uninitialized storage.

RalfJung · 2020-12-14T17:11:40Z

Should we use lifetime.start as an allocation site and lifetime.end as free?

That what I did in Miri.

If start/end show up inside a loop and are executed multiple times, they are guaranteed to go to the same location I guess as memory allocation is done statically, so we may want to optimize that case.

Do we want to guarantee that they all go to the same allocation? I'd expect not, so we cold still model each lifetime.start as an allocation site.

nunoplopes · 2020-12-14T18:40:58Z

Q1: Can lifetime end at a function terminator?
A: Yes; no need for a lifetime.end: https://gcc.godbolt.org/z/e7sdh1

Q2: When we have multiple start/end, can llc place them in different addresses?
A: Couldn't make llc do it after 10 minutes of tinkering. But we should allow this behavior to happen. llc's algorithm is just too conservative (wastes a lot of memory).

nunoplopes · 2020-12-14T18:47:03Z

I think I had this idea before but we didn't propose it:
If lifetime.start returned a pointer, then all accesses would have a value-flow dependence on it, so hoisting and etc would be trivially correct. (i.e., no hoisting past start could happen). For sinking past end, we could play the same trick and have a flow dependence between all ops (minus stores) to the end intrinsic.
There you go: a simple StackSSA representation 😅
Ok, that's too disruptive for now.

Regarding ptr2int, we have to be careful with that one. If we can trigget llc to allocate different start/end pairs in different addresses, then we cannot allow free movement for ptr2int, otherwise it can catch the address of another start/end pair.

aqjune · 2020-12-15T09:55:55Z

Yeah, I remember that you mentioned it with a notion of SSI.
I recently had a discussion with my colleague about whether a program language that expresses everything (side-effect + memory) as a value and linking them with use-def chain can be helpful for verification. It will be helpful for verifying only a snippet of code of interest from a large function. This might be an interesting topic, but didn't dwelve into further.

BTW, I found that gep can also be hoisted across lifetime.start. https://gcc.godbolt.org/z/Wo9dbo
So, at some moment, lifetime.start should be able to just work as memset(poison), but for further stack coloring algorithm support it should work as an allocation.
I think having a special construct that controls that would work, maybe. A new keyword at lifetime.start that controls its behavior? (e.g., 'lifetime with memset-only' or something like this?)

nunoplopes · 2020-12-15T11:00:20Z

Since duplicated start/end pairs are not produced by LLVM at least, let's ignore them and make the semantics stronger to enable free movement of geps, ptr2int, etc.
@aqjune can you submit a patch to Langref to make it explicit that start/end don't affect validity of pointers or smth like that? And make sure some ISel person reviews it as well to ensure that llc respects that condition now and in any planned change.

aqjune · 2020-12-15T11:12:59Z

Sure, I'll submit a patch for this! :)

nunoplopes · 2020-12-15T11:20:01Z

thank you! 😀

aqjune · 2020-12-16T07:55:57Z

https://reviews.llvm.org/D93376

aqjune · 2021-02-23T16:55:31Z

Okay, if https://reviews.llvm.org/D94002 is accepted, the GVN transformation can now be explained.

aqjune · 2021-03-04T14:47:52Z

I'll make a PR that implements the new lifetime semantics tomorrow.

nunoplopes added the memory Memory Model label Jan 19, 2021

aqjune mentioned this issue Mar 5, 2021

Fix lifetime encoding to match the new langref wording #686

Merged

nunoplopes closed this as completed in #686 Mar 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GVN transformation regarding lifetime.start #433

GVN transformation regarding lifetime.start #433

aqjune commented Jun 21, 2020

aqjune commented Jun 21, 2020

RalfJung commented Jun 21, 2020

aqjune commented Dec 12, 2020 •

edited

Loading

aqjune commented Dec 12, 2020

nunoplopes commented Dec 12, 2020

RalfJung commented Dec 13, 2020 •

edited

Loading

nunoplopes commented Dec 13, 2020

RalfJung commented Dec 13, 2020

aqjune commented Dec 13, 2020

nunoplopes commented Dec 13, 2020

RalfJung commented Dec 14, 2020 •

edited

Loading

nunoplopes commented Dec 14, 2020

aqjune commented Dec 14, 2020

nunoplopes commented Dec 14, 2020

aqjune commented Dec 14, 2020

RalfJung commented Dec 14, 2020

nunoplopes commented Dec 14, 2020

nunoplopes commented Dec 14, 2020

aqjune commented Dec 15, 2020

nunoplopes commented Dec 15, 2020

aqjune commented Dec 15, 2020

nunoplopes commented Dec 15, 2020

aqjune commented Dec 16, 2020

aqjune commented Feb 23, 2021

aqjune commented Mar 4, 2021

GVN transformation regarding lifetime.start #433

GVN transformation regarding lifetime.start #433

Comments

aqjune commented Jun 21, 2020

aqjune commented Jun 21, 2020

RalfJung commented Jun 21, 2020

aqjune commented Dec 12, 2020 • edited Loading

aqjune commented Dec 12, 2020

nunoplopes commented Dec 12, 2020

RalfJung commented Dec 13, 2020 • edited Loading

nunoplopes commented Dec 13, 2020

RalfJung commented Dec 13, 2020

aqjune commented Dec 13, 2020

nunoplopes commented Dec 13, 2020

RalfJung commented Dec 14, 2020 • edited Loading

nunoplopes commented Dec 14, 2020

aqjune commented Dec 14, 2020

nunoplopes commented Dec 14, 2020

aqjune commented Dec 14, 2020

RalfJung commented Dec 14, 2020

nunoplopes commented Dec 14, 2020

nunoplopes commented Dec 14, 2020

aqjune commented Dec 15, 2020

nunoplopes commented Dec 15, 2020

aqjune commented Dec 15, 2020

nunoplopes commented Dec 15, 2020

aqjune commented Dec 16, 2020

aqjune commented Feb 23, 2021

aqjune commented Mar 4, 2021

aqjune commented Dec 12, 2020 •

edited

Loading

RalfJung commented Dec 13, 2020 •

edited

Loading

RalfJung commented Dec 14, 2020 •

edited

Loading