Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-determinism in Reassociate caused by address coincidences #6659

Closed
lizhengxing opened this issue May 30, 2024 · 0 comments · Fixed by #6666
Closed

Non-determinism in Reassociate caused by address coincidences #6659

lizhengxing opened this issue May 30, 2024 · 0 comments · Fixed by #6666
Assignees
Labels
bug Bug, regression, crash

Comments

@lizhengxing
Copy link
Collaborator

Description
The upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c), was landed in DXC by this PR
#6598.

However, this change could cause non-determinism results due to address coincidences for some large shaders. It looks like that upstream guy ran into the same issue as well and here's the upstream fix (llvm/llvm-project@ef8761f) for this non-determinism bug.

summary of the upstream fix:
Between building the pair map and querying it there are a few places that erase and create Values. It's rare but the address of these newly created Values is occasionally the same as a just-erased Value that we already have in the pair map. These coincidences should be accounted for to avoid non-determinism.

Unfortunately, this upstream fix couldn't fix the non-determinism bug in DXC. Because the fix relies on the WeakVH and the WeakVH related code in DXC is out of date.

Steps to Reproduce
Compile the large failing hlsl code with the same command line 10 times. At least one time, the compile result will be different.

Actual Behavior
Compiled with the same command line, the hash value of the shader will be different.

Environment
DXC version: 1.8 - 1.8.0.14576 (main, 381750e)
Host Operating System: Windows 11

@lizhengxing lizhengxing added bug Bug, regression, crash needs-triage Awaiting triage labels May 30, 2024
lizhengxing added a commit that referenced this issue May 30, 2024
This PR pulls the upstream change, Use accessors for ValueHandleBase::V; NFC (llvm/llvm-project@6f08789), into DXC.

Here's the summary of the change:
  This changes code that touches ValueHandleBase::V to go through getValPtr and (newly added) setValPtr.  This functionality will be used later, but also seemed like a generally good cleanup.

  I also renamed the field to Val, but that's just to make it obvious that I fixed all the uses.

This is part 1 of the fix for #6659.
@lizhengxing lizhengxing self-assigned this May 30, 2024
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, Emulate TrackingVH using WeakVH (llvm/llvm-project@8a62382), into DXC.

Here's the summary of the change:

  This frees up one slot in the HandleBaseKind enum, which I will use later to add a new kind of value handle.  The size of the HandleBaseKind enum is important because we store a HandleBaseKind in
  the low two bits of a (in the worst case) 4 byte aligned pointer.

  Reviewers: davide, chandlerc

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32634

This is part 2 of the fix for #6659.
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, Rename WeakVH to WeakTrackingVH; NFC (llvm/llvm-project@e6bca0e), into DXC.

Here's the summary of the change:

  I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit.

  Reviewers: dblaikie, davide

  Reviewed By: davide

  Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle

  Differential Revision: https://reviews.llvm.org/D32266

This is part 3 of the fix for #6659.
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, UAdd a new WeakVH value handle; NFC (llvm/llvm-project@f1c0eaf), into DXC.

Here's the summary of the change:

  WeakVH nulls itself out if the value it was tracking gets deleted, but it does not track RAUW.

  Reviewers: dblaikie, davide

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32267

This is part 4 of the fix for #6659.
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, Use a 2 bit pointer in ValueHandleBase::PrevPair; NFC (llvm/llvm-project@b297bff), into DXC.

Here's the summary of the change:

  This was an omission in r301813.  I had made the supporting changes to make this happen, but I forgot to actually update the PrevPair declaration.

  llvm-svn: 301817

This is part 5 of the fix for #6659.
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, Fix non-determinism in Reassociate caused by address coincidences (llvm/llvm-project@ef8761f), into DXC.

Here's the summary of the change:

  Between building the pair map and querying it there are a few places that erase and create Values. It's rare but the address of these newly created Values is occasionally the same as a
  just-erased Value that we already have in the pair map. These coincidences should be accounted for to avoid non-determinism.

  Thanks to Roman Tereshin for the test case.

This is part 6 (the last part) of the fix for #6659.
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, [llc/opt] Add an option to run all passes twice (llvm/llvm-project@04464cf), into DXC.

Here's the summary of the change:

  Lately, I have submitted a number of patches to fix bugs that only occurred when using the same pass manager to compile multiple modules (generally these bugs are failure to reset some persistent state).

  Unfortunately I don't think there is currently a way to test that from the command line. This adds a very simple flag to both llc and opt, under which the tools will simply re-run their respective
  pass pipelines using the same pass manager on (a clone of the same module). Additionally, we verify that both outputs are bitwise the same.

  Reviewers: yaron.keren

This is for the test of this change (llvm/llvm-project@ef8761f) to fix #6659.
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, [llc/opt] Add an option to run all passes twice (llvm/llvm-project@04464cf), into DXC.

Here's the summary of the change:

  Lately, I have submitted a number of patches to fix bugs that only occurred when using the same pass manager to compile multiple modules (generally these bugs are failure to reset some persistent state).

  Unfortunately I don't think there is currently a way to test that from the command line. This adds a very simple flag to both llc and opt, under which the tools will simply re-run their respective
  pass pipelines using the same pass manager on (a clone of the same module). Additionally, we verify that both outputs are bitwise the same.

  Reviewers: yaron.keren

This is for the test of this change (llvm/llvm-project@ef8761f) to fix #6659.
lizhengxing added a commit that referenced this issue May 31, 2024
This PR pulls the upstream change, Emulate TrackingVH using WeakVH (llvm/llvm-project@8a62382), into DXC.

Here's the summary of the change:

  This frees up one slot in the HandleBaseKind enum, which I will use later to add a new kind of value handle.  The size of the HandleBaseKind enum is important because we store a HandleBaseKind in
  the low two bits of a (in the worst case) 4 byte aligned pointer.

  Reviewers: davide, chandlerc

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32634

This is part 2 of the fix for #6659.
lizhengxing added a commit that referenced this issue Jun 4, 2024
This PR pulls the upstream change, Emulate TrackingVH using WeakVH (llvm/llvm-project@8a62382), into DXC.

Here's the summary of the change:

  This frees up one slot in the HandleBaseKind enum, which I will use later to add a new kind of value handle.  The size of the HandleBaseKind enum is important because we store a HandleBaseKind in
  the low two bits of a (in the worst case) 4 byte aligned pointer.

  Reviewers: davide, chandlerc

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32634

This is part 2 of the fix for #6659.
lizhengxing added a commit that referenced this issue Jun 4, 2024
This PR pulls the upstream change, Emulate TrackingVH using WeakVH (llvm/llvm-project@8a62382), into DXC.

Here's the summary of the change:

  This frees up one slot in the HandleBaseKind enum, which I will use later to add a new kind of value handle.  The size of the HandleBaseKind enum is important because we store a HandleBaseKind in
  the low two bits of a (in the worst case) 4 byte aligned pointer.

  Reviewers: davide, chandlerc

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32634

This is part 2 of the fix for #6659.
lizhengxing added a commit that referenced this issue Jun 6, 2024
This PR pulls the upstream change, Use accessors for ValueHandleBase::V;
NFC
(llvm/llvm-project@6f08789),
into DXC.

Here's the summary of the change:

> This changes code that touches ValueHandleBase::V to go through
getValPtr and (newly added) setValPtr. This functionality will be used
later, but also seemed like a generally good cleanup.
> 
> I also renamed the field to Val, but that's just to make it obvious
that I fixed all the uses.


This is part 1 of the fix for #6659.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
lizhengxing added a commit that referenced this issue Jun 6, 2024
This PR pulls the upstream change, Emulate TrackingVH using WeakVH (llvm/llvm-project@8a62382), into DXC.

Here's the summary of the change:

  This frees up one slot in the HandleBaseKind enum, which I will use later to add a new kind of value handle.  The size of the HandleBaseKind enum is important because we store a HandleBaseKind in
  the low two bits of a (in the worst case) 4 byte aligned pointer.

  Reviewers: davide, chandlerc

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32634

This is part 2 of the fix for #6659.
@llvm-beanz llvm-beanz added this to the Backlog milestone Jun 10, 2024
@llvm-beanz llvm-beanz removed the needs-triage Awaiting triage label Jun 10, 2024
@llvm-beanz llvm-beanz moved this to Triaged in HLSL Triage Jun 10, 2024
@llvm-beanz llvm-beanz modified the milestones: Backlog, Next 2024 Release Jun 10, 2024
lizhengxing added a commit that referenced this issue Jun 12, 2024
This PR pulls the upstream change, Emulate TrackingVH using WeakVH
(llvm/llvm-project@8a62382),
into DXC.

Here's the summary of the change:

> This frees up one slot in the HandleBaseKind enum, which I will use
later to add a new kind of value handle. The size of the HandleBaseKind
enum is important because we store a HandleBaseKind in
>   the low two bits of a (in the worst case) 4 byte aligned pointer.
> 
>   Reviewers: davide, chandlerc
> 
>   Subscribers: mcrosier, llvm-commits
> 
>   Differential Revision: https://reviews.llvm.org/D32634

This is part 2 of the fix for #6659.
lizhengxing added a commit that referenced this issue Jun 12, 2024
This PR pulls the upstream change, Rename WeakVH to WeakTrackingVH; NFC (llvm/llvm-project@e6bca0e), into DXC.

Here's the summary of the change:

>  I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit.
>
>  Reviewers: dblaikie, davide
>
>  Reviewed By: davide
>
>  Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
>
>  Differential Revision: https://reviews.llvm.org/D32266

This is part 3 of the fix for #6659.
lizhengxing added a commit that referenced this issue Jun 12, 2024
This PR pulls the upstream change, Rename WeakVH to WeakTrackingVH; NFC (llvm/llvm-project@e6bca0e), into DXC.

Here's the summary of the change:

>  I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit.
>
>  Reviewers: dblaikie, davide
>
>  Reviewed By: davide
>
>  Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
>
>  Differential Revision: https://reviews.llvm.org/D32266

This is part 3 of the fix for #6659.
@damyanp damyanp assigned adam-yang and unassigned lizhengxing Jun 17, 2024
adam-yang pushed a commit that referenced this issue Jun 18, 2024
This PR pulls the upstream change, Rename WeakVH to WeakTrackingVH; NFC
(llvm/llvm-project@e6bca0e),
into DXC.

Here's the summary of the change:

> I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
> 
>   Reviewers: dblaikie, davide
> 
>   Reviewed By: davide
> 
> Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb,
llvm-commits, nhaehnle
> 
>   Differential Revision: https://reviews.llvm.org/D32266

This is part 3 of the fix for #6659.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
adam-yang added a commit that referenced this issue Jun 21, 2024
Originally @lizhengxing's PR. Retargeting main.

This PR pulls 2 upstream changes, Add a new WeakVH value handle; NFC
(llvm/llvm-project@f1c0eaf)
and Use a 2 bit pointer in ValueHandleBase::PrevPair; NFC
(llvm/llvm-project@b297bff),
into DXC.

Here's the summary of the changes:

Add a new WeakVH value handle; NFC 
> WeakVH nulls itself out if the value it was tracking gets deleted, but
it does not track RAUW.
> 
>       Reviewers: dblaikie, davide
> 
>       Subscribers: mcrosier, llvm-commits
> 
>       Differential Revision: https://reviews.llvm.org/D32267

Use a 2 bit pointer in ValueHandleBase::PrevPair; NFC

> This was an omission in r301813. I had made the supporting changes to
make this happen, but I forgot to actually update the
> 
> PrevPair declaration.

This is part 4 and 5 of the fix for #6659.
adam-yang pushed a commit to adam-yang/DirectXShaderCompiler that referenced this issue Jun 21, 2024
This PR pulls the upstream change, Fix non-determinism in Reassociate caused by address coincidences (llvm/llvm-project@ef8761f), into DXC.

Here's the summary of the change:

  Between building the pair map and querying it there are a few places that erase and create Values. It's rare but the address of these newly created Values is occasionally the same as a
  just-erased Value that we already have in the pair map. These coincidences should be accounted for to avoid non-determinism.

  Thanks to Roman Tereshin for the test case.

This is part 6 (the last part) of the fix for microsoft#6659.
adam-yang pushed a commit that referenced this issue Jun 21, 2024
This PR pulls the upstream change, [llc/opt] Add an option to run all passes twice (llvm/llvm-project@04464cf), into DXC.

Here's the summary of the change:

  Lately, I have submitted a number of patches to fix bugs that only occurred when using the same pass manager to compile multiple modules (generally these bugs are failure to reset some persistent state).

  Unfortunately I don't think there is currently a way to test that from the command line. This adds a very simple flag to both llc and opt, under which the tools will simply re-run their respective
  pass pipelines using the same pass manager on (a clone of the same module). Additionally, we verify that both outputs are bitwise the same.

  Reviewers: yaron.keren

This is for the test of this change (llvm/llvm-project@ef8761f) to fix #6659.
@github-project-automation github-project-automation bot moved this from New to Done in HLSL Roadmap Jun 24, 2024
adam-yang pushed a commit to adam-yang/DirectXShaderCompiler that referenced this issue Jun 24, 2024
This PR pulls the upstream change, Fix non-determinism in Reassociate caused by address coincidences (llvm/llvm-project@ef8761f), into DXC.

Here's the summary of the change:

  Between building the pair map and querying it there are a few places that erase and create Values. It's rare but the address of these newly created Values is occasionally the same as a
  just-erased Value that we already have in the pair map. These coincidences should be accounted for to avoid non-determinism.

  Thanks to Roman Tereshin for the test case.

This is part 6 (the last part) of the fix for microsoft#6659.
adam-yang added a commit that referenced this issue Jun 25, 2024
)

Originally @lizhengxing's PR.  Retargeting main.

This PR pulls the upstream change, Fix non-determinism in Reassociate
caused by address coincidences
(llvm/llvm-project@ef8761f),
into DXC.

Here's the summary of the change:

Between building the pair map and querying it there are a few places
that erase and create Values. It's rare but the address of these newly
created Values is occasionally the same as a
just-erased Value that we already have in the pair map. These
coincidences should be accounted for to avoid non-determinism.

  Thanks to Roman Tereshin for the test case.

This is part 6 (the last part) of the fix for #6659.

Co-authored-by: Zhengxing Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug, regression, crash
Projects
Archived in project
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants