Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1507 context td #1508

Merged
merged 7 commits into from
Jul 22, 2021
Merged

1507 context td #1508

merged 7 commits into from
Jul 22, 2021

Conversation

PhilMiller
Copy link
Member

@PhilMiller PhilMiller commented Jul 21, 2021

Fixes #1507

@PhilMiller
Copy link
Member Author

Tests start breaking at "Reset epoch_stack_size_ in TD::resume() in case underlying stack ..."

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (intel 18.03, ubuntu, mpich)

Build for 571e5e6

Build failed for unknown reason. Check build logs


Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (intel 19, ubuntu, mpich)

Build for 571e5e6

Build failed for unknown reason. Check build logs


Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (clang-5.0, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (clang-3.9, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (gcc-5, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (gcc-6, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (gcc-9, ubuntu, mpich, zoltan)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (clang-10, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (nvidia cuda 10.1, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@codecov
Copy link

codecov bot commented Jul 21, 2021

Codecov Report

Merging #1508 (571e5e6) into develop (896200a) will decrease coverage by 0.02%.
The diff coverage is 37.50%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1508      +/-   ##
===========================================
- Coverage    82.74%   82.72%   -0.03%     
===========================================
  Files          780      780              
  Lines        29216    29219       +3     
===========================================
- Hits         24174    24170       -4     
- Misses        5042     5049       +7     
Impacted Files Coverage Δ
src/vt/context/runnable_context/td.cc 53.65% <37.50%> (-6.87%) ⬇️
src/vt/messaging/request_holder.h 77.14% <0.00%> (-5.72%) ⬇️
src/vt/event/event.cc 59.60% <0.00%> (-0.67%) ⬇️

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (gcc-7, ubuntu, mpich, trace runtime, LB)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (nvidia cuda 11.0, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (clang-9, ubuntu, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented Jul 21, 2021

PR tests (clang-10, alpine, mpich)

Build for 571e5e6

Compilation - successful

Testing - passed

Build log

@PhilMiller PhilMiller marked this pull request as ready for review July 22, 2021 00:04
@PhilMiller PhilMiller force-pushed the 1507-context-td branch 2 times, most recently from 21cb042 to c7b3a7a Compare July 22, 2021 00:07
Copy link
Collaborator

@lifflander lifflander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Is it possible to add a test for this that exercises the bug?

@PhilMiller
Copy link
Member Author

Looks good. Is it possible to add a test for this that exercises the bug?

Probably.

I don't really want to spend more time on this, since it would mess would my commitments to Kokkos and non-SNL projects

Copy link
Collaborator

@nlslatt nlslatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. My run succeeded this time.

@nlslatt
Copy link
Collaborator

nlslatt commented Jul 22, 2021

Looks good. Is it possible to add a test for this that exercises the bug?

Probably.

I don't really want to spend more time on this, since it would mess would my commitments to Kokkos and non-SNL projects

Is it clear to you exactly what happened to trigger the bug? Given how far into my run it got before dying, I'm not sure how easily reproducible it is.

@PhilMiller
Copy link
Member Author

Looks good. Is it possible to add a test for this that exercises the bug?

Probably.
I don't really want to spend more time on this, since it would mess would my commitments to Kokkos and non-SNL projects

Is it clear to you exactly what happened to trigger the bug? Given how far into my run it got before dying, I'm not sure how easily reproducible it is.

My suspicion is that it was the first time a thread had been resumed from idle and then ran to completion, thus giving it a slightly different epoch stack. I don't have confirmation for this, though.

I got to the changes made in this PR my inspection and deduction, so I don't have a good empirical point of reference.

@PhilMiller
Copy link
Member Author

I just rebased to update, and made a slight syntax tweak, so that the two popping while loops would make the same comparison

@PhilMiller PhilMiller merged commit 18e615e into develop Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash in TD context with thread suspend/resume
3 participants