
Migrate collection test failure due to DS epochs #1788

Closed
lifflander opened this issue May 9, 2022 · 0 comments · Fixed by #1789
@lifflander

Describe the bug
Reproducing this failure requires running on 16 ranks.

5: idx=idx(4): val=117.2
6: idx=idx(5): val=146.5
7: idx=idx(6): val=175.8
8: idx=idx(7): val=205.1
9: idx=idx(8): val=234.4
10: idx=idx(9): val=263.7
11: idx=idx(10): val=293
12: idx=idx(11): val=322.3
13: idx=idx(12): val=351.6
15: migrateToNext: idx=idx(15)
15: idx=idx(15): val=1.897e-321
vt: Caught SIGSEGV signal: 11 
vt: [15] ------------------------------------------------------------------------------------------------------------------------
vt: [15] ------------------------------------------- Dump Stack Backtrace on Node 15 --------------------------------------------
vt: [15] ------------------------------------------------------------------------------------------------------------------------
vt: [15] 0   18  0xfff688 vt::debug::stack::dumpStack[abi:cxx11](int) + 40
vt: [15] 1   18  0xa2db6f vt::runtime::Runtime::handleSignalFailure() + 111
vt: [15] 2   18  0xa0ec50 vt::runtime::Runtime::sigHandler(int) + 1456
vt: [15] 3   18  0x7ffff642b400 killpg + 64
vt: [15] 4   18  0x9127aa vt::elm::ElementLBData::addTime(vt::TimeTypeWrapper const&) + 202
vt: [15] 5   18  0x912abf vt::elm::ElementLBData::stopTime() + 47
vt: [15] 6   18  0xfd4c2a vt::runnable::RunnableNew::end() + 42
vt: [15] 7   18  0xfd2923 vt::runnable::RunnableNew::run() + 1427
vt: [15] 8   18  0x14fb03c vt::sched::BaseUnit::execute() + 28
vt: [15] 9   18  0xff98bd vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 1181
vt: [15] 10  18  0xff6542 vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 754
vt: [15] 11  18  0xff61da vt::sched::Scheduler::runSchedulerWhile(std::function<bool ()>) + 106
vt: [15] 12  18  0xa096bf vt::runtime::Runtime::~Runtime() + 111
vt: [15] 13  18  0xa0a69a vt::runtime::Runtime::~Runtime() + 10
vt: [15] 14  18  0x90dc95 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::finalize(vt::runtime::RuntimeHolder) + 37
vt: [15] 15  18  0x90ebfa vt::finalize() + 42
vt: [15] 16  18  0x7e7f0b main + 1835
vt: [15] 17  18  0x7ffff6417555 __libc_start_main + 245
vt: [15] 18  18  0x7e7729 _start + 41
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
  Process name: [[5294,1],15]
  Exit code:    1
--------------------------------------------------------------------------