
886 - Add missing MPI-allowed guard in recvDataMsgBuffer #899

Merged: lifflander merged 1 commit into develop from 886-gosslb-iprobe on Jul 1, 2020


pnstickne
Contributor

  • The `MPI_Iprobe` and `MPI_Test` calls in `recvDataMsgBuffer` were not guarded by the MPI-allowed check. Reproduce with:

    `mpirun -n 8 ./examples/collection/lb_iter --vt_lb --vt_lb_name=GossipLB`
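The fix wraps polling calls that previously ran outside the MPI-allowed guard. Below is a minimal sketch of that RAII guard pattern, assuming a thread-local flag that a PMPI interceptor checks on each MPI call. Every name here (`ScopedMpiAllowed`, `mpi_calls_allowed`, `fakeIprobe`, `fakeTest`, `recvDataMsgBufferSketch`) is hypothetical, and the MPI calls are stubbed so the sketch builds without MPI; vt's actual guard mechanism is not shown in this PR.

```cpp
#include <cassert>

// Hypothetical per-thread flag: true only while a guard object is alive.
// (vt's real mechanism and names are not shown in this PR; this is a sketch.)
thread_local bool mpi_calls_allowed = false;
// Counts MPI calls observed outside a guard (what a PMPI checker would flag).
thread_local int mpi_violations = 0;

// RAII guard: allows MPI calls for its lifetime and restores the previous
// state on destruction, so guards can nest safely.
class ScopedMpiAllowed {
public:
  ScopedMpiAllowed() : prev_(mpi_calls_allowed) { mpi_calls_allowed = true; }
  ~ScopedMpiAllowed() { mpi_calls_allowed = prev_; }
  ScopedMpiAllowed(ScopedMpiAllowed const&) = delete;
  ScopedMpiAllowed& operator=(ScopedMpiAllowed const&) = delete;

private:
  bool prev_;
};

// Stand-ins for PMPI-intercepted MPI_Iprobe / MPI_Test. A real interceptor
// would forward to PMPI_Iprobe / PMPI_Test; here we only record violations.
bool fakeIprobe() {
  if (!mpi_calls_allowed) { ++mpi_violations; }
  return true;  // pretend a message is pending
}

bool fakeTest() {
  if (!mpi_calls_allowed) { ++mpi_violations; }
  return true;  // pretend the request completed
}

// Shape of the fix: the polling calls run inside the guard.
bool recvDataMsgBufferSketch() {
  ScopedMpiAllowed allow;     // the previously missing guard
  assert(mpi_calls_allowed);  // invariant: the calls below are permitted
  bool const pending = fakeIprobe();
  return pending && fakeTest();
}
```

Restoring the previous value instead of resetting to `false` keeps nested guards correct: an inner guard's exit does not revoke the outer guard's permission.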

pnstickne requested a review from lifflander on June 27, 2020 19:15
@pnstickne
Contributor Author

@lifflander
While this fixes the PMPI errors, there appears to be a separate issue with GossipLB on develop. Even with the guards DISABLED, the run aborts:

vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [2] ------------------------------------------------ Fatal Error on Node 2 -------------------------------------------------
vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2]
vt: [2]              Reason: migrateObjectTo should be called between startMigrationCollective and finishMigrationCollective
vt: [2]    Assertion failed: (during_migration_)
vt: [2]                Node: 2
vt: [2]           Num Nodes: 4
vt: [2]                File: /Users/pnstick/code/vt/src/vt/vrt/collection/balance/baselb/baselb.cc
vt: [2]                Line: 237
vt: [2]            Function: migrateObjectTo
vt: [2]                Code: 1
vt: [2]           Build SHA: 4f1d18fe777633f7061b3b2df87fb0900456b48a
vt: [2]           Build Ref: refs/heads/886-gosslb-iprobe
vt: [2]         Description: heads/886-gosslb-iprobe-0-g4f1d18fe77
vt: [2]            GIT Repo: clean
vt: [2]            Hostname: s1044664ca
vt: [2]
vt: [0] ------------------------------------------------------------------------------------------------------------------------
vt: [0] -------------------------------------------- Dump Stack Backtrace on Node 0 --------------------------------------------
vt: [0] ------------------------------------------------------------------------------------------------------------------------
vt: [0] 0   18  0x10b2c3c25   vt::debug::stack::dumpStack(int) + 85
vt: [0] 1   18  0x10ba9dee7   vt::runtime::Runtime::output(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, bool, bool, bool) + 2039
vt: [0] 2   18  0x10b3d7d02   vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, bool, bool, bool, bool) + 274
vt: [0] 3   18  0x10b3b3b80   vt::output(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, bool, bool, bool, bool) + 128
vt: [0] 4   18  0x10ae537b8   std::__1::enable_if<(std::tuple_size<std::__1::tuple<> >::value) == (0), void>::type vt::debug::assert::assertOut<>(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, std::__1::tuple<>&&) + 424
vt: [0] 5   18  0x10be1fd80   vt::vrt::collection::lb::BaseLB::migrateObjectTo(unsigned long long, short) + 864
vt: [0] 6   18  0x10be1f9be   vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > >*) + 1550
vt: [0] 7   18  0x10c3df411   vt::objgroup::dispatch::Dispatch<vt::vrt::collection::lb::BaseLB>::run(long long, vt::messaging::BaseMsg*) + 1217
vt: [0] 8   18  0x10c461bf3   vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long long) + 1843
vt: [0] 9   18  0x10c467247   vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long long) + 119
vt: [0] 10  18  0x10be775e1   vt::runnable::Runnable<vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > > >::runObj(long long, vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > >*, short) + 1329
vt: [0] 11  18  0x10be71e2a   vt::runnable::Runnable<vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > > >::run(long long, void (*)(vt::messaging::BaseMsg*), vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > >*, short, int) + 122
vt: [0] 12  18  0x10be71c47   void vt::serialization::SerializedMessenger::payloadMsgHandler<vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > >, vt::messaging::ActiveMsg<vt::messaging::EpochTagActiveEnvelope> >(vt::serialization::SerialPayloadMsg<vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > >, vt::serialization::SerializedDataMsgAny<vt::vrt::collection::lb::TransferMsg<std::__1::vector<std::__1::tuple<unsigned long long, short>, std::__1::allocator<std::__1::tuple<unsigned long long, short> > > >, vt::messaging::ActiveMsg<vt::messaging::EpochTagActiveEnvelope> >, 128ll>*) + 471
vt: [0] 13  18  0x10afcc5a5   vt::runnable::Runnable<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >::run(long long, void (*)(vt::messaging::BaseMsg*), vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope>*, short, int) + 1829
vt: [0] 14  18  0x10c67d0d1   vt::messaging::ActiveMessenger::deliverActiveMsg(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> > const&, short const&, bool, std::__1::function<void ()>) + 1585
vt: [0] 15  18  0x10c67c96c   vt::messaging::ActiveMessenger::processActiveMsg(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> > const&, short const&, int const&, bool, std::__1::function<void ()>) + 540
vt: [0] 16  18  0x10c6d6760   vt::messaging::ActiveMessenger::scheduleActiveMsg(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> > const&, short const&, int const&, bool, std::__1::function<void ()>)::$_2::operator()() const + 144
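The assertion above names a phase invariant: `migrateObjectTo` may run only while `during_migration_` is set, that is, between `startMigrationCollective` and `finishMigrationCollective`. The node 0 backtrace shows it being reached from `transferMigrations` while handling a `TransferMsg`. Below is a minimal sketch of that invariant, assuming a simple boolean flag; the class name `MigrationPhaseSketch`, the `numPending` helper, and throwing instead of aborting are all hypothetical and are not vt's `BaseLB`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Types mirror the backtrace signature migrateObjectTo(unsigned long long, short).
using ObjIDType = unsigned long long;
using NodeType = short;

// Hypothetical sketch of the phase check; not vt's BaseLB.
class MigrationPhaseSketch {
public:
  void startMigrationCollective() { during_migration_ = true; }
  void finishMigrationCollective() { during_migration_ = false; }

  // vt aborts via an assertion; this sketch throws so misuse is testable.
  void migrateObjectTo(ObjIDType obj, NodeType to) {
    if (!during_migration_) {
      throw std::logic_error(
        "migrateObjectTo should be called between "
        "startMigrationCollective and finishMigrationCollective");
    }
    pending_.emplace_back(obj, to);  // record the requested migration
  }

  std::size_t numPending() const { return pending_.size(); }

private:
  bool during_migration_ = false;
  std::vector<std::pair<ObjIDType, NodeType>> pending_;
};
```

Any message handler that can call `migrateObjectTo` must therefore be delivered inside the window; a late delivery trips the check regardless of the MPI guards.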

@codecov

codecov bot commented Jun 27, 2020

Codecov Report

Merging #899 into develop will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff            @@
##           develop     #899   +/-   ##
========================================
  Coverage    80.66%   80.66%           
========================================
  Files          355      355           
  Lines        11184    11184           
========================================
  Hits          9022     9022           
  Misses        2162     2162           

@lifflander
Collaborator

To make `git --check` pass, rebase on develop; those problems are already fixed there.

@lifflander
Collaborator

> @lifflander
> While this fixes the PMPI errors, there appears an issue with GossipLB on develop. Even with the guards DISABLED,

Yes, that other bug will be fixed with another PR in the pipeline.

pnstickne force-pushed the 886-gosslb-iprobe branch from 4f1d18f to 7012c6c on June 30, 2020 07:36
pnstickne changed the title from "#886 add missing MPI-allowed guard in recvDataMsgBuffer" to "886 - Add missing MPI-allowed guard in recvDataMsgBuffer" on Jun 30, 2020
lifflander merged commit 8549446 into develop on Jul 1, 2020