510 - Assert-fail if MPI functions are accessed inside scheduled callbacks from within VT #792

pnstickne · 2020-04-28T08:13:01Z

When running in a non-release build, the MPI access guards are enabled by default.

These access guards use PMPI (i.e., VT exposes the MPI symbols first) to intercept MPI calls.

If enabled, a call to [almost] any MPI_* function from a scheduler-executed handler will trigger a vtAssert. (Several MPI calls, such as MPI_Wtime, MPI_Test, MPI_Probe, and MPI_Get_count, are excluded from the checks to simplify usage and because they are benign or are not used on their own.)
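A minimal sketch of the check-then-forward idea described above, written so it builds without an MPI installation. Every name here (`mpiAccessAllowed`, `fakePmpiBarrier`, `guardedBarrier`, `runHandlerGuarded`) is hypothetical and is not taken from the PR; the real guards call vtAssert, whereas this sketch throws so the failure can be observed.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {

// Hypothetical per-process flag: true while MPI calls are permitted.
inline bool& mpiAccessAllowed() {
  static bool allowed = true;
  return allowed;
}

// Stand-in for a PMPI_* entry point, so the sketch needs no MPI library.
inline int fakePmpiBarrier() { return 0; }

// Stand-in for an intercepting MPI_* definition: check first, then forward.
inline int guardedBarrier() {
  if (!mpiAccessAllowed()) {
    throw std::logic_error("MPI call made from a scheduled handler");
  }
  return fakePmpiBarrier();
}

// Restores the flag on scope exit, including when the handler throws.
struct FlagRestore {
  bool& flag;
  bool previous;
  ~FlagRestore() { flag = previous; }
};

// Stand-in for the scheduler: run a handler with the guard armed.
template <typename Handler>
void runHandlerGuarded(Handler&& handler) {
  FlagRestore restore{mpiAccessAllowed(), mpiAccessAllowed()};
  mpiAccessAllowed() = false;
  handler();
  // The handler is expected to leave the guard armed when it returns.
  assert(!mpiAccessAllowed());
}

} // namespace sketch
```

Calling `sketch::guardedBarrier()` directly succeeds, while calling it from a lambda passed to `sketch::runHandlerGuarded` throws, and the flag is restored afterwards in both cases.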

Code inside VT explicitly grants itself scoped access to MPI functions via RAII hidden behind a conditional macro.
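One way the scoped grant could look, as a sketch only. The macro `SKETCH_ALLOW_MPI_CALLS`, the switch `SKETCH_MPI_GUARD`, and the type `ScopedMPIAccess` are illustrative names assumed for this example and may not match what the PR actually defines.

```cpp
#include <cassert>

#ifndef SKETCH_MPI_GUARD
#define SKETCH_MPI_GUARD 1 // hypothetical switch; 0 models a release build
#endif

namespace sketch {

// Nesting depth of active grants; zero means MPI access is not granted.
inline int& accessDepth() {
  static int depth = 0;
  return depth;
}

inline bool accessGranted() { return accessDepth() > 0; }

// RAII grant: access is allowed for the lifetime of the object.
struct ScopedMPIAccess {
  ScopedMPIAccess() { ++accessDepth(); }
  ~ScopedMPIAccess() {
    assert(accessDepth() > 0);
    --accessDepth();
  }
  ScopedMPIAccess(ScopedMPIAccess const&) = delete;
  ScopedMPIAccess& operator=(ScopedMPIAccess const&) = delete;
};

} // namespace sketch

#if SKETCH_MPI_GUARD
// Guarded build: declare a scope object that grants access until scope exit.
#define SKETCH_ALLOW_MPI_CALLS() ::sketch::ScopedMPIAccess sketch_mpi_scope_
#else
// Unguarded build: the macro expands to a no-op expression.
#define SKETCH_ALLOW_MPI_CALLS() static_cast<void>(0)
#endif
```

Writing `SKETCH_ALLOW_MPI_CALLS();` at the top of a block grants access until the closing brace. Grants nest, so an inner block can repeat the macro without revoking the outer grant.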

As an alternative or extension to the scoped access, VT could also expose internal vt_MPI_xyz wrappers that bypass the guard (or disable it internally) and use those throughout VT. Each wrapper would be a trivial macro for MPI_xyz when the guards are disabled. It might also be possible to do something similar with const auto vt_MPI_xyz = MPI_xyz; and rely on an optimizing compiler to elide the proxy call in release (unguarded) builds.
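The function-pointer variant of that idea might look roughly like the following. `fakeMpiBarrier` stands in for an MPI entry point so the sketch compiles without MPI, and `sketch_MPI_Barrier` and `SKETCH_GUARDED_BUILD` are hypothetical names; whether a compiler actually elides the indirect call depends on the toolchain and optimization level.

```cpp
#ifndef SKETCH_GUARDED_BUILD
#define SKETCH_GUARDED_BUILD 0 // hypothetical switch; 0 models a release build
#endif

namespace sketch {

// Stand-in for an MPI entry point; returns 0 on success, like MPI_SUCCESS.
inline int fakeMpiBarrier(int comm) { return comm >= 0 ? 0 : 1; }

#if SKETCH_GUARDED_BUILD
// Guarded build: a real wrapper function that forwards the call.
inline int sketch_MPI_Barrier(int comm) {
  // A real wrapper would grant itself MPI access here before forwarding.
  return fakeMpiBarrier(comm);
}
#else
// Unguarded build: a constant function pointer that aliases the target.
constexpr auto sketch_MPI_Barrier = &fakeMpiBarrier;
#endif

} // namespace sketch
```

Call sites look identical in both configurations, for example `sketch::sketch_MPI_Barrier(0)`, which is the property the wrapper approach relies on.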

Fixes #510

cmake/define_build_types.cmake

src/vt/pmpi/mpi_functions.h.in

src/vt/runtime/mpi_access.h

src/vt/runtime/runtime.cc

src/vt/scheduler/scheduler.cc

codecov · 2020-04-28T09:19:18Z

Codecov Report

Merging #792 into develop will increase coverage by 0.04%.
The diff coverage is 96.42%.

@@             Coverage Diff             @@
##           develop     #792      +/-   ##
===========================================
+ Coverage    80.02%   80.07%   +0.04%     
===========================================
  Files          342      343       +1     
  Lines        10696    10724      +28     
===========================================
+ Hits          8560     8587      +27     
- Misses        2136     2137       +1

| Impacted Files | Coverage Δ |
|---|---|
| tests/unit/runtime/test_mpi_access_guards.cc | `96.42% <96.42%> (ø)` |

src/vt/event/event_record.cc

src/vt/group/collective/group_info_collective.cc

src/vt/messaging/active.cc

src/vt/messaging/irecv_holder.h

PhilMiller · 2020-04-28T20:42:28Z

Looks like this is headed in the right direction. Could we get a negative test added to ensure that the guards actually catch an undesired call?

pnstickne · 2020-05-12T02:55:26Z

> Looks like this is headed in the right direction. Could we get a negative test added to ensure that the guards actually catch an undesired call?

I've been thinking about how to handle this. There are several things that should test that 'a vtAssert' occurred, although this is, uhh, hard to catch in a test. It would also have to handle the process dying in gtest, since all state is dead at that point.

Guess that's 'ASSERT_DEATH' :|

src/vt/configs/error/config_assert.h

src/vt/runtime/mpi_access.cc

tests/unit/runtime/test_mpi_access_guards.cc

src/vt/group/collective/group_info_collective.cc

PhilMiller · 2020-05-30T22:02:14Z

> I've rebased this and integrated/merged it with the PR that uses the safe MPI collectives.

It looks like you lost some of Paul's changes: there are things he's commented as done that are back to what I'd remarked on as needing to change.

- The casing of the value can differ between environments. (Despite documentation claims of allowed values.)

- All symbols now included to prevent breaking CMake/gtest regex scanning.
- Using MPI_Test for more reliable failure detection.
- Includes when access is explicitly granted in positive case.

- Excluded from generation by request. Also added MPI_Wtick for parity with MPI_Wtime.
- Guard exclusions can be specified via regular expression patterns.
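To illustrate regex-based exclusions: the wrapper generator in this PR is a Perl script, so the C++ below is a swapped-in sketch of the matching logic only. The function `isGuardExcluded` is a hypothetical helper, and the patterns in the usage note are illustrative rather than the PR's actual exclusion list.

```cpp
#include <regex>
#include <string>
#include <vector>

namespace sketch {

// Returns true if the whole function name matches any exclusion pattern.
inline bool isGuardExcluded(
  std::string const& name, std::vector<std::string> const& patterns
) {
  for (auto const& pattern : patterns) {
    // regex_match requires a full match, so "MPI_Wtime" does not
    // accidentally exclude a longer name that merely starts with it.
    if (std::regex_match(name, std::regex(pattern))) {
      return true;
    }
  }
  return false;
}

} // namespace sketch
```

With patterns such as `MPI_Wtime`, `MPI_Wtick`, `MPI_I?probe`, `MPI_Test.*`, and `MPI_Get_count`, a name like `MPI_Iprobe` or `MPI_Testall` is excluded from guarding, while `MPI_Send` is not.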

tests/unit/collectives/test_mpi_collective.cc

- Not needed, got re-included. The code in these handlers is expected to run in a context where MPI is used.

src/vt/pmpi/generate_mpi_wrappers.pl

lifflander · 2020-06-01T23:15:45Z

I've generated and pushed a full list of MPI 3.1 functions that includes the calls that were missing before.

src/CMakeLists.txt

lifflander · 2020-06-02T03:28:53Z

Here is an overview of what got changed by this pull request:

Clones added
============
- src/vt/messaging/active.cc  2

See the complete overview on Codacy

pnstickne force-pushed the 510-mpi-intercepts branch 3 times, most recently from 518333a to 85c15aa April 28, 2020 08:40

pnstickne changed the title ~~510 - Assert-fail is MPI functions are accessed inside "user code" from within VT~~ 510 Assert-fail if MPI functions are accessed inside "user code" from within VT Apr 28, 2020