diff --git a/docs/source/background/abbreviations.rst b/docs/source/background/abbreviations.rst
index 3fea304c..a6ba039d 100644
--- a/docs/source/background/abbreviations.rst
+++ b/docs/source/background/abbreviations.rst
@@ -76,6 +76,18 @@ MPI
    Message Passing Interface. An API standard defining functions and utilities
    useful for writing software using distributed parallelism.
 
+.. _raii:
+
+****
+RAII
+****
+
+Resource Acquisition Is Initialization. In C++, RAII has come to mean that
+resources (such as memory, file handles, and basically anything whose use
+needs to be managed) should be tied to the lifetime of an object. This ensures
+that when the object is destroyed the resources are released, which in turn
+helps avoid leaks.
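+
+As a minimal sketch of the idiom (the ``File`` wrapper below is illustrative
+only, not a ParallelZone class):
+
+.. code-block:: c++
+
+   #include <cstdio>
+
+   class File {
+   public:
+       // Acquiring the resource is the object's initialization
+       explicit File(const char* path) : m_handle(std::fopen(path, "r")) {}
+
+       // Destroying the object releases the resource, even if an exception
+       // unwinds the stack
+       ~File() {
+           if(m_handle) std::fclose(m_handle);
+       }
+
+   private:
+       std::FILE* m_handle;
+   };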
+
 .. _simd:
 
 ****
diff --git a/docs/source/conf.py b/docs/source/conf.py
index b1502c52..7043aacb 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -171,7 +171,7 @@
 # -- Options for intersphinx extension ---------------------------------------
 
 # Example configuration for intersphinx: refer to the Python standard library.
-intersphinx_mapping = {'https://docs.python.org/': None}
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
 
 # -- Options for todo extension ----------------------------------------------
diff --git a/docs/source/developer/design/assets/runtime_view.png b/docs/source/developer/design/assets/runtime_view.png
index c32e8e04..031d5698 100644
Binary files a/docs/source/developer/design/assets/runtime_view.png and b/docs/source/developer/design/assets/runtime_view.png differ
diff --git a/docs/source/developer/design/runtime_view.rst b/docs/source/developer/design/runtime_view.rst
index e4d52be7..16bc00a6 100644
--- a/docs/source/developer/design/runtime_view.rst
+++ b/docs/source/developer/design/runtime_view.rst
@@ -47,8 +47,12 @@ number one supercomputer in the world, or anything in between.
 
    - Hardware
 
 #. Multi-process operations need to go through ``RuntimeView``.
-#. MPI compatability.
+#. MPI compatibility.
 #. Flexibility of backend.
+#. Setup/teardown of parallel resources.
+
+   - See :ref:`understanding_runtime_initialization_finalization` for more
+     details, but basically we need callbacks.
 
 ************************
 RuntimeView Architecture
 ************************
@@ -72,11 +76,21 @@ addresses the above consideration by (numbering is from above):
    ``GPU`` objects in a particular ``ResourceSet``.
 
    - This facilitates selecting start/end points.
 
-#. MADNESS is built on MPI. MPI is exposed through MADNESS.
+#. MPI support happens via the ``CommPP`` class.
+
 #. The use of the PIMPL design allows us to hide many of the backend types. It
    also facilitates writing an implementation for a different backend down the
    line (although the API would need to change too).
+#. Storing callbacks allows us to tie the lifetime of the ``RuntimeView`` to
+   the teardown of parallel resources, i.e., ``RuntimeView`` will
+   automatically finalize any parallel resources which depend on it before
+   finalizing itself.
+
+   - Note that finalization callbacks are stored in a stack to ensure a
+     controlled teardown order, as is usually needed for libraries with
+     initialize/finalize functions.
+
 Some finer points:
 
 - The scheduler is envisioned as taking task graphs and scheduling them in a
@@ -92,7 +106,7 @@ Some finer points:
 Proposed APIs
 *************
 
-Examples of all-to-all communications
+Examples of all-to-all communications:
 
 .. code-block:: c++
 
@@ -106,6 +120,21 @@ Examples of all-to-all communications
    // This is an all reduce
    auto output2 = rt.reduce(data, op);
 
+Example of tying another library's parallel runtime teardown to the lifetime
+of a ``RuntimeView`` (note this is only relevant when ParallelZone starts
+MPI):
+
+.. code-block:: c++
+
+   // Create a RuntimeView object
+   RuntimeView rt;
+
+   // Initialize the other library
+   other_library_initialize();
+
+   // Register the corresponding finalization routine with the RuntimeView
+   rt.stack_callback(other_library_finalize);
+
 .. note::
 
    As written the APIs assume the data is going to/from RAM. If we eventually
diff --git a/docs/source/developer/index.rst b/docs/source/developer/index.rst
index 9aa21d64..78c862b4 100644
--- a/docs/source/developer/index.rst
+++ b/docs/source/developer/index.rst
@@ -28,3 +28,4 @@ developers may also find the more general `NWChemEx Developer Documentation
    :caption: Contents:
 
    design/index
+   initialize_finalize
diff --git a/docs/source/developer/initialize_finalize.rst b/docs/source/developer/initialize_finalize.rst
new file mode 100644
index 00000000..bf9de2ad
--- /dev/null
+++ b/docs/source/developer/initialize_finalize.rst
@@ -0,0 +1,224 @@
+.. Copyright 2024 NWChemEx-Project
+..
+.. Licensed under the Apache License, Version 2.0 (the "License");
+.. you may not use this file except in compliance with the License.
+.. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. _understanding_runtime_initialization_finalization:
+
+#################################################
+Understanding Runtime Initialization/Finalization
+#################################################
+
+:ref:`mpi` requires users to call ``MPI_Init`` to start MPI and
+``MPI_Finalize`` to end it. MPI requires that each of these functions be
+called only once, regardless of how many code units actually use MPI. As a
+result, managing the lifetime of resources such as MPI processes while
+adhering to :ref:`raii` can be tricky. This page works through some scenarios
+to help the reader become better acquainted with the complexities.
+
+************
+RAII and MPI
+************
+
+ParallelZone opts to manage MPI through RAII. To do this we associate the
+lifetime of MPI with the lifetime of a ``Runtime`` object. When a ``Runtime``
+object is created it is either initialized with an existing MPI communicator,
+or it initializes MPI itself and uses the resulting MPI communicator. Each
+``Runtime`` object internally tracks whether it initialized MPI. When a
+``Runtime`` object is destructed it will only call ``MPI_Finalize`` if
+``*this`` initialized MPI.
+
+.. note::
+
+   At present there is no user-accessible ``Runtime`` object; rather, users
+   interact with an implicit ``Runtime`` through ``RuntimeView`` objects. When
+   all ``RuntimeView`` objects go out of scope the implicit ``Runtime`` object
+   is destructed. This decision stems from not wanting accidental/implicit
+   copies to inadvertently shut down MPI.
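+
+To make the RAII mechanics concrete, here is a minimal sketch of the idea.
+The class below is illustrative only (the real ``Runtime`` implementation
+differs; ``m_comm`` and ``m_initialized_mpi`` are invented names):
+
+.. code-block:: c++
+
+   #include <mpi.h>
+
+   class Runtime {
+   public:
+       // Adopt an existing communicator; *this did not start MPI
+       explicit Runtime(MPI_Comm comm) :
+           m_comm(comm), m_initialized_mpi(false) {}
+
+       // Start MPI; *this is now responsible for finalizing it
+       Runtime(int argc, char** argv) :
+           m_comm(MPI_COMM_WORLD), m_initialized_mpi(true) {
+           MPI_Init(&argc, &argv);
+       }
+
+       // Only finalize MPI if *this started it
+       ~Runtime() {
+           if(m_initialized_mpi) MPI_Finalize();
+       }
+
+   private:
+       MPI_Comm m_comm;
+       bool m_initialized_mpi;
+   };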
+
+.. _traditional_solution:
+
+********************
+Traditional Solution
+********************
+
+Many existing libraries deal with the MPI problem in one of two ways:
+
+1. Assume the user will manage MPI. Thus the library requires the user to
+   provide an already initialized MPI communicator.
+2. Define functions like ``initialize`` / ``finalize`` which wrap MPI's
+   ``MPI_Init`` / ``MPI_Finalize`` functions, respectively.
+
+From the perspective of ParallelZone (PZ), Scenario 1 is the easiest to deal
+with because it means PZ is free to manage the lifetime of MPI however it
+wants, so long as MPI is finalized after the library is done with it.
+Scenario 1 works well with our RAII solution "out of the box" and is not
+considered further.
+
+Scenario 2 is much harder because we know the library's ``initialize`` and
+``finalize`` functions will contain MPI functions. They will minimally
+contain ``MPI_Init`` and ``MPI_Finalize``, but they may also check whether
+MPI has already been initialized or finalized (a common practice to avoid
+accidentally calling ``MPI_Init``/``MPI_Finalize`` after MPI has already
+been initialized/finalized). It is also conceivable that these functions do
+additional initialization/finalization which requires MPI to be initialized,
+but not yet finalized, e.g., calls to synchronize data.
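+
+For illustration, a typical Scenario 2 library might look like the following
+sketch (``other_library`` and its internals are hypothetical):
+
+.. code-block:: c++
+
+   #include <mpi.h>
+
+   namespace other_library {
+
+   void initialize() {
+       // Guard against starting MPI twice
+       int already_started = 0;
+       MPI_Initialized(&already_started);
+       if(!already_started) MPI_Init(nullptr, nullptr);
+
+       // ...additional setup requiring MPI would go here...
+   }
+
+   void finalize() {
+       // ...additional teardown requiring MPI would go here...
+
+       // Guard against finalizing MPI twice
+       int already_finalized = 0;
+       MPI_Finalized(&already_finalized);
+       if(!already_finalized) MPI_Finalize();
+   }
+
+   } // namespace other_library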
+ +*********************** +RAII Plus Encapsulation +*********************** + +:ref:`raii_interacting_with_traditional_solution` showed that our RAII solution +is fine as long as we control the order of destruction. This is a detail we'd +rather not leak to the user, especially if more initialization/finalization +functions are added later (or if some are removed). With the traditional +solution we can easily encapsulate this detail with something like: + +.. code-block:: c++ + + void initialize() { + library_a::initialize(); + library_b::initialize(); + } + + void finalize() { + library_b::finalize(); + library_a::finalize(); + } + + // User's code + initialize(); // A initializes MPI, then B uses A's MPI + + finalize(); // B cleans up, then A finalizes MPI + +As shown, the order of initialization/finalization is guaranteed by creating +wrappers around sub-library initialization/finalization. Users rely on the +wrappers and never need to worry about the order. + +So now what about RAII? Let's start with Scenario 2b, and ParallelZone starting +MPI: + +.. code-block:: c++ + + RuntimeView initialize() { + RuntimeView rv; // PZ starts MPI + library_a::initialize(rv.mpi_comm()); + return rv; // Must keep rv alive + } + + void finalize() { + library_a::finalize(); + } + + // User's code + auto rv = initialize(); + + finalize(); // library finalizes + // end of code, PZ ends MPI + +While this works, it violates RAII because the user needs to remember to call +``finalize`` before the code ends or else there will be a resource leak. The +entire point of RAII is to avoid the possibility of leaks. If we want our +``RuntimeView`` to adhere to RAII we must find a way for the destructor of +``rv`` to call ``finalize`` before it stops MPI. The easiest way to do this is +with callbacks: + +.. code-block:: c++ + + RuntimeView initialize() { + RuntimeView rv; // PZ starts MPI + library_a::initialize(rv.mpi_comm()); + + // Register that rv must call finalize upon destruction + rv.stack_callback(library_a::finalize); + return rv; // Must keep rv alive + } + + // User's code + auto rv = initialize(); + + // end of code, PZ's dtor calls library_a::finalize() then ends MPI + + +******* +Summary +******* + +- MPI leaks initialization/finalization concerns to all dependencies. +- This has led to many libraries leaking those same details to their + dependencies too. +- When ParallelZone manages MPI we can use RAII to avoid leaking those details + to our dependencies. +- RAII however requires that ``RuntimeView`` be able to hold callbacks.