Restructure offline drivers #469

Merged: 2 commits from introduce_mpi_group into main on Nov 19, 2024

Conversation

@micaeljtoliveira (Contributor) commented on Nov 7, 2024:

CABLE

Description

Restructure the offline drivers so that only one executable is built, regardless of whether MPI support is available. Also add the option to run with only one MPI process when MPI support is available (in this case the code falls back to the serial driver).

To achieve this, a new mpi_grp_t class is introduced that hides several calls to the MPI library and provides the necessary infrastructure when MPI support is not available.
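
A minimal sketch of what such a wrapper type can look like is shown below. This is illustrative only and uses assumed component and procedure names; the actual interface is the one in src/offline/cable_mpi.F90 in this PR.

```fortran
! Illustrative sketch only: not the actual interface in src/offline/cable_mpi.F90.
! Component and procedure names here are assumptions for demonstration.
module mpi_grp_sketch_mod
#ifdef __MPI__
  use mpi
#endif
  implicit none
  private
  public :: mpi_grp_t

  type mpi_grp_t
    integer :: comm = -1  ! communicator handle (placeholder when MPI is not used)
    integer :: rank = 0   ! rank of this process (always 0 without MPI)
    integer :: size = 1   ! number of processes in the group (always 1 without MPI)
  contains
    procedure :: init => mpi_grp_init
  end type mpi_grp_t

contains

  subroutine mpi_grp_init(this, comm)
    class(mpi_grp_t), intent(out) :: this
    integer, intent(in), optional :: comm
#ifdef __MPI__
    integer :: ierr
    if (present(comm)) then
      this%comm = comm
      call MPI_Comm_rank(this%comm, this%rank, ierr)
      call MPI_Comm_size(this%comm, this%size, ierr)
    end if
#else
    ! Serial build: keep the defaults above (one process, rank 0), so that
    ! code outside this module never needs #ifdef __MPI__ fencing.
#endif
  end subroutine mpi_grp_init

end module mpi_grp_sketch_mod
```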

Type of change


  • Code refactor

Checklist

  • The new content is accessible and located in the appropriate section.
  • I have checked that links are valid and point to the intended content.
  • I have checked my code/text and corrected any misspellings.

📚 Documentation preview 📚: https://cable--469.org.readthedocs.build/en/469/

@SeanBryan51 self-requested a review on November 8, 2024 00:33
@micaeljtoliveira force-pushed the introduce_mpi_group branch 3 times, most recently from 0208a48 to b2e57a7 on November 11, 2024 04:56
@micaeljtoliveira (Contributor, Author) commented:

@SeanBryan51 I've now changed the git history and one of the filenames as per our discussion today. Have a look at the changes and let me know if it's better now.

@SeanBryan51 (Collaborator) commented:

Thanks @micaeljtoliveira. The rendered diff looks a lot better now.

@SeanBryan51 (Collaborator) left a review:

Nice work @micaeljtoliveira - cool to see Fortran OOP features in action for the MPI data types! I'm happy with all the functional changes you have put in here.

I've been a bit ruthless in my review on documenting things with Ford, but I think it's good to set a standard for everyone to make sure things are documented properly.

Review threads (all now resolved) were opened on:

  • src/offline/cable_driver_init.F90
  • src/offline/cable_mpi.F90 (4 threads)
  • src/offline/cable_offline_driver.F90
  • src/offline/cable_serial.F90 (3 threads)

Comment on lines 162 to 182:

```cmake
if(CABLE_MPI)
  add_executable(
    cable-mpi
    src/offline/cable_driver_init.F90
    src/offline/cable_mpicommon.F90
    src/offline/cable_mpidrv.F90
    src/offline/cable_mpimaster.F90
    src/offline/cable_mpiworker.F90
    src/science/pop/pop_mpi.F90
    "$<TARGET_OBJECTS:cable_common_objlib>"
    src/offline/cable_offline_driver.F90
  )
  target_compile_definitions(cable-mpi PRIVATE __MPI__)
  target_link_libraries(cable-mpi PRIVATE PkgConfig::NETCDF MPI::MPI_Fortran)
  target_link_libraries(cable-mpi PRIVATE cable_common MPI::MPI_Fortran)
  install(TARGETS cable-mpi RUNTIME)
else()
  add_executable(
    cable
    src/offline/cable_mpimaster_stub.F90
    src/offline/cable_mpiworker_stub.F90
    src/offline/cable_offline_driver.F90
  )
  target_link_libraries(cable PRIVATE cable_common)
  install(TARGETS cable RUNTIME)
endif()
```

@SeanBryan51 (Collaborator) commented on Nov 12, 2024:

As discussed, this will break benchcab in the case where both the cable and cable-mpi executables are needed, since benchcab only invokes CMake once to build both executables. However, benchcab will still work with these changes for serial-only (benchcab fluxsite) and MPI-only (benchcab spatial) tests (FYI @ccarouge).

It would be good to think about the proper CMake workflow for building different variants of the cable executable (e.g. MPI on/off, Debug, Release, compiler). From reading online, it seems the typical approach for doing this is to generate separate build trees for each variant unless we work with a multi-config build system.

@micaeljtoliveira (Contributor, Author) replied:

> it seems the typical approach for doing this is to generate separate build trees for each variant

Yes, I think that's the usual strategy. CABLE is quite fast to compile, so this should not be an issue.

One thing you might want to explore to make this process easier is the use of CMake presets. These are really nice.

@SeanBryan51 (Collaborator) replied:

See #485

- routines to initialise and finalise MPI
- a simple class to handle MPI groups.

All the contents of the module can handle the case where no MPI support is available and/or serial runs (i.e., use of only one rank), thus removing the need to fence some operations with preprocessing directives outside of this module.

To get CMake to properly propagate the link dependencies, I had to convert the common object library into a static library.
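
To make the point about preprocessor fencing concrete, a rough sketch of init/finalise wrappers that keep all #ifdef __MPI__ guards inside the module is shown below (the __MPI__ macro matches the definition in the CMake snippet above; the module and routine names are hypothetical, not necessarily those used in cable_mpi.F90).

```fortran
! Hypothetical wrapper routines: names are illustrative, not necessarily the
! actual cable_mpi.F90 API. All #ifdef __MPI__ guards stay inside this module.
module mpi_lifecycle_sketch_mod
  implicit none
contains

  subroutine mpi_mod_init()
#ifdef __MPI__
    use mpi
    integer :: ierr
    call MPI_Init(ierr)
#endif
    ! In a build without MPI support this is simply a no-op, so the offline
    ! drivers can call it unconditionally.
  end subroutine mpi_mod_init

  subroutine mpi_mod_end()
#ifdef __MPI__
    use mpi
    integer :: ierr
    call MPI_Finalize(ierr)
#endif
  end subroutine mpi_mod_end

end module mpi_lifecycle_sketch_mod
```
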
@micaeljtoliveira force-pushed the introduce_mpi_group branch 2 times, most recently from 1249c60 to e6b19d9 on November 12, 2024 22:04
@micaeljtoliveira (Contributor, Author) commented:

> I've been a bit ruthless in my review on documenting things with Ford, but I think it's good to set a standard for everyone to make sure things are documented properly.

No worries. I think I've now addressed all your comments. Please have a look and let me know if I missed something.

@ccarouge (Member) left a review:

I just have a question about the unified approach. What happens if someone launches the serial executable via mpirun using several processors by mistake? Wasted resources since it will run on 1 proc but "charge" the resources of several?

A review thread (now resolved) was opened on src/coupled/ACCESS-CM2/cable_cbm.F90.
@SeanBryan51 (Collaborator) commented on Nov 15, 2024:

> What happens if someone launches the serial executable via mpirun using several processors by mistake?

Testing the following:

```sh
module add intel-mpi/2019.5.281
mpiexec -n 4 ./cable
```

It looks like it runs 4 independent copies of the serial executable in parallel.

@SeanBryan51 (Collaborator) commented on Nov 15, 2024:

> It looks like it runs 4 independent copies of the serial executable in parallel.

I've tested this with openmpi/4.1.4 and the behaviour is the same. This seems like a good argument to have the MPI executable named differently from the serial executable to prevent anyone from making this mistake.

There is now only one executable being built: either cable-mpi or cable, depending on whether MPI support is available. When MPI support is available, the code now automatically runs the serial driver if only one MPI process is used, instead of stopping with an error as before.
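
The resulting driver-level dispatch is roughly as follows. This is a self-contained sketch with hard-coded rank values, hypothetical routine names, and stubbed drivers; the actual logic is in the offline driver sources touched by this PR (e.g. src/offline/cable_offline_driver.F90).

```fortran
! Self-contained sketch of the dispatch only; names and stubs are hypothetical.
program driver_dispatch_sketch
  implicit none
  ! In the real driver these come from the MPI wrapper; a build without MPI
  ! (or a run with a single MPI process) reports one rank.
  integer :: nranks = 1, myrank = 0

  if (nranks == 1) then
    call serial_driver()     ! single process: fall back to the serial driver
  else if (myrank == 0) then
    call mpidrv_master()     ! rank 0 coordinates the workers
  else
    call mpidrv_worker()     ! remaining ranks do the work
  end if

contains

  subroutine serial_driver()
    print *, 'running the serial driver'
  end subroutine serial_driver

  subroutine mpidrv_master()
    print *, 'running the MPI master driver'
  end subroutine mpidrv_master

  subroutine mpidrv_worker()
    print *, 'running an MPI worker'
  end subroutine mpidrv_worker

end program driver_dispatch_sketch
```
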
@micaeljtoliveira (Contributor, Author) commented:

> I just have a question about the unified approach. What happens if someone launches the serial executable via mpirun using several processors by mistake? Wasted resources since it will run on 1 proc but "charge" the resources of several?

At the moment this does not really change the current behavior, as there are still two executables. But it might be a source of confusion if later on there's only one executable. That said, from personal experience, it's usually quite obvious when one launches a non-MPI exec with MPI, as the output is duplicated for each MPI rank, as @SeanBryan51 just confirmed.

@SeanBryan51 (Collaborator) commented:

I've tested these changes with benchcab fluxsite and benchcab spatial, so they are good to go 🚀:

PBS log file:

```
2024-11-15 15:10:18,283 - INFO - benchcab.benchcab.py:369 - Running comparison tasks...
2024-11-15 15:10:18,307 - INFO - benchcab.benchcab.py:370 - tasks: 168 (models: 2, sites: 42, science configurations: 4)
2024-11-15 15:13:09,010 - INFO - benchcab.benchcab.py:381 - 0 failed, 168 passed
```

Benchcab version: 4.1.0

config.yaml:

```yaml
realisations:
  - repo:
      git:
        branch: main
  - repo:
      git:
        branch: introduce_mpi_group

modules: [
  intel-compiler/2021.1.1,
  netcdf/4.7.4,
  openmpi/4.1.0
]
```

Spatial (MPI) runs all execute as expected (note outputs do not reproduce as per issue #463).

@micaeljtoliveira merged commit 03fb810 into main on Nov 19, 2024
7 checks passed
@micaeljtoliveira deleted the introduce_mpi_group branch on November 19, 2024 23:21
SeanBryan51 added a commit that referenced this pull request Nov 25, 2024
This change further consolidates the initialisation logic within the
offline drivers to remove code duplication and improve code clarity
across each of the drivers (see #425, #440 and #469).

As part of the refactoring changes, new optional namelist options
(`filename%new_sumbal` and `filename%trunk_sumbal`) are introduced to
replace hard coded filenames and optional namelist option `CASAonly` is
restored for consistency with the documentation.

## Type of change

- [x] Code refactor
- [x] New feature
- [x] New or updated documentation

## Checklist

- [x] The new content is accessible and located in the appropriate
section.
- [x] I have checked that links are valid and point to the intended
content.
- [x] I have checked my code/text and corrected any misspellings

----
📚 Documentation preview 📚:
https://cable--472.org.readthedocs.build/en/472/

SeanBryan51 added a commit that referenced this pull request Nov 29, 2024
…#498)

This change further consolidates the initialisation logic within the
offline drivers (in particular logic related to `cable_user%MetType`) to
remove code duplication and improve code clarity across each of the
drivers (see #425, #440, #469 and #472).

As part of the refactoring changes, the `cable_user%MetType = 'gpgs'`
namelist parameter is deprecated as it is equivalent to setting
`cable_user%MetType = 'gswp'` and `leaps = .TRUE.`.

## Type of change

- [x] Code refactor
- [x] New or updated documentation

## Checklist

- [x] The new content is accessible and located in the appropriate
section.
- [x] I have checked that links are valid and point to the intended
content.
- [x] I have checked my code/text and corrected any misspellings


----
📚 Documentation preview 📚:
https://cable--498.org.readthedocs.build/en/498/
