
feat: MpiWrapper::allReduce overload for arrays. #3446

Merged: 19 commits into develop on Jan 29, 2025

Conversation

@CusiniM (Collaborator) commented Nov 15, 2024

We had a bug because two arrays of different sizes were exchanged by grabbing their buffers directly. This overload prevents that from happening.

I had to add a small helper because LvArray containers expose ValueType instead of value_type like std containers do.
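For illustration only, here is a minimal sketch of what such a value-type helper could look like; the trait and alias names below are assumptions, not necessarily what TypesHelpers.hpp actually defines:

// Hypothetical sketch, not the actual GEOS code: a trait that extracts the
// element type from either a std-style container (value_type) or an
// LvArray-style container (ValueType).
#include <type_traits>

namespace internal
{
// Primary template: used for containers exposing value_type.
template< typename T, typename = void >
struct GetValueType
{
  using type = typename T::value_type;
};

// Partial specialization: selected when the container exposes ValueType,
// as LvArray containers do.
template< typename T >
struct GetValueType< T, std::void_t< typename T::ValueType > >
{
  using type = typename T::ValueType;
};
} // namespace internal

// Convenience alias usable by container-based MpiWrapper overloads.
template< typename T >
using container_value_type = typename internal::GetValueType< T >::type;

With a helper along these lines, a container-based allReduce can deduce the element type (and hence the MPI datatype) uniformly, whichever naming convention the container follows.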

@CusiniM self-assigned this on Nov 15, 2024
@CusiniM added the label flag: no rebaseline (does not require rebaseline) on Nov 15, 2024
@CusiniM (Collaborator, Author) commented Dec 5, 2024

@rrsettgast , @corbett5 , @wrtobin this is ready for review.

Review thread on src/coreComponents/common/TypesHelpers.hpp (outdated, resolved)
@MelReyCG (Contributor) left a comment

At this point, can we drop, or at least make private, the following signature?

template< typename T >
  static int allReduce( T const * sendbuf, T * recvbuf, int count, MPI_Op op, MPI_Comm comm = MPI_COMM_GEOS );

It looks like all external calls could be translated to Span<T[]>.
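As a sketch of what that could look like (this is not the code in the PR; the assertion macro, names, and exact layout are placeholders), a container-based overload can forward to the pointer-based one, which would then become a private implementation detail:

// Illustrative sketch only; a member of MpiWrapper in spirit, with placeholder
// checks. The pointer-based allReduce it calls would become private.
template< typename CONTAINER_TYPE >
static int allReduce( CONTAINER_TYPE const & src,
                      CONTAINER_TYPE & dst,
                      MPI_Op const op,
                      MPI_Comm const comm = MPI_COMM_GEOS )
{
  // Guard against the size-mismatch bug that motivated this PR
  // (placeholder check; GEOS's actual error macros may differ).
  GEOS_ERROR_IF_NE( src.size(), dst.size() );
  using T = container_value_type< CONTAINER_TYPE >;  // element type via a helper as sketched above
  return allReduce< T >( src.data(),
                         dst.data(),
                         static_cast< int >( src.size() ),
                         op, comm );
}

A call such as MpiWrapper::allReduce( localVec, globalVec, MPI_SUM ) would then carry the size information with the containers, so a caller can no longer pass an inconsistent count.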

@CusiniM (Collaborator, Author) commented Dec 7, 2024

At this point, can we drop, or at least make private, the following signature?

template< typename T >
  static int allReduce( T const * sendbuf, T * recvbuf, int count, MPI_Op op, MPI_Comm comm = MPI_COMM_GEOS );

It looks like all external calls could be translated to Span<T[]>.

yes, you are right, let me try to do this.

@CusiniM (Collaborator, Author) commented Jan 2, 2025

At this point, can we drop, or at least make private, the following signature?

template< typename T >
  static int allReduce( T const * sendbuf, T * recvbuf, int count, MPI_Op op, MPI_Comm comm = MPI_COMM_GEOS );

It looks like all external calls could be translated to Span<T[]>.

done.

@CusiniM added the labels ci: run CUDA builds (allows triggering the costly CUDA jobs) and ci: run integrated tests (allows running the integrated tests in GEOS CI) on Jan 3, 2025
@CusiniM (Collaborator, Author) commented Jan 3, 2025

@sframba or @acitrain can you please have a quick look at the changes to the wave solver files and approve this?

Comment on lines 1016 to 1018:

MpiWrapper::allReduce( dasReceivers,
                       dasReceivers,
                       MpiWrapper::Reduction::Sum,

@CusiniM (Collaborator, Author) commented

@sframba @acitrain this change triggers these diffs:

INFO: Total number of log files processed: 1537

WARNING: Found unfiltered diff in: /Users/cusini1/Downloads/integratedTests/TestResults/test_data/wavePropagation/elas3D_DAS_smoke_08/1497.python3_elas3D_DAS_smoke_08_2_restartcheck_.log
INFO: Details of diffs:   ********************************************************************************

  Error: /Problem/Solvers/elasticSolver/dasSignalNp1AtReceivers

  	Arrays of types float32 and float32 have 402 values of which 200 fail both the relative and absolute tests.

  		Max absolute difference is at index (np.int64(200), np.int64(1)): value = 0.2, base_value = 0.0

  		Max relative difference is at index (np.int64(200), np.int64(1)): value = 0.2, base_value = 0.0

  	Statistics of the q values greater than 1.0 defined by absolute tolerance: N = 200

  		max = 2000.0001, mean = 500.0, std = 646.787

  ********************************************************************************

  [The same diff block for /Problem/Solvers/elasticSolver/dasSignalNp1AtReceivers is repeated six more times in the pasted output.]

The only explanation I can find is that, in the previous code, m_linearDASGeometry.size( 0 ) was not the size of the array. The fact that arrayView2d< real32 > const dasReceivers = m_dasSignalNp1AtReceivers.toView(); makes me wonder why m_linearDASGeometry.size( 0 ) was passed as the size. I am not sure how these two objects are related...

@sframba (Contributor) commented Jan 13, 2025

Hi @CusiniM, I think that is exactly the reason: we have an extra receiver that simply tracks time:

m_dasSignalNp1AtReceivers.resize( m_nsamplesSeismoTrace, numReceiversGlobal + 1 );

This receiver does not need to be summed across ranks. Therefore only the first dasReceivers.size() - 1 (that is, m_linearDASGeometry.size( 0 )) arrays need to be summed, not the last one. Does this break your new structure, or can it be accommodated?

@CusiniM (Collaborator, Author) commented

Yeah, it does break it, in the sense that the overload I put in place for arrays was written with the idea of summing the full array across ranks. Basically, I wanted to ensure that one cannot provide the wrong size (which caused a couple of bugs in the past). We could add an overload that allows the reduce operation to occur on a specified size <= array.size().

@rrsettgast (Member) commented

@sframba Is there a reason that this all has to be in one array? It seems that the array holds different quantity types. If so, can we just create a separate object to hold the last value, which is of a different quantity type?

@sframba (Contributor) commented Jan 20, 2025

@rrsettgast as discussed together, no, there is no reason why this should all be one array. We could (and should) separate out the extra receiver so that the arrays become of homogeneous dimension. However, this requires some changes in our Python code as well. Could we maybe:

  • Implement the overload suggested by @CusiniM for now, so we unlock this PR, then
  • When the "strong unit typing" work starts officially, add an item about wave solvers' receiver uniformity and assign it to us, so we can take the time to tidy up this problem properly?

What do you guys think?

@CusiniM (Collaborator, Author) commented

I have already implemented it here:

void MpiWrapper::allReduce( SRC_CONTAINER_TYPE const & src, DST_CONTAINER_TYPE & dst, int const count, Reduction const op, MPI_Comm const comm )
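For illustration, a rough usage sketch of that overload for the wave-solver case discussed above (all variable names here are made up, not the actual call site): only the first count entries take part in the sum, and the trailing time-tracking entry is left untouched on each rank.

// Hedged sketch with placeholder names; the real wave-solver code differs.
int const numReceivers = 200;                            // placeholder count of physical receivers
std::vector< real32 > localSignal( numReceivers + 1 );   // last entry only tracks time
std::vector< real32 > globalSignal( numReceivers + 1 );

MpiWrapper::allReduce( localSignal,                  // src container
                       globalSignal,                 // dst container
                       numReceivers,                 // count <= localSignal.size()
                       MpiWrapper::Reduction::Sum,
                       MPI_COMM_GEOS );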

@CusiniM (Collaborator, Author) commented

Tests are now passing, so this just needs @sframba's approval.

@CusiniM added the label ci: run code coverage (enables running the code coverage CI jobs) on Jan 13, 2025

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 58.97436% with 16 lines in your changes missing coverage. Please review.

Project coverage is 56.74%. Comparing base (5bfdb01) to head (8f6aefe).
Report is 1 commit behind head on develop.

Files with missing lines Patch % Lines
...olvers/solidMechanics/SolidMechanicsStatistics.cpp 0.00% 4 Missing ⚠️
src/coreComponents/common/MpiWrapper.hpp 84.61% 2 Missing ⚠️
...onents/physicsSolvers/PhysicsSolverBaseKernels.hpp 33.33% 2 Missing ⚠️
...hysicsSolvers/solidMechanics/SolidMechanicsMPM.cpp 0.00% 2 Missing ⚠️
src/coreComponents/mesh/ParticleManager.cpp 0.00% 1 Missing ⚠️
...nents/physicsSolvers/contact/ContactSolverBase.cpp 0.00% 1 Missing ⚠️
...olvers/contact/SolidMechanicsEmbeddedFractures.cpp 0.00% 1 Missing ⚠️
...hysicsSolvers/multiphysics/HydrofractureSolver.cpp 0.00% 1 Missing ⚠️
...ers/solidMechanics/SolidMechanicsLagrangianFEM.cpp 0.00% 1 Missing ⚠️
...sicsSolvers/surfaceGeneration/SurfaceGenerator.cpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3446      +/-   ##
===========================================
- Coverage    56.74%   56.74%   -0.01%     
===========================================
  Files         1169     1169              
  Lines       101538   101551      +13     
===========================================
+ Hits         57615    57622       +7     
- Misses       43923    43929       +6     


@CusiniM (Collaborator, Author) commented Jan 15, 2025

@sframba please have a look now and approve.

@rrsettgast merged commit ac45ee2 into develop on Jan 29, 2025; 27 checks passed. @rrsettgast then deleted the feature/cusini/addMpiWrapperForArrays branch on January 29, 2025 at 06:10.