ww3_multi hangs on when creating restart file with IOSTYP=2 or =3 #290
Comments
This happens only when: To test it with a regtest: add a restart output in mww3_test_04/input/ww3_multi_grdset_d.nml
then run the regtest:
It never ends. When you kill it, here are the lines where it is locked:
Line 542 in w3iorsmd.F90 is the MPI_WAITALL call:
The bug was introduced by commit e756361. @ukmo-ccbunney, @ukmo-juan-castillo, @ukmo-ansaulter, could you correct this bug?
Hi guys, could you please look at this bug? I'm not able to upgrade my forecast system to the latest version of ww3 because of it. Thanks.
Hi @mickaelaccensi
Sorry for the delay, I was quite busy last week. I will start working on this now and give it my full priority. I think I know where the problem is, and it should be easy to fix.
I ran some tests and it looks like this bug was present before the new coupling changes were merged. In any case, as these particular lines of code were on my list of things to look at for the optimization issue, I am trying to fix the problem. I have narrowed it down to the communication handlers, which are somehow being overwritten. This points to an 'out of bounds' error or similar. When I tried to compile in debug mode I got several errors. I reckon that fixing those errors will probably fix the problem.
So I just noticed that the test:
* Fixes NOAA-EMC#290 (ww3_multi hanging when generating restart with IOSTYP >= 2)
* Also fixes out-of-bounds array access error.
* Includes some MPI optimizations
@ukmo-ccbunney found that this bug fix also affects the oasis regtests. After careful examination, I have found a more satisfactory solution that solves the problems in both the 'multi' and the 'oasis' regtests. This bugfix will affect these configurations, and in particular it will change the restart file of multi configurations. The changes will be made in the staging branch and tested there.
* First set of changes intended to fix the bug (#19) Fixes: #314
* Interpolation weights now correctly calculated on points next to land and BC locations.
* Changes to improve the code: the possibility of reading zero values from the input is considered, and points that should not be taken into account in the interpolation are identified by the netcdf fill value; a subroutine is created to avoid code duplication
* Bug fix and small simplification/optimization change (#18)
* Fixes #290 (ww3_multi hanging when generating restart with IOSTYP >= 2)
* Also fixes out-of-bounds array access error.
* Includes some MPI optimizations
* Correction to the bug fix in branch bf_multi_hang to take into account the coupled configurations, that are also affected
* Small correction to the multi_hang branch: revert changes to JSEA index in w3iorsmd
Co-authored-by: Juan Manuel Castillo Sanchez <[email protected]>
Co-authored-by: ukmo-juan.castillo <[email protected]>
The bug appears when using IOSTYP=2 or 3; it works well with IOSTYP=1.
Tracking where it keeps waiting, some processors appear to be stuck in w3wavemd:
CALL MPI_WAITALL due to a positive value of NRQSG2
and the dedicated output processor is stuck in w3iorsmd:
I'll look for a regtest that highlights the bug
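For readers unfamiliar with this kind of hang, here is a minimal sketch of why a wait-all call can block forever. It is a toy model in plain Python, not WW3 code and not real MPI: the function name `pending_after_matching`, the rank layout, and the tags are all hypothetical, and real MPI matching also involves communicators, wildcards, and message ordering. It only illustrates that a request whose partner is never posted stays pending, which would keep a call like MPI_WAITALL from returning.

```python
from collections import Counter


def pending_after_matching(sends, recvs):
    """Return the requests that can never complete in this toy model.

    Each request is a (source_rank, dest_rank, tag) tuple. A send and a
    receive match when all three fields agree. Anything left over has no
    partner, so a wait-all on it would block indefinitely.
    """
    send_count = Counter(sends)
    recv_count = Counter(recvs)
    # Counter subtraction keeps only positive counts, i.e. the surplus.
    unmatched_sends = list((send_count - recv_count).elements())
    unmatched_recvs = list((recv_count - send_count).elements())
    return unmatched_sends, unmatched_recvs


# Hypothetical layout: ranks 0 and 1 compute, rank 2 is the output process.
sends = [(0, 2, 10), (1, 2, 10)]
recvs = [(0, 2, 10), (1, 2, 10), (1, 2, 11)]  # one receive too many

stuck_sends, stuck_recvs = pending_after_matching(sends, recvs)
print("sends never matched:", stuck_sends)
print("receives never matched:", stuck_recvs)
```

In this sketch the output process has posted one receive that no compute rank ever sends, so it is the side left waiting. The same reasoning applies in the other direction when a rank posts more sends than the receiver expects.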