Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B4b reproducibility #134

Merged
merged 3 commits into from
Dec 17, 2019
Merged

B4b reproducibility #134

merged 3 commits into from
Dec 17, 2019

Conversation

aliabdolali
Copy link
Contributor

Hi @ajhenrique @mickaelaccensi @ukmo-ccbunney
The regression test went well, the comparison with develop branch/ NOAA repo is attached.
comp.zip
Thanks Mickael for the great work you've done.
AA

initialize variable for TYPE SGRD
update matrix_datarmor
remove unexpected NOPA switch
see regtests :
mww3_test03/work_PR2_UQ_MPI_e
mww3_test_03/work_PR3_UQ_MPI_e_c
@ajhenrique
Copy link
Collaborator

Hi @aliabdolali this is great progess, thanks to @mickaelaccensi . @aliabdolali have you run twice the matrix for the PR branch and compared it to itself to check it b4b is achieved with your machine given the current cmplr.env settings?

@ajhenrique ajhenrique removed the request for review from ukmo-ccbunney December 17, 2019 20:35
Copy link
Collaborator

@ajhenrique ajhenrique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was originated from @mickaelaccensi and was verified by @aliabdolali and myself. After talking to Ali, and having his commitment that the matrix output from the subsequent runs made with the B4b branch showing the tests are now b4b, we decided to push this through ahead of the UKMO PR because it potentially solves the b4b issues that were plaguing our regtests. @ukmo-ccbunney please update your develop branch and rerun the retest matrix for you PR #132 . We will get this PR #134 first to allow that additional testing which will be very helpful.

@ajhenrique ajhenrique merged commit d7e746d into NOAA-EMC:develop Dec 17, 2019
@ukmo-ccbunney
Copy link
Collaborator

ukmo-ccbunney commented Dec 19, 2019

I'm afraid to report that I am still getting differences in some of the mww3_test_03 regtests on the develop branch.

I ran the regtests twice on the on the same code (NOAAs develop branch).

Any suggestions?


********************* non-identical cases ****************************


mww3_test_03/./work_PR3_UQ_MPI_d2 (12 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (12 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2 (8 files differ)
mww3_test_03/./work_PR1_MPI_e (1 files differ)
mww3_test_03/./work_PR1_MPI_d2 (10 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2 (9 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (8 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (10 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c (1 files differ)

@ajhenrique
Copy link
Collaborator

ajhenrique commented Dec 19, 2019

@ukmo-ccbunney how does this compare to the previous output from matrix? Did it solve b4b in many cases, or the results are the same? The cases you list have load balancing (communicator fraction for each grid) set separately for each grid within a group. Those cases are not expected to be b4b with @mickaelaccensi fix. The target here is other cases where the communicator fraction is not set. Eg,

grdset d:
'low1' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 1 1 0.00 1.00 F
'low2' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 1 1 0.00 1.00 F
'low3' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 1 1 0.00 1.00 F
'hgh1' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 2 1 0.00 1.00 F
'hgh2' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 2 1 0.00 1.00 F
'hgh3' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 2 1 0.00 1.00 F

grdset d2:
'low1' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 1 1 0.00 0.33 F
'low2' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 1 1 0.33 0.67 F
'low3' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 1 1 0.67 1.00 F
'hgh1' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 2 1 0.00 0.33 F
'hgh2' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 2 1 0.33 0.67 F
'hgh3' 'no' 'no' 'no' 'no' 'no' 'no' 'no' 2 1 0.67 1.00 F

@ukmo-ccbunney
Copy link
Collaborator

@ajhenrique Ah - I hadn't appreciated that it was only fixing those tests not using load balancing.

However, I still have differences with the tests using grdset_e, which appears to have no load balancing set.

I am going to rerun my mww3_test_03 regtests and compare again.

@ukmo-ccbunney
Copy link
Collaborator

@ajhenrique @mickaelaccensi - I am getting similar results to my regtest before the b4b fix.
mww3_test_03 regtests for grdset_d2 and grdset_e are not b4b reproducible.

From the above comments, this would be expeced for grdset_d2, but not grdset_e.

In all of my tests, it is only the restart001.low0 file that is different for grdset_e based tests.

@aliabdolali aliabdolali deleted the B4BRep branch January 24, 2022 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants