Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UKMO Cray bugfixes #105

Merged
merged 11 commits into from
Oct 30, 2019
Merged

UKMO Cray bugfixes #105

merged 11 commits into from
Oct 30, 2019

Conversation

ukmo-ccbunney
Copy link
Collaborator

@ukmo-ccbunney ukmo-ccbunney commented Sep 26, 2019

Changes required to run full regtest suite on Cray HPC using the crayftn compiler.

Adds the following fearures:

  • New comp/link scripts for matrix generation file. (d85e386)

And addresses the following issues:

…hanged unit number from 100 to 110 to avoid Cray reserved unit number range.
…voids record length overflow for large dimension grids in WRITE statment with non-advancing I/O.
  * Added cray_xc to the cmplr.env and w3_setup scripts.
  * Added cray_xc.CCE (Cray Compiler Envrionment) specific comp and link scripts.
  * Explicitly disable OMP in cray comp scripts for non-OMP compilation (Cray compiler enables by default). Note that is still enabled in link script as OMP library is always required by SCRIP code.
  * Updated matrix_ukmo_cray to run everything on a shared node using mpiexec - compilation is not efficient on compute nodes.
  * Add -eg switch (allows use of GOTO jumps into DO loops for SEC1 switch
  * Addition of GNU compilers on Cray architecture.
@ajhenrique
Copy link
Collaborator

ajhenrique commented Sep 26, 2019

Thanks @ukmo-ccbunney. We are moving away from the platform/compiler-specific comp and link files towards using the cmplr.env and generic comp.tmpl and link.tmpl. Does this PR include changes to those files that would allow usage of the cmplr.env, comp.tmpl and link.tmpl without referring to any other external script/file? This would be a requirement for accepting this PR. In the near future we will eliminate all platform/compiler-specific comps and links from the package.

@ukmo-ccbunney
Copy link
Collaborator Author

@ajhenrique - yes the PR does include an update to cmplr.env to include "cray_xc". Would you prefer it if I removed the comp.cray_xc.CCE and link.cray_xc.CCE scripts?

@ajhenrique
Copy link
Collaborator

@ukmo-ccbunney please merge the auth repo develop into the branch associated with this PR. We can then move to get it merged into the NOAA-EMC/WW3 develop. Thanks!

@ukmo-ccbunney
Copy link
Collaborator Author

Develop branch has been merged in: 56857cd

Do you want me to rerun the regtests matrix?

@aliabdolali
Copy link
Contributor

Hi @ukmo-ccbunney @ajhenrique
I ran the regression tests at our end and came with the following differences:

  • For ww3_tp2.10, the time step has changed from 450 90 90 90 to 450 60 60 60. Was it on purpose? As a result, we have different answers.
  • The outputs for ww3_ts1/work_ST6 are different. Do we expect these differences?

The rests look ok to me.
Comparison attached.
Regards
AA
COMP_UKMO.zip

@ukmo-ccbunney
Copy link
Collaborator Author

ukmo-ccbunney commented Oct 25, 2019

Hi @aliabdolali

The differences in ww3_tp2.10 are expeced as the timestep was too large resulting in the model going unstable (when I ran the test originally, the output was mostly NaN values!)

However, I cannot see reason why the output ww3_ts1 should be different. I have not changed anything to do with the source terms. I notice that the tab*.ww3 files are identical in the work directories, so the differences must be very small. I can't see why only the ww3_ts1 test should be affected. Could it be linked to the fact that the mww3* tests are not bit comparible?

Chris.

@ajhenrique
Copy link
Collaborator

Hi @aliabdolali could you provide details of the differences you encountered in ww3_ts1?

@aliabdolali
Copy link
Contributor

@ukmo-ccbunney @ajhenrique
For ww3_tp2.10, we will go with smaller time steps.
For ww3_ts1, I see the differences in ww3.196806.nc
Please check the end of matrixDiff.out in the zip folder.
Ali

@ajhenrique
Copy link
Collaborator

@ajhenrique - yes the PR does include an update to cmplr.env to include "cray_xc". Would you prefer it if I removed the comp.cray_xc.CCE and link.cray_xc.CCE scripts?

@ukmo-ccbunney that would be great if it doesn't affect your work. The idea will be to soon remove all individual comp and link files, not adding new ones unless they are essential is a good way to start cleaning up! Thanks.

@ukmo-ccbunney
Copy link
Collaborator Author

@aliabdolali, @ajhenrique
Regarding the differences in the ww3_ts1 regtest: I have just run the all the ww3_ts1 tests on the develop branch code and compared them with the results from the bf_ukmo_cray branch and the results are identical.

@ukmo-ccbunney
Copy link
Collaborator Author

@ajhenrique - yes the PR does include an update to cmplr.env to include "cray_xc". Would you prefer it if I removed the comp.cray_xc.CCE and link.cray_xc.CCE scripts?

@ukmo-ccbunney that would be great if it doesn't affect your work. The idea will be to soon remove all individual comp and link files, not adding new ones unless they are essential is a good way to start cleaning up! Thanks.

@ajhenrique OK - I will remove the comp and link files for the Cray XC and add the cray_xc.GNU compiler setup to the cmplr.env file (I've already added the cray_xc.CCE one).

One question regarding the compiler setups - why are the Intel and GNU compiler options set up to hardcode the byte order to "big-endian"? I would prefer to keep the native (little-endian in our case) byte order when compiling with the GNU compiler. Are you happy if I modify the GNU section of cmplr.env slightly to make -fconvert=big-endian a system dependent option (native for cray_xc, big-endian for everything else)?

@ghost
Copy link

ghost commented Oct 28, 2019

Hi @ukmo-ccbunney, from what I remember about the big-endian option, the aim was to avoid mixing the way binary files are created by ww3 on differents file systems to be able to safely exchange binary files. So it would be better to keep it as default except if it slow down your computation time.

@ukmo-ccbunney
Copy link
Collaborator Author

@maccensi I note that profiling (-p switch) is always enabled via the "common options". Is this intentional?
We probably would not want that enabled for operational executables (there must be a perfomance hit to having it on?)

@ghost
Copy link

ghost commented Oct 28, 2019

There is almost no extra overhead using -p switch since it's just camping time functions. At least from I tested on our system and from what I read on the documentation about flat profiling but it could be good to test it again, I will do it

@ukmo-ccbunney
Copy link
Collaborator Author

There is almost no extra overhead using -p switch since it's just camping time functions.
Ok - thanks @mickaelaccensi.

@ajhenrique
Copy link
Collaborator

@ukmo-ccbunney @mickaelaccensi I've read here and there that there may be a performance hit in runs over long periods of time using massive resources. I am currently running a test with our coupled system with and without the profiling flag, and will report back to this thread as well.

@ajhenrique
Copy link
Collaborator

Hi @ukmo-ccbunney, from what I remember about the big-endian option, the aim was to avoid mixing the way binary files are created by ww3 on differents file systems to be able to safely exchange binary files. So it would be better to keep it as default except if it slow down your computation time.

@ukmo-ccbunney, I agree with @mickaelaccensi . Please indicate if this results in overheads for your system.

@ajhenrique
Copy link
Collaborator

@mickaelaccensi @ukmo-ccbunney I've completed a first set of tests running our coupled FV3-WW3 ensemble system. This is currently the operational cube sphere for FV3, and a mosaic of 3 grids in WW3 at 1/4 deg resolution (Arctic, core and Antarctic), with 30 perturbed members + 1 control, which are run out to 16 days. I used a Dell machine with 28 cores per node, the run had 14 ppn and 2 OMP threads, and a total of 38 nodes (532 MPI processes per member). With/without the -p flag the control ran ~61min/57min, and the perturbed members averaged ~69min/66min. There is small but noticeable shaving off of time in our system when -p is removed.

@JessicaMeixner-NOAA
Copy link
Collaborator

@ajhenrique Did you remove the -p just for WW3 or both FV3 and WW3?

@ajhenrique
Copy link
Collaborator

ajhenrique commented Oct 28, 2019

@ajhenrique Did you remove the -p just for WW3 or both FV3 and WW3?

@JessicaMeixner-NOAA, Only WW3.

@JessicaMeixner-NOAA
Copy link
Collaborator

Adding more WW3 pets might further reduce the runtime since it seems that WW3 is potentially what's holding back the total timing. Or if you now hit the FV3 time, that -p could even further reduce your runtime from the amount you already saw. Either way it looks like good news for the GEFSv12 timings!

@ghost
Copy link

ghost commented Oct 28, 2019

that's a great news ! so it would be good to put the '-p-' option in a specific profiling section, with _prof suffix like it's done for debugging option with suffix _debug

@ajhenrique
Copy link
Collaborator

@mickaelaccensi that would be preferable. I can do that in a commit to our GEFS_v12 implementation branch and merge it back to develop.

@ukmo-ccbunney
Copy link
Collaborator Author

Excellent! Sounds like an easy win. :)

@ukmo-ccbunney
Copy link
Collaborator Author

I've just pushed up a commit (c134eff ) that removes the Met Office specific comp/link files and updates the cmplr.env with build configs for our Cray HPC using craytfn and gfortran.

@ajhenrique
Copy link
Collaborator

ajhenrique commented Oct 30, 2019

@aliabdolali what is the final recommendation for this PR? I've added both you and @JessicaMeixner-NOAA as reviewers. I look forward to both reviews.

@aliabdolali
Copy link
Contributor

@ajhenrique The suspicious regression test behaves differently on our machine than UKMO's machine. As you can see, Chris has tested that test with the code before and after development and they were identical. So we either need to move on or we investigate more. The rest of the tests are OK.

@ajhenrique
Copy link
Collaborator

ajhenrique commented Oct 30, 2019

@ukmo-ccbunney @aliabdolali @JessicaMeixner-NOAA I reran the matrix for ww3_ts1 and had the following output in matrixCompSummary.out
test(s) :
ww3_ts1

********************* non-identical cases ****************************

************************ identical cases *****************************
ww3_ts1/./work_ST6
ww3_ts1/./work_ST1_RWND
ww3_ts1/./work_ST4_WRT
ww3_ts1/./work_ST4_GMD
ww3_ts1/./work_ST4_TSA
ww3_ts1/./work_ST4
ww3_ts1/./work_ST1
ww3_ts1/./work_ST2
ww3_ts1/./work_ST3

I'm not sure what happened in the previous matrix run, but I'll consider this a done deal and will be ready to merge asap, given @aliabdolali accepted his review, if @JessicaMeixner-NOAA concurs in her review.

@ajhenrique
Copy link
Collaborator

ajhenrique commented Oct 30, 2019

Thanks everybody in this thread for your comments. @ukmo-ccbunney please close issues related to this PR, and make sure you back up some of the discussions above that may be useful elsewhere. Merging!

@ajhenrique ajhenrique merged commit 39d1fb7 into NOAA-EMC:develop Oct 30, 2019
@ukmo-ccbunney
Copy link
Collaborator Author

Fantastic. Thanks everyone.

ukmo-ansaulter pushed a commit to ukmo-waves/WW3 that referenced this pull request Nov 4, 2019
* Removed incorrectly placed commas.
* Added UNIT_AB variable to hold unit number for LOAD_ALPHABETA call. Changed unit number from 100 to 110 to avoid Cray reserved unit number range.
* Fixed instability in ww3_tp2.10 regtest by decreasing CFL timestep
* Added !/OMPG switches to OMP directives.
* Added RECL specifier to OPEN statement enabled by switch /O2c. This avoids record length overflow for large dimension grids in WRITE statment with non-advancing I/O.
* Changes for compilation and regtesting on UK Met Office Cray HPC:
  * Added cray_xc to the cmplr.env and w3_setup scripts.
  * Added cray_xc.CCE (Cray Compiler Envrionment) specific comp and link scripts.
  * Explicitly disable OMP in cray comp scripts for non-OMP compilation (Cray compiler enables by default). Note that is still enabled in link script as OMP library is always required by SCRIP code.
  * Updated matrix_ukmo_cray to run everything on a shared node using mpiexec - compilation is not efficient on compute nodes.
  * Add -eg switch (allows use of GOTO jumps into DO loops for SEC1 switch
  * Addition of GNU compilers on Cray architecture.
* Fixed typo in wminitmd.ftn as rasied in NOAA-EMC#94
* Removed extra brackets around variable list that were causing some
compilers to complain.
* Removed UKMO comp/link scripts and updated cmplr.env accordingly
@ukmo-ccbunney ukmo-ccbunney deleted the bf_ukmo_cray branch November 27, 2019 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants