-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tripole initial/restart file with inconsistent values on the tripole seam #937
Comments
I'm going to speculate this has to do with a tripole halo update on restart. There are several questions. First, the uvel at initialization looks like it's symmetric across the center gridpoint on the tripole line (as expected), but netcdf and pio have different signs. That's not good. Second, we do run some tests on tx1 in the CICE test suite but none use PIO. I will see if I can run some tx1 tests with PIO in CICE. Finally, @DeniseWorthen, can you email me a sample ice_in file for one of your cases. Just want to check things like the boundary_type settings and so forth, so I can more readily try to duplicate the problem. I assume this bug would produce non-exact restarts, so I'll do that test too with tx1 and PIO. My guess is that this is a problem in PIO with tripole, not netCDF. But will try to confirm that too. |
@apcraig Thanks, I'll email you the ice_in shortly. My next step was going to be to move backwards in the commit history but I would love if you could reproduce. I haven't yet tried to figure out why the 1-deg config would not be impacted. |
Thanks @DeniseWorthen. I'm trying to duplicate this error in CICE now with tx1. Hopefully, I'll find something. The test suite currently does not cover this combination. It is odd that your 1 deg case behaves differently, let see if I can duplicate first. Will keep you posted. |
Quick tests with tx1 in CICE don't show a problem. Trying to understand better what might be happening
The implementation of the netcdf and pio "halo update" on restart is very different. The netcdf reads a global array and scatters it with a halo update part of the scatter. The pio reads directly into the decomposed blocks then does a halo update explicitly on the decomposed data. There could be other differences, I'm still looking. But what I want to understand better is whether the initial conditions you're using are playing a role. That might explain why the one degree case doesn't show the same problem (maybe the one degree restart file is newer/different somehow?). And if both CICE standalone and your cases restart exactly with "new" restart files and are interchangeable between pio and netcdf, then that also suggests something may be funny in the initial condition files. That doesn't mean it's not a bug in CICE, but it would help to understand whether the initial condition files are playing an important role. Depending how these tests go, it might be worthwhile to try to setup your one degree and 5 (or 1/2) degree tests in CICE standalone to see if I can duplicate. I may ask for the grid, kmt, ice_in, and restart files for those configurations from UFS if it's possible to share. |
Hi Tony, I'll take on your suggestions and let you know what I find. I'll be able to share files if that is useful to you. Either derecho or hera works for me (my current tests are on hera). For restarts, yes our current cases using NetCDF restart exactly. I'm not going to say the same about PIO---I believe it does, but I'm can't lay my hands on that case at the moment. |
Are your grids tripole or tripoleT? That is, does the tripole seam / fold go through U points or T points? Also, are you running C-grid? I think the tripole halo extrapolate gets turned off for the C-grid. Did you try restart_ext = .true.? |
@dabail10 The grid is 'tripole' (both grid_type = 'tripole' and ns_boundary_type = 'tripole'). This is not the C-grid. The two cases were run in the same sandbox, with identical inputs and ice_in other than the _format specifiers. The only difference was the executable and whether I ran CICE compiled w/ NetCDF or compiled with PIO. |
This is a crazy thought, but are you changing processors across a restart? Do you have land block elimination? That is distribution_type = 'block'? Also, does restart_ext.= .true. help? |
@dabail10 These are not actually restart runs, they are 'initial'. I'm reading a restart file as an IC. I am not using land block elimination. I'll try the restart_ext=.true. setting along w/ Tony's suggestions. |
OK. Sorry I missed that. Do you ever use land block elimination? That is, do your initial files have missing values? I'm guessing that restart_ext will not help here, since the initial files probably did not have the ghost cells saved. One thing that is different with PIO compared to netCDF is that land blocks are set to a missing value internal to PIO. I've run into this previously where I tried to set land to zeroes, but PIO sets the land blocks to a different fill value. I had added some code in the restart to check for the PIO missing values. However, I ended up deciding to just remove land block elimination. |
Thanks @DeniseWorthen, sorry this is so difficult. Let me know if you want me to help debug the UFS cases, I have access to hera and derecho. The fact that the netcdf iceh and ic results are "the same" is good to see, I think. PIO is different, that's where the problems almost certainly arise. The fact that 1 degree has the same problem but doesn't diverge is odd, but maybe we should set that aside for the moment. I would be interested to see the netcdf, pio, and ic plots for a restart file generated by the model after 5 days (or something) and then restarted. I still wonder if the ic files you're using are playing a role. Points along the tripole are not unique and maybe there is an expectation that the restart files satisfies some specific tripole requirements, but that the "interpolated" ic files don't meet that requirement while restart files written by the model do (whether with pio or netcdf). That would mean there still is a fix needed in either netcdf or pio so they behave the same, but is that driven largely by an ic file that is technically flawed or not? |
Also, just to clarify and add some additional info. The tx1 cases in standalone CICE do not start from an ic file. They start from some internal initial conditions. But if I look at the uvel at j=ny_global in the restart file of a tx1 CICE run, the plot is more like the PIO values (symmetric with different sign across the center). Also the netcdf and pio restarts are the same for standalone CICE and the results are identical and bit-for-bit throughout the run including thru a restart. So, I think a contributing factor is that the j=ny_global values in your IC files are NOT technically correct. Then netcdf and pio "adjust" the field in different ways, maybe one is right or both are wrong, we don't know yet. If the IC files were technically correct, then whatever netcdf and pio are doing to them must ultimately result in the j=ny_global values being unchanged. So, I think there are a few things to think about. Are there requirements on the ic files for tripole grids on the tripole seam? Should the code be checking the ic files? Should the code be "fixing" the ic data on the tripole if it's technically incorrect? If so, what is the correct way to do that? There are many ways it could be done, clearly the netcdf and pio versions of the code are not doing the same thing. Basically, the ic's being used are not valid on the tripole seam. So there is no right answer to how the code should startup (on some level). Looking at your plots of "netcdf" and "pio" at j=ny_global. It looks to me like netcdf is just using the data as it comes in without any correction to the uvel. I guess the code runs and then after a timestep, the data becomes technically correct on the seam, although that first timestep is probably not so great with respect to science on that first timestep on the seam. But it probably doesn't affect climate and the initial condition is just that. It looks to me like the pio version is trying to "reconcile" the data on the two sides of the seam by averaging one side with the opposite sign on the other side, averaging, then assigning that value to both sides of the tripole but with opposite sign. I assume the initial interpolation is introducing asymmetry in the ic along the seam and that is showing up in the "pio" values which shoudl probably be uvel==0 but aren't quite. |
@apcraig Thanks for your help on this. At this point, I'm convinced the issue lies with the IC files we are using. It is puzzling why some resolutions reproduce and others do not, but fundamentally I can see that the signs are incorrect in our current ICs. I carried out a test by generating a CICE restart after 2 hours using netcdf. Using that restart as the IC, I started both the netcdf and pio cases (as 'initial' runs). The netcdf and pio cases were B4B after 6 hours for both 1deg and 1/2 deg. This tells me that even though netcdf and PIO have different methods of reading/scattering the restart file fields, as long as the restart fields are self-consistent w/ CICE, netcdf and PIO are B4B with each other. This is consistent with your standalone tests etc. I'm satisfied at this point, so we can close the issue. |
Thanks @DeniseWorthen. I am going to leave this issue open for now. I'll update the title. I think there is one outstanding question. If an initial or restart CICE file for tripole grid is not technically correct on the tripole seam, what should the code do, if anything? |
NOTE added later:
The issue here is that the initial condition being used in UFS for tripole grids did not have consistent values on the tripole seam. The values on the seam are not unique (that seam is folded on itself with a particular orientation). Initial conditions were generated for UFS by interpolating from one restart file to another and the tripole seam was not dealt with correctly. On startup, reading with netcdf and PIO did not result in bit-for-bit answers which started the issue. With additional debugging, the problems in the initial condition file became clearer without a clear answer about how CICE should deal with this issue. At present, netcdf seems to just use the inconsistent values directly while PIO makes some adjustments. It's not clear either is correct or reasonable. If CICE writes a restart, this situation does not occur.
=================================
UFS's implementation of CICE has always used the NetCDF interface. In testing the new PIO features, I found what appears to be an issue with restart reads through PIO. This issue predates the recent PR.
Our current develop branch sits just after the 6.4 tag. Using that code, I ran two test cases, one w/ PIO and one with NetCDF using our 1/4deg CICE-MOM6-DATM configuration. In the PIO case, I am setting
restart_format = history_format = pio_pnetcdf
. The run starts with usingIf I write out the IC file (
write_ic = .true.
), I see 8 fields different using NetCDF vs PIO (using cprnc). These 8 fields areAll the differences are located on the last jrow. For the field
uvel
, I see (black = netcdf, red=pio)The mediator history files also show differences on the last row at the first coupling timestep, so the differences are being propagated into the model fields. The netcdf/pio cases quickly diverge.
I then tried the same two cases, but using
ice_ic = none
. In that case, the netcdf/pio cases were B4B after 24 hours.The final bit of the puzzle---the corruption of the last jrow happens for our 1/4, 1/2 and 5 deg tripole grid cases. I does not happen with our 1-deg tripole case.
The text was updated successfully, but these errors were encountered: