Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SP mode: model crashed with only 1 PFT #782

Open
XiulinGao opened this issue Sep 28, 2021 · 17 comments · May be fixed by #817
Open

SP mode: model crashed with only 1 PFT #782

XiulinGao opened this issue Sep 28, 2021 · 17 comments · May be fixed by #817

Comments

@XiulinGao
Copy link
Contributor

XiulinGao commented Sep 28, 2021

I was running FATES SP mode with single PFT. By using updated python scripts under tools folder to modify my paramter file, the fates_hlm_pft_map looks like this: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0](reversed in the dimension, so would be 14x1).
This then causes a problem in reading the fates_hlm_pft_map and leads to following error:

The distribution of this host land model PFT : 1
into FATES PFTs, does not add up to 1.0.
Error is: -1.0000000000000000
and the hlm_pft_map is: 0.0000000000000000
Aborting
ENDRUN:ERROR in /home/xiulingao/var-test-runs/CTSM/src/fates/main/EDPftvarcon.F90 at line 1762 $
ERROR: Unknown error submitted to shr_abort_abort.

fates/main/EDPftvarcon.F90

Lines 1752 to 1765 in 70a3e13

! check that the host-fates PFT map adds to one along HLM dimension so that all the HLM area
! goes to a FATES PFT. Each FATES PFT can get < or > 1 of an HLM PFT.
do hlm_pft = 1,size( EDPftvarcon_inst%hlm_pft_map,2)
sumarea = sum(EDPftvarcon_inst%hlm_pft_map(1:npft,hlm_pft))
if(abs(sumarea-1.0_r8).gt.nearzero)then
write(fates_log(),*) 'The distribution of this host land model PFT :',hlm_pft
write(fates_log(),*) 'into FATES PFTs, does not add up to 1.0.'
write(fates_log(),*) 'Error is:',sumarea-1.0_r8
write(fates_log(),*) 'and the hlm_pft_map is:', EDPftvarcon_inst%hlm_pft_map(1:npft,hlm_pft)
write(fates_log(),*) 'Aborting'
call endrun(msg=errMsg(sourcefile, __LINE__))
end if
end do !hlm_pft
end do !ipft

Apprarently, hlm_pft_map is not readed during the run. And then I tried to run SP mode with all 12 PFTs it then did work. So I think there's somehting wrong in either FatesInterfaceMod.F90 or EDPftvarcon.F90 which leads to failure in reading fates_hlm_pft_map when there is only one PFT or the output pft mapping matrix is wrong (so problem with FatesPFTIndexSwapper.py?)

@XiulinGao XiulinGao changed the title SP mode: model crash with only 1 PFT SP mode: model crashed with only 1 PFT Sep 28, 2021
@rosiealice
Copy link
Contributor

OK. Looks like the logic of that check might need altering. I'll try and look into this tomorrow... Thanks!

@XiulinGao
Copy link
Contributor Author

OK. Looks like the logic of that check might need altering. I'll try and look into this tomorrow... Thanks!

Thank you! I''m curious that if the check runs before fates_hlm_pft_map is readed or the reversed order? as during fates params reading process, there's no fates_hlm_pft_map, and after that I saw the pft map check failed. hlm_pft_map also show as 0 in the error message. So I'm wonder the error roots in param register and reading process or in param check process.

@glemieux
Copy link
Contributor

glemieux commented Sep 28, 2021

I think I see the issue: since the loop is iterating through the hlm_pft index one at a time for all fates pfts and checking the sum at each iteration, the sum for a 1x14 array (parameter array dimensionality is flipped during read in), there is only a single zero value to read in the first index, where it fails.

That said, re-reading the code comment:

fates/main/EDPftvarcon.F90

Lines 1752 to 1753 in 70a3e13

! check that the host-fates PFT map adds to one along HLM dimension so that all the HLM area
! goes to a FATES PFT. Each FATES PFT can get < or > 1 of an HLM PFT.

I think the intent of the comment means that the calculation should look like:

sumarea = sum(EDPftvarcon_inst%hlm_pft_map(ipft,:))

since this check is nested in a ipft = 1,npft loop. This way each fates pft should have a total of one across the available hlm pfts.

UPDATE: I forgot that the import flips the dimensionality on purpose, so there isn't a larger issue like I thought at first glance, so ignore the strike throughs.

The bigger issue is that it looks like we have some issues with the consistency between the hlm_pft_map in the parameter file and the way we map it in the code.

In the parameter file we define the dimensionality of the map as hlm pfts for each of the rows, and the fates pfts for each of the columns:

double fates_hlm_pft_map(fates_hlm_pftno, fates_pft) ;
fates_hlm_pft_map:units = "area fraction" ;
fates_hlm_pft_map:long_name = "In fixed biogeog mode, fraction of HLM area associated with each FATES PFT" ;

In EDPftvarcon, this is reversed:

fates/main/EDPftvarcon.F90

Lines 638 to 644 in 70a3e13

! adding the hlm_pft_map variable with two dimensions - FATES PFTno and HLM PFTno
pftmap_dim_names(1) = dimension_name_pft
pftmap_dim_names(2) = dimension_name_hlm_pftno
name = 'fates_hlm_pft_map'
call fates_params%RegisterParameter(name=name, dimension_shape=dimension_shape_2d, &
dimension_names=pftmap_dim_names, lower_bounds=dim_lower_bound)

I checked this with a standard parameter file by adding a diagnostic output that reported size( EDPftvarcon_inst%hlm_pft_map,2) which returned 14, instead of the expected 12 (expected based on the parameter file definition).

Looking further afield, it looks like we are consistent in the index method within the fates code:

do hlm_pft = 1,size( EDPftvarcon_inst%hlm_pft_map,2)
if(bc_in%pft_areafrac(hlm_pft) * EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft).gt.0.0_r8)then
!leaf area index
currentSite%sp_tlai(fates_pft) = currentSite%sp_tlai(fates_pft) + &
bc_in%hlm_sp_tlai(hlm_pft) * bc_in%pft_areafrac(hlm_pft) &
* EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft)
!stem area index
currentSite%sp_tsai(fates_pft) = currentSite%sp_tsai(fates_pft) + &
bc_in%hlm_sp_tsai(hlm_pft) * bc_in%pft_areafrac(hlm_pft) &
* EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft)
! canopy height
currentSite%sp_htop(fates_pft) = currentSite%sp_htop(fates_pft) + &
bc_in%hlm_sp_htop(hlm_pft) * bc_in%pft_areafrac(hlm_pft) &
* EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft)
end if ! there is some area in this patch
end do !hlm_pft

So the 'quick' solution here I think would be to reverse the order of the map dimensions in the parameter file.

@XiulinGao
Copy link
Contributor Author

I think I see the issue: since the loop is iterating through the hlm_pft index one at a time for all fates pfts and checking the sum at each iteration, the sum for a 1x14 array (parameter array dimensionality is flipped during read in), there is only a single zero value to read in the first index, where it fails.

That said, re-reading the code comment:

fates/main/EDPftvarcon.F90

Lines 1752 to 1753 in 70a3e13

! check that the host-fates PFT map adds to one along HLM dimension so that all the HLM area
! goes to a FATES PFT. Each FATES PFT can get < or > 1 of an HLM PFT.

I think the intent of the comment means that the calculation should look like:

sumarea = sum(EDPftvarcon_inst%hlm_pft_map(ipft,:))

since this check is nested in a ipft = 1,npft.

UPDATE: I forgot that the import flips the dimensionality on purpose, so there isn't a larger issue like I thought at first glance, so ignore the strike throughs.

The bigger issue is that it looks like we have some issues with the consistency between the hlm_pft_map in the parameter file and the way we map it in the code.

In the parameter file we define the dimensionality of the map as hlm pfts for each of the rows, and the fates pfts for each of the columns:

double fates_hlm_pft_map(fates_hlm_pftno, fates_pft) ;
fates_hlm_pft_map:units = "area fraction" ;
fates_hlm_pft_map:long_name = "In fixed biogeog mode, fraction of HLM area associated with each FATES PFT" ;

In EDPftvarcon, this is reversed:

fates/main/EDPftvarcon.F90

Lines 638 to 644 in 70a3e13

! adding the hlm_pft_map variable with two dimensions - FATES PFTno and HLM PFTno
pftmap_dim_names(1) = dimension_name_pft
pftmap_dim_names(2) = dimension_name_hlm_pftno
name = 'fates_hlm_pft_map'
call fates_params%RegisterParameter(name=name, dimension_shape=dimension_shape_2d, &
dimension_names=pftmap_dim_names, lower_bounds=dim_lower_bound)

I checked this with a standard parameter file by adding a diagnostic output that reported size( EDPftvarcon_inst%hlm_pft_map,2) which returned 14, instead of the expected 12 (expected based on the parameter file definition).
Looking further afield, it looks like we are consistent in the index method within the fates code:

do hlm_pft = 1,size( EDPftvarcon_inst%hlm_pft_map,2)
if(bc_in%pft_areafrac(hlm_pft) * EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft).gt.0.0_r8)then
!leaf area index
currentSite%sp_tlai(fates_pft) = currentSite%sp_tlai(fates_pft) + &
bc_in%hlm_sp_tlai(hlm_pft) * bc_in%pft_areafrac(hlm_pft) &
* EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft)
!stem area index
currentSite%sp_tsai(fates_pft) = currentSite%sp_tsai(fates_pft) + &
bc_in%hlm_sp_tsai(hlm_pft) * bc_in%pft_areafrac(hlm_pft) &
* EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft)
! canopy height
currentSite%sp_htop(fates_pft) = currentSite%sp_htop(fates_pft) + &
bc_in%hlm_sp_htop(hlm_pft) * bc_in%pft_areafrac(hlm_pft) &
* EDPftvarcon_inst%hlm_pft_map(fates_pft,hlm_pft)
end if ! there is some area in this patch
end do !hlm_pft

So the 'quick' solution here I think would be to reverse the order of the map dimensions in the parameter file.

Thanks Greg! So it is still the param check code need to be modified? I guess that makes it easier.

@glemieux
Copy link
Contributor

Thanks Greg! So it is still the param check code need to be modified? I guess that makes it easier.

Yep, I just want to make sure to hear @rosiealice feedback since the way the check is written I might be misinterpreting the goal.

@huitang-earth
Copy link
Contributor

huitang-earth commented Sep 29, 2021

@rosiealice @glemieux , just let you know that when I ran single PFT simulation (for moss and lichen test), I also have the same problem. So, I just commented out the "endrun" for a quick fix (https://github.com/huitang-earth/fates/blob/367c8fa29b99a35e1f467c4f87ead30f1b96aff7/main/EDPftvarcon.F90#L1717), which works. Current check will only work if you use all the FATES pfts to match all the host land model pfts. But if we use a subset of the FATES pfts, should we allow "sumarea" to be less than 1 (not larger than 1, of course)? Would using a "warning message" rather than a "endrun" be more proper here then?

@XiulinGao
Copy link
Contributor Author

I was actually thinking that the check logic should be doing a match index between HLM and FATES pfts first and only loop through all the matched HLM pfts to check if all summed to 1 instead of looping through all HLM pfts (which I personally think is not efficient).

@rosiealice
Copy link
Contributor

rosiealice commented Sep 29, 2021

So, I was coming slowly to the conclusion that the intention of this check is only appropriate for a certain set of use cases. But then I thought that perhaps the logic is OK. In fact, if we have only one FATES PFT, then we need to allocate all of every HLM PFT to that single FATES PFT to fill up all of the space (and every value in the hlm_pft_map vector should be 1.0).

Otherwise, we'll be running with a mostly empty world? Maybe that's actually what you want to do (i.e. run for only places where this HLM PFT exists on the map) but to do that, we'd have to allow this check to fail, and then allocate all the remaining land as bare ground.

Does that make sense?

@ckoven
Copy link
Contributor

ckoven commented Sep 29, 2021

@rosiealice @XiulinGao is an alternative to put in a switch that says, if running SP or nocomp mode with some subset of PFTs, that we rescale the PFT coverage so that if any of the subsetted PFTs exist on a gridcell, we reweight the PFTs so that the subsetted ones fill up the available space? That way one can use SP mode to compare against site data with known composition that is different from the gridcell that it is in?

@rgknox
Copy link
Contributor

rgknox commented Oct 19, 2021

Note this check should only be applied when we are using biogeographical constraints, this should not be triggering for fully dynamic runs, ie add the condition:

        if((hlm_use_fixed_biogeog.eq.itrue) .or. (hlm_use_sp.eq.itrue)) then
  
          check

        end if

@rosiealice
Copy link
Contributor

I ran into this yesterday after making a new parameter file and running in full-fates mode. hlm_pft_map was all messed up. We should indeed add that check...

@ckoven
Copy link
Contributor

ckoven commented Oct 22, 2021

I just ran into this in a full-fates run too. will add a PR with a fix.

@rosiealice rosiealice linked a pull request Dec 6, 2021 that will close this issue
5 tasks
adamhb added a commit to adamhb/fates that referenced this issue Jan 12, 2022
…sci.1.47.0_api.17.0.0 (commit# f32d831). I this commit ahb commented out the pft param check that was causing the run to crash (see NGEET#782)
@jkshuman
Copy link
Contributor

@ckoven was there a PR merged that fixed this?

@rgknox
Copy link
Contributor

rgknox commented Dec 12, 2022

@glemieux and I are reading through this, and it does not seem like the original issue is addressed by #798. It seems like we are still trying to figure out how to proceed vis-a-vis the points made by @ckoven and @rosiealice on Sept 29th 2021.

@rosiealice
Copy link
Contributor

Looks like #798 was indeed supposed to have fixed this...

@glemieux
Copy link
Contributor

glemieux commented Feb 20, 2024

A variation on this issue came up again in conversation with @jenniferholm when running a FBG only case, but with 3 pfts. If I understand correctly, the current method of setting up the fates_hlm_pft_map is as @rosiealice mentioned above in which we are setting what I'd call 'dummy' mappings for HLM pfts that map to an arbitrary fates pft of interest, while also modifying the PCT_NAT_PFT value on the surface data set to zero out those HLM fractions. I sketched out a spreadsheet for a quick visual.

Is my understading here of the usage (for FBG only at least) correct?

@jenniferholm
Copy link
Contributor

@sshu88 - came across this issue as well in a case where he modified the PCT_NAT_PFT in the surface data file. He did a manual fix to make sure that each row summed to 1, like you show in your "sum check", instead of having the row sum to zero. Anyway, just tagging @sshu3 here since also saw this problem and did a one time fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants