-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run biocro_regional with calibrated configuration file #16
Comments
PEcAn doesn't currently have the capacity to do these analyses (using one So in short, lets not worry about the VM for now, until we want to Does that sound sensible? On Mon, Oct 12, 2015 at 2:48 PM, Hao Hu [email protected] wrote:
|
Still a little bit confused for the part running biocro_regional without VM. What we did before is to configure the illinois_mxg_settings.xml on VM and then do the job submission there. Then we are able to see the output on ROGER as well as config.xml and job.sh under run folder for each job. I know we can directly execute the job.sh to run biocro_regional on ROGER. For the calibrated configuration file, I guess I should choose one of the job folder (e.g. 99000000002), replace config.xml with calibrated one, and then run job.sh. Is that the correct way to directly run biocro_regional on ROGER? |
The key bit is to run the script /gpfs_scratch/haohu3/pecan/models/biocro/inst/regionalbiocro_met_uncertainty.Rscript using the calibrated config.xml I forget how we submitted this the last time - the key bit is that we need On Mon, Oct 12, 2015 at 3:29 PM, Hao Hu [email protected] wrote:
|
That rings a bell. Last time, we added the ability to sample from met data and select random seed in the R code. I remembered we discussed a little bit about paralleling the code with parallel R loop (e.g., foreach) to improve the performance last time, but did not implement it yet. I think I can take it from here. Thanks! |
David, have the calibrated results adjusted by the decreasing factor for each year along time? If not, can you remind me where should I find the those parameters? |
I have not added this to the script. Eventually, I plan to add it to the run.biocro function. But I have not done so yet. At this point, it would be easier (less prone to error and having to redo the runs) to do the revise the output after the runs are finished. But you can see the code used to fix the data in vignettes/regional_miscanthus_calibration.Rmd, the key bit is: ricker <- function(t, phi1 = 1, phi2 = 1, phi3 = 1) {
y <- phi1 * (t^phi2) * exp(-phi3 * t)
return(y)
}
correct.yield <- function(biocro_result){
start <- min(biocro_result$year)
# lesur correction
corrected_lessur <- ricker(0:20, 4.12, 1.12, 0.13)
age_correction <- corrected_lessur/max(corrected_lessur)
# from Miguez et al 2013
# http://onlinelibrary.wiley.com/doi/10.1111/j.1757-1707.2011.01150.x/epdf;
# multiplied by 0.67 to take account of losses in senescence, postsenescence,
# and during harvest (Clifton-Brown et al., 2004; Heaton et al., 2008).
harvest_correction <- 0.67
correction <- age_correction * harvest_correction
correction <- data.table(year = start + 0:(length(correction) - 1), adjust = correction)
a <- merge(biocro_result, correction, by = 'year')
a[, `:=`(corrected_yield = Yield * adjust)]
return(a)
}
## In the vignette, I use lapply on a list of results, but this is how the function would work:
correct_yield(biocro_result) Looking through the code, I notice that if you are using the version of pecan in your directory on scratch, we need to add |
Thanks, David! I've already have the yield correction code applied. One thing is that I noticed the age_correction does not reach the peak value until year 10. Tao suggested that the peak harvest year for Miscanthus should be around year 4. Does it look right? > age_correction
[1] 0.0000000 0.2414986 0.4609035 0.6373440 0.7724084 0.8708191 0.9378916 0.9787567 0.9980848 1.0000000 0.9880754 0.9653652 0.9344510
[14] 0.8974945 0.8562890 0.8123083 0.7667518 0.7205845 0.6745732 0.6293172 0.5852762 |
Hmm we are both correct. I got that function from a review by Lesur et al; here is the figure: I used that function to calibrate the model because I didn't want to use the same data for the adjustment and calibration. I need to think about this and look back at Becky Arundale's data and will get back to you. |
In short, it wouldn't be wrong to use this correction, though we can certainly do a better job. Its mostly a matter of fitting the ricker model to the Illinois data with a random effect of site. If I give you the Arundale data, would you be able to do this? |
If I am not mistaken, I recall the age correction rate you shared with me before is different. Can we use our existing Illinois data to generate an age correction rate? The curve by Lesuer et al seems a very slow increase in the first 5 years and slow decrease after year 10; while the other curve I recalled has a fast increase at first 3 years but decrease fast after year 5. I agree with you that both methods are correct in theory. Shall we find a time to meet to finalize this before moving to climate data set generation and associated group job submissions?
|
I think it is clear that we should use the Arundale data. It would be better to add the ricker model parameters to the calibration, but in lieu of that, we can fit it given the Arundale data. It is mostly a challenge to fit appropriately, and to make sure that we don't 'double dip' with the calibration data. there are two reasons the Lesur may have that trend. first, it could be influenced by a few sites with long time series. Second, it is likely that in Europe the crops and pathogens grow more slowly and do not deplete resources as quickly. |
I can try to do the fitting. I guess the Arundale data will look like the points in the plot and what I need to do is to find the parameters of a fitting curve (similar as in ricker model), is that correct? |
Another thread for model simulation: Hao told me that the results look the same. If that is the case, how shall we incorporate the inter-annual weather variations in the analysis? |
OK. instead of using corrected_arundale <- ricker(0:20, 5.2 2.6 0.46) This is how I got the parameters (directly from the figure below) d <- data.frame(age = 1:10, arundale = c(5, 10, 21.5, 31.2, 34.5, 33.3, 29.5, 25.1, 23, 23.2))
nls(arundale ~ a * (age) * exp(-c * age), data = d, start=list(a = 4, c = 0.5)) Note that this parameterization reaches a max at year 5, (approximately t <- seq(0,10, by = 0.1)
y <- ricker(t, 5.2, 2.6, 0.46)
t[which.max(y)]
2.6 / 0.46 Figure from Arundale 2013 (Miscanthus on right): |
@taolin1 @haohuuiuc the weather history should determine the yield of a given year. It is possible that the randomization code I put into run.biocro is not working, but the best place to start would be to open a new issue with the code used to determine otherwise. |
David, I was able to do the yield correction based on the corrected_arundale value provided in your code. Here is how the result looks like, In this case, I am doing a 15-year biocro model run, from 1996 to 2010 without random sampling. Year 1996 is the plant year, and the peak yield occurs at year 2002. The maximum corrected yield reaches at 18.382 (the unit is Mg/ha i guess?), do you think this result looks correct? |
It looks too low by about 1/3 Can you point me to the raw model output and the code that you used to generate the figures? |
The annual.result.RData is on the following path on ROGER, The R code I have is as follows, require(data.table)
require(sp)
require(raster)
load("annual.result.RData")
# Correct yield
ricker <- function(t, phi1 = 1, phi2 = 1, phi3 = 1) {
y <- phi1 * (t^phi2) * exp(-phi3 * t)
return(y)
}
correct.yield <- function(biocro_result){
start <- min(biocro_result$year)
# lesur correction
#corrected_lessur <- ricker(0:20, 4.12, 1.12, 0.13)
corrected_arundale <- ricker(0:20, 5.2, 2.6, 0.46)
age_correction <- corrected_arundale/max(corrected_arundale)
# from Miguez et al 2013
# http://onlinelibrary.wiley.com/doi/10.1111/j.1757-1707.2011.01150.x/epdf;
# multiplied by 0.67 to take account of losses in senescence, postsenescence,
# and during harvest (Clifton-Brown et al., 2004; Heaton et al., 2008).
harvest_correction <- 0.67
correction <- age_correction * harvest_correction
correction <- data.table(year = start + 0:(length(correction) - 1), adjust = correction)
a <- merge(biocro_result, correction, by = 'year')
a[, `:=`(corrected_yield = Stem * adjust)]
return(a)
}
correct_yield<-correct.yield(annual.result)
# funtion to create a raster
createRaster <- function(df, brks, cols, name) {
r <- raster(ncols=16, nrows=23, xmn=-91.50, xmx=-87.50, ymn=37.00, ymx=42.75)
y <- df$lat
x <- df$lon
xy <- cbind(x,y)
r0 <- rasterize(xy, r, df$corrected_yield, fun=mean)
plot(r0, breaks=brks, col=cols, main=name)
#plot(counties.il, add=T)
}
# Add raster layer of yield
yield.max <- max(correct_yield$corrected_yield)
yield.min <- min(correct_yield$corrected_yield)
step <- (yield.max - yield.min)/8
brks <- seq(yield.min, yield.max, by=step)
brks <- round(brks,1)
nb <- length(brks)-1
cols <- rev(terrain.colors(nb))
par( mfrow = c(3, 5))
for (i in 1996:2010){
createRaster(subset(correct_yield, year == i), brks, cols, i)
} |
I am getting
I've also edited your comment to include the required libraries. |
Sorry that I forgot to comment out this line, plot(counties.il, add=T) It is just for adding the county boundaries to the map. To make this line work, you will need to load a shapefile first, library("rgdal")
counties.il <- readOGR("./shp", "il") # il is the shapefile in the working directory |
In general, this isn't doing what I thought it was doing, and I will have to check.
So I will have to take a look to figure out where the errors are. |
Thanks for exploring this. If you think it is necessary, we can have a meeting to work around this. |
to parallelize around the lat/lon points, make sure to regester enough cores in your qsub statement then do this:
library(foreach)
library(doMC)
registerDoMC(20)
foreach(point=1:nrow(latlon)) %dopar% {
} |
David, can you help me check the following R code on ROGER when you have time? /gpfs_scratch/haohu3/pecan/models/biocro/inst/regionalbiocro_met_uncertainty.Rscript, |
what error occurs? Please see http://adv-r.had.co.nz/Reproducibility.html |
Error in BioGro(WetDat = WetDat, day1 = day1, dayn = dayn, soilControl = soil.parms, : Here is the R code for regionalbiocro_met_uncertainty.Rscript #!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
rundir <- args[1]
outdir <- args[2]
#library(PEcAn.all)
require(PEcAn.data.land)
require(PEcAn.BIOCRO)
require(BioCro)
require(PEcAn.data.atmosphere)
require(PEcAn.utils)
require(lubridate)
options(error = browser)
if(interactive()) {
library(PEcAn.settings)
settings <- read.settings("/gpfs_scratch/haohu3/pecan/models/biocro/inst/IL/illinois_mxg_settings.xml")
runid <- tail(readLines(file.path(settings$rundir, "runs.txt")), n = 1)
rundir <- file.path(settings$rundir, runid)
outdir <- file.path(settings$outdir, "out", runid)
point <- 1
}
config <- read.biocro.config(file.path(rundir, "calibrated_config.xml"))
met.nc <- nc_open(config$run$met.file)
soil.nc <- nc_open(config$run$soil.file)
# atmco2.nc <- nc_open(file.path(inputdir, "co2/CO2_Global_HD_v1.nc"))
lat <- ncvar_get(met.nc, "latitude")
lon <- ncvar_get(met.nc, "longitude")
latlon <- expand.grid(lat = lat, lon = lon)
annual.result <- NULL
biocro_result <- NULL
for(i in 1:1){
for(point in 1:1) {
set.seed(i)
lat <- latlon$lat[point]
lon <- latlon$lon[point]
out <- run.biocro(lat, lon, met.nc = met.nc, soil.nc = soil.nc, config = config, met.uncertainty=TRUE)
#out <- run.biocro(lat, lon, met.nc = met.nc, soil.nc = soil.nc, config = config)
annual.result <- rbind(annual.result, out$annually)
save(annual.result, file = file.path(outdir, "annual.result.RData"))
point.outdir <- file.path(outdir, i, paste0(lat, 'x', lon))
dir.create(point.outdir, showWarnings = FALSE, recursive = TRUE)
hourly <- out$hourly
daily <- out$daily
#save(hourly, file = file.path(point.outdir, 'hourly.result.RData'))
#save(daily, file = file.path(point.outdir, 'daily.result.RData'))
biocro_result <- rbind(biocro_result, data.table(lat = lat, lon = lon, daily))
model2netcdf.BIOCRO(result = hourly, genus = config$pft$type$genus, outdir = point.outdir,
lat = lat, lon = lon)
rm(hourly)
rm(daily)
}
}
save(biocro_result, file = file.path(outdir, "biocro_output.RData")) By setting the met.uncertainty parameter to TRUE, out <- run.biocro(lat, lon, met.nc = met.nc, soil.nc = soil.nc, config = config, met.uncertainty=TRUE) We call run.biocro.R with an additional step to sample years from met data, here is the extra code in run.biocro.R to handle when met.uncertainty is TRUE, if(met.uncertainty == TRUE){
met <- load.cfmet(met.nc, lat = lat, lon = lon, start.date = ceiling_date(as.POSIXct("1979-01-01"), "day"), end.date = floor_date(as.POSIXct("2010-12-31"),"day"))
#met <- load.cfmet(met.nc, lat = lat, lon = lon, start.date = start.date, end.date = end.date)
#year.sample <- met[,list(year = sample(unique(year), size = 15))]
#met <- met[year %in% year.sample$year]
year.sample <- list(year = sample(unique(met$year), size = 15))
met <- met[met$year %in% year.sample$year,]
} else {
met <- load.cfmet(met.nc, lat = lat, lon = lon, start.date = start.date, end.date = end.date)
} |
thanks. I had to fix a bunch of things in the code to get this working. They are in commit PecanProject/pecan@c508b5b. It should work now after you pull changes from github (or copy e.g.
OR
THEN you can run the script This is how I ran it for testing
and the results in but you will probably need to change the name of the |
Thanks David for the quick fix. I got the code working. One thing I am curious about is that the uncertainty run generated 13 records for each scenario instead of 15 (# of samples you set in the sample function) in my test run. The output can be found here, /gpfs_scratch/haohu3/pecan/models/biocro/inst/IL/out/annual.result.RData. |
the easy fix is to change |
I see, it's due to the duplicated year value during the sampling. Thanks! |
And I've pushed (but not extensively tested) a fix that should allow sampling with replacement: PecanProject/pecan@b2ca2c1 Then cleaned it up a little bit for clarity here: PecanProject/pecan@13889d0 |
David, I have generated the results of Champaign for 500 runs. To calculate the yield, should I use, /gpfs_scratch/haohu3/pecan/models/biocro/inst/IL/out Please let me know if you find anything wrong this result. |
Yes, stem+leaf * adjust On Fri, Nov 6, 2015 at 10:34 AM, Hao Hu [email protected] wrote:
|
Thanks, David. I updated the result for Champaign here # Correct yield
ricker <- function(t, phi1 = 1, phi2 = 1, phi3 = 1) {
y <- phi1 * (t^phi2) * exp(-phi3 * t)
return(y)
}
annual.result$yeari<-round(annual.result$yearindex/10000)
correct.yield <- function(biocro_result){
start <- min(biocro_result$yeari)
# lesur correction
corrected_arundale <- ricker(0:20, 5.2, 2.6, 0.46)
age_correction <- corrected_arundale/max(corrected_arundale)
# from Miguez et al 2013
# http://onlinelibrary.wiley.com/doi/10.1111/j.1757-1707.2011.01150.x/epdf;
# multiplied by 0.67 to take account of losses in senescence, postsenescence,
# and during harvest (Clifton-Brown et al., 2004; Heaton et al., 2008).
harvest_correction <- 0.67
correction <- age_correction * harvest_correction
correction <- data.table(yeari = start + 0:(length(correction) - 1), adjust = correction)
a <- merge(biocro_result, correction, by = 'yeari')
a[, `:=`(corrected_yield = (Stem+Leaf) * adjust)]
return(a)
}
correct_yield<-correct.yield(annual.result) The yield now ranges from 0 to 19.08. Do you think this value is still too low? If you think it looks okay, I will start the run for entire Illinois. |
Yes it seems too low. I will review my calibration over the weekend |
@haohuuiuc OK. I have updated the calibration step to calibrate against the mean yields for each site, rather than each year independently, and have selected the parameter set that minimizes the RMSE over the training / calibration data: https://gist.github.com/dlebauer/1cdcda5bdc7ce2818dc8 Give this one a shot. |
Thanks, David. After applying the new config.xml, the maximum corrected yield increases to 30.92 for 500 runs in Champaign. Do you think it now looks reasonable? The raw yield data is generated on ROGER |
Yes that is reasonable. Go for it! After IL we can do the US.
|
Hi David, I guess I only need to replace config.xml under run/ directory and execute job.sh to run biocro regional on ROGER. But these files and folders are generated by RStudio from the VM, if we would like to submit the job through RStudio interface, should we put the calibrated configuration file somewhere in VM?
The text was updated successfully, but these errors were encountered: