---
title: "Lab 08 - Uncertainty Propagation"
author: "EE375"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Objectives
In this lab we will revisit the Farquhar, von Caemmerer, and Berry (1980) photosynthesis model (the FvCB model):
$$A^{(m)} = V_{cmax} \frac{C_i - \Gamma}{C_i + K_M} - R_d$$
Table 1: FvCB model parameters in the statistical code, their symbols in equations, and definitions
Parameter | Symbol | Definition
----------|------------|-----------
cp | $\Gamma$ | CO2 compensation point
Vcmax | $V_{cmax}$ | maximum Rubisco capacity
r | $R_d$ | leaf respiration
sigma | $\sigma$ | residual error
Ci | $C_i$ | internal CO2 concentration
K | $K_M$ | half-saturation
* In lab 6 we used **Maximum Likelihood** to estimate the parameters of the model.
* In lab 7 we used **Bootstrapping** to estimate the uncertainty in the parameters.
* **In lab 8** we're going to extend this analysis one step further and use the model to make predictions. Using **Monte Carlo** methods, we will propagate parameter uncertainty into confidence and predictive intervals. We will also learn about model sensitivity by examining how the predictions respond to changes in the parameter values.
## Set up
The first thing we want to do is to load up the outputs from our previous two analyses. We don't need the entirety of the past two labs, just the following steps:
```{r}
## Load the data
## Define FvCB and lnLike functions
## Fit FvCB's lnLike to data using optim
## Perform nonparametric bootstrap
```
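If you no longer have those scripts handy, here is a minimal sketch of what those steps might look like. The file name, column names, and starting values are assumptions, so substitute the actual code and object names from your own Labs 6 and 7.
```
## minimal sketch of the Lab 6/7 steps (ASSUMED file name, column names, and starting values)
dat = read.csv("ACi_data.csv")            ## assumed: data frame with columns Ci and A

## FvCB model prediction; theta = c(Vcmax, K, cp, r, sigma)
FvCB = function(theta, Ci){
  theta[1] * (Ci - theta[3]) / (Ci + theta[2]) - theta[4]
}

## negative log likelihood with Gaussian residual error (sd = theta[5]); optim minimizes by default
lnLike = function(theta, Ci, A){
  -sum(dnorm(A, FvCB(theta, Ci), theta[5], log = TRUE))
}

## maximum likelihood fit
theta0 = c(50, 300, 40, 1, 1)             ## assumed starting values: Vcmax, K, cp, r, sigma
fit    = optim(theta0, lnLike, Ci = dat$Ci, A = dat$A)
theta  = fit$par

## nonparametric bootstrap of the parameters
nboot = 5000
theta.boot = matrix(NA, nboot, 5)
for(b in 1:nboot){
  rows = sample.int(nrow(dat), nrow(dat), replace = TRUE)          ## resample rows with replacement
  theta.boot[b,] = optim(theta, lnLike, Ci = dat$Ci[rows], A = dat$A[rows])$par
}
```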
### Task 1: Create `newdata`
Our first goal is to draw a CI and PI around the model prediction curve. Similar to when we use `predict` with an lm or glm, when we make predictions using Monte Carlo methods we need to define the set of new X's we want to make predictions for, so we will need to define `newdata` as a sequence of values along the X-axis.
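For example, a sketch that spans the observed range of Ci (this assumes your data frame is called `dat` and has a `Ci` column; use your own object names):
```
newdata = seq(min(dat$Ci), max(dat$Ci), length.out = 100)   ## grid of Ci values to predict at
```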
# Monte Carlo Uncertainty Propagation
Below is the initial stub of code to perform Monte Carlo uncertainty propagation. Over the next three tasks you will be adding additional code to this code chunk to build out the analysis. A lot of this code is provided for you, but pay careful attention to the places where you need to fill in details (e.g., notes like `<insert sample size here>`).
```{r}
## Monte Carlo uncertainty analysis
Nmc = 5000 #number of MC iterations
Aconf = matrix(NA,Nmc,length(newdata)) # Storage for predicted means for A; row=iteration, col=X's
Apred = matrix(NA,Nmc,length(newdata)) # Storage for predicted A; row=iteration, col=X's
Tpred = matrix(NA,Nmc,5) # Storage for parameters (theta) associated with predictions
for(i in 1:Nmc){
## sample inputs
## calculate (and save) predictions
}
```
## Task 2: Sample inputs
In this case the inputs to your Monte Carlo uncertainty propagation are samples of your parameters. In the previous lab you used bootstrapping to produce samples of the parameters. Indeed, if the number of MC iterations is the same as the number of bootstrap iterations, then you can just set `Tpred = theta.boot`. Here, however, we want to handle the general case, so you'll want to resample rows from `theta.boot`.
```
Tpred[i,] = theta.boot[sample.int(nrow(theta.boot), <insert sample size here> ), ]
```
## Task 3: Calculate predictions
This case is almost identical to what we did with the parametric bootstrap, with the only real differences being:
1. We want to use `newdata` to provide the X's
2. We want to use the **sample** of parameters rather than the MLE. We need to do this so that we incorporate parameter uncertainty in the predictions.
We can start with the code we wrote for the parametric bootstrap prediction as the basis for the code needed in this lab.
```
Aboot = rnorm(n,FvCB(theta,Ciboot),theta[5]) ### sample
```
Next we want to split this into two lines: one where we save the prediction for the mean, the other where we save the predicted pseudodata.
```
Aconf[i,] = FvCB(<insert new parameters> , <insert new data>) ## predict means
Apred[i,] = rnorm(nrow(as.data.frame(newdata)),Aconf[i,],Tpred[i,5]) ## predict pseudodata
```
At this point you should be able to assemble and run your Monte Carlo loop.
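For reference, here is one way the fully assembled loop might look. The filled-in pieces are assumptions to check against your own answers to Tasks 2 and 3, not the definitive solution.
```
## Monte Carlo uncertainty analysis (assembled sketch)
Nmc = 5000                                 # number of MC iterations
Aconf = matrix(NA,Nmc,length(newdata))     # predicted means for A; row=iteration, col=X's
Apred = matrix(NA,Nmc,length(newdata))     # predicted pseudodata for A
Tpred = matrix(NA,Nmc,5)                   # parameters (theta) associated with predictions
for(i in 1:Nmc){
  ## sample inputs: one bootstrap parameter vector per iteration
  Tpred[i,] = theta.boot[sample.int(nrow(theta.boot), 1), ]
  ## calculate (and save) predictions
  Aconf[i,] = FvCB(Tpred[i,], newdata)                        ## predicted means
  Apred[i,] = rnorm(length(newdata), Aconf[i,], Tpred[i,5])   ## predicted pseudodata
}
```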
## Task 4: Summarize CI and PI
With the MC simulations completed, our next step is to use `apply` to summarize the results in terms of the CI and PI. As an example, the following code calculates an 80% confidence interval. We want to do the apply BY COLUMN because each column represents a different X value of the `newdata` sequence that we created. At each point along the sequence we want to calculate both a CI and a PI.
```{r}
CI = apply(Aconf, 2, quantile, c(0.1,0.9))  ## 80% CI: 10th and 90th percentiles of the predicted means, column by column
```
* Calculate a 95% interval for both the CI and PI
* Add these lines to the plot you made in Lab 6 Question 5 (data + best fit line)
* Now your plot should have the data, a best fit line, the CI and the PI
* Make sure to use different colors and/or line types and to include a legend
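One way these steps might come together is sketched below; the quantiles give a 95% interval, and the colors, line types, and legend position are arbitrary choices.
```
## 95% intervals (2.5th and 97.5th percentiles), by column
CI = apply(Aconf, 2, quantile, c(0.025, 0.975))
PI = apply(Apred, 2, quantile, c(0.025, 0.975))

## add to the Lab 6 plot of the data and best-fit line (assumed to be drawn already)
lines(newdata, CI[1,], col="blue", lty=2)
lines(newdata, CI[2,], col="blue", lty=2)
lines(newdata, PI[1,], col="red",  lty=3)
lines(newdata, PI[2,], col="red",  lty=3)
legend("bottomright", legend=c("best fit","95% CI","95% PI"),
       col=c("black","blue","red"), lty=1:3)
```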
In addition to calculating intervals, we can look at the full predictive distribution at any point, as well as the distribution of expected values (i.e., the means).
```{r}
j = findInterval(400,newdata) ## Find the Ci value in newdata closest to 400 ppm
plot(density(Aconf[,j]),xlim=range(Apred[,j]),
main=paste("Prediction at",round(newdata[j]),"ppm"),
xlab="Photosynthetic Rate (A)",lwd=3)
lines(density(Apred[,j]),col="red",lwd=3)
legend("topright",legend=c("mean","prediction"),col=1:2,lwd=3)
```
* Using the provided code, produce density plots near 200 and 600 ppm. Use `abline` to add vertical lines at the 95% CI and PI (a starting sketch follows). Comment on the shape of these distributions and the relative balance of parameter versus residual error at each of the three CO2 concentrations.
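Here is a sketch for the plot near 200 ppm (the 600 ppm plot follows the same pattern); the 95% quantile bounds added with `abline` are an assumption matching the intervals above.
```
j = findInterval(200, newdata)                                   ## Ci value in newdata near 200 ppm
plot(density(Aconf[,j]), xlim=range(Apred[,j]),
     main=paste("Prediction at", round(newdata[j]), "ppm"),
     xlab="Photosynthetic Rate (A)", lwd=3)
lines(density(Apred[,j]), col="red", lwd=3)
abline(v=quantile(Aconf[,j], c(0.025, 0.975)), lty=2)            ## 95% CI bounds
abline(v=quantile(Apred[,j], c(0.025, 0.975)), col="red", lty=2) ## 95% PI bounds
legend("topright", legend=c("mean","prediction"), col=1:2, lwd=3)
```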
# Monte Carlo Sensitivity Analysis
The next thing we're going to do is to use the outputs of our Monte Carlo analysis to look at how sensitive our predictions are to the different parameters in the model. This sensitivity will change for different values of Ci, so we're going to start by looking at the sensitivity at a specific value. First we choose the value of Ci in `newdata` that's closest to 400 ppm.
```{r}
j = findInterval(400,newdata) ## Find the Ci value closest to 400 ppm
sens_j = as.data.frame(matrix(NA,5,3)) ## create a DF to save results
colnames(sens_j) = c("slope","linR2","trendR2")
rownames(sens_j) = c("Vcmax","K","cp","r","sigma")
colnames(Tpred) = rownames(sens_j)
for(i in 1:5){ ## loop over parameters
## calculate linear sensitivities
fit_ij = lm(Aconf[,j] ~ Tpred[,i]) ## fit a line of the jth prediction to the ith parameter
sens_j$slope[i] = coef(fit_ij)[2] ## extract slope
sens_j$linR2[i] = summary(fit_ij)$r.squared ## extract R squared
## calculate R2 around a trendline (note: don't worry about being able to reproduce this bit)
trend = lowess(Tpred[,i],Aconf[,j]) ## trendline
MSE = mean((trend$y - Aconf[order(Tpred[,i]),j])^2) ## mean square error around trendline
sens_j$trendR2[i] = max(1 - MSE/var(Aconf[,j]),0)
## diagnostic plots
plot(Tpred[,i],Aconf[,j],xlab=colnames(Tpred)[i])
lines(trend,col="green")
abline(fit_ij,col="orange")
legend("topleft",legend=c("regression","trendline"),lty=1,lwd=3,col=c("orange","green"))
}
sens_j
```
## Task 5: Interpreting sensitivity
Given what you see in the sensitivity plots and the resulting statistics based on the regression and trend lines calculated for each plot, answer the following questions:
* Which parameters cause photosynthetic rate to increase? Which cause it to decrease?
* Are any of the responses nonlinear? If so, which ones? Describe the pattern.
* Which parameter is photosynthetic rate the most sensitive to in absolute terms? Which the least?
* In terms of partitioning out the variability in the confidence interval, which parameters have the most impact? Which are not worth worrying about?
* How much of the variability in the confidence interval is not explained by the direct effects of each parameter by itself? What accounts for the remaining variability and how might we quantify that effect?
## Task 6: Varying sensitivities (Extra Credit)
If we want to look at how the sensitivities change as Ci changes, we can put our sensitivity calculation in a function
```{r}
sens <- function(j){
sens_j = as.data.frame(matrix(NA,5,3)) ## create a DF to save results
colnames(sens_j) = c("slope","linR2","trendR2")
rownames(sens_j) = c("Vcmax","K","cp","r","sigma")
for(i in 1:5){ ## loop over parameters
## calculate linear sensitivities
fit_ij = lm(Aconf[,j] ~ Tpred[,i]) ## fit a line of the jth prediction to the ith parameter
sens_j$slope[i] = coef(fit_ij)[2]
sens_j$linR2[i] = summary(fit_ij)$r.squared
## calculate R2 around a trendline (note: don't worry about being able to reproduce this bit)
trend = lowess(Tpred[,i],Aconf[,j]) ## trendline
MSE = mean((trend$y - Aconf[order(Tpred[,i]),j])^2) ## mean square error around trendline
sens_j$trendR2[i] = max(1 - MSE/var(Aconf[,j]),0)
}
return(sens_j)
}
```
and then loop over all of our values of Ci
```{r}
sens_array = array(NA,dim=c(length(newdata),5,3))
for(j in 1:length(newdata)){
sens_array[j,,] = as.matrix(sens(j))
}
```
and then plot how sensitivity changes with Ci
```{r}
plot(newdata, sens_array[,1,1],main=colnames(Tpred)[1],
type='l',ylab="slope") ## Vcmax slope
plot(newdata, sens_array[,1,2],main=colnames(Tpred)[1],
type='l',ylab="R2",ylim=range(sens_array[,1,2:3])) ## Vcmax linear R2
lines(newdata,sens_array[,1,3],col="green") ## Vcmax trend R2
```
* Using a loop, plot how the slope and R2 change with Ci for all of the parameters (a starting sketch is provided at the end of this lab). Describe the trends.
Note: the MC results appear to be noisy at low Ci (< ~150 ppm), so don't over-interpret random wiggles in that range. Indeed, you might want to set the xlim to filter those values out.
* The places where models are most sensitive to a parameter are also the places where additional data would provide the most information about the parameter. Based on what you learned from these analyses about which parameters are most important and where they are most sensitive, where would you propose to add new Ci values to the sampling protocol if you wanted to improve the model fit and predictions?
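The starting sketch referenced above simply wraps the single-parameter plotting code in a loop over the five parameters (it assumes `sens_array`, `newdata`, and `Tpred` from the chunks above):
```
par.names = colnames(Tpred)
for(p in 1:5){
  plot(newdata, sens_array[,p,1], main=par.names[p],
       type='l', ylab="slope")                                   ## slope vs Ci
  plot(newdata, sens_array[,p,2], main=par.names[p],
       type='l', ylab="R2", ylim=range(sens_array[,p,2:3]))      ## linear R2 vs Ci
  lines(newdata, sens_array[,p,3], col="green")                  ## trend R2 vs Ci
  legend("topright", legend=c("linear R2","trend R2"),
         lty=1, col=c("black","green"))
}
```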