Uncertainty
Modeled forecasts of streamflow volume are typically reported as the median, a point estimate of future volume. A point estimate alone, however, says nothing about its own reliability. According to Helsel and Hirsch (2002), interval estimates describe the probability of containing the true population value. Confidence intervals describe the likelihood that the interval contains the true population value, whereas prediction intervals describe the likelihood that a single data point with specified magnitude comes from the population under study.
Confidence intervals are pertinent to estimating the center of the dataset, whereas prediction intervals are relevant to new observations. Prediction intervals are wider than confidence intervals because they incorporate both variability of the single point about the median and the error in estimating the median of the distribution. PyForecast uses prediction intervals to estimate uncertainty about the median of modeled streamflow volumes.
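The difference in width between the two interval types can be illustrated with a minimal sketch for a one-predictor linear regression. The data, the function name `interval_half_widths`, and the 80% level are hypothetical and chosen for illustration only; a normal quantile is assumed for simplicity, and this is not PyForecast's own implementation.

```python
import math
from statistics import NormalDist, mean

def interval_half_widths(x, y, x_new, level=0.80):
    """Return (fit, ci_half_width, pi_half_width) at x_new for simple linear regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    # Assumption: normal quantile; a t quantile would be wider for small n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Uncertainty in the estimated center of the distribution at x_new
    leverage = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci_half = z * s * math.sqrt(leverage)
    # The prediction interval adds the scatter of a single new observation
    pi_half = z * s * math.sqrt(1 + leverage)
    return intercept + slope * x_new, ci_half, pi_half

# Hypothetical predictor (snowpack index) and seasonal volumes
swe = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0, 34.0, 38.0]
volume = [120.0, 150.0, 210.0, 230.0, 300.0, 310.0, 380.0, 400.0]
fit, ci_hw, pi_hw = interval_half_widths(swe, volume, x_new=20.0)
print(f"fit={fit:.1f}  CI half-width={ci_hw:.1f}  PI half-width={pi_hw:.1f}")
```

The only difference between the two half-widths is the extra `1` under the square root, which represents the variability of a single new observation about the center.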
As described in the Model Development section, linear regression assumes that errors are symmetrical, or evenly distributed about the median value. This assumption can be tested by plotting model residuals.
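A numerical check can complement a residual plot. The sketch below computes the sample skewness of a set of residuals; values near zero are consistent with symmetry, while a clearly positive or negative value suggests a skewed error distribution. The residual values and the function name `sample_skewness` are hypothetical.

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical residuals (observed minus modeled volume) with one large positive miss
residuals = [-12.0, -8.0, -5.0, -3.0, 0.0, 2.0, 4.0, 22.0]
skew = sample_skewness(residuals)
print(f"skewness = {skew:.2f}")
```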
The normal distribution has a symmetrical probability density function and is associated with relatively large sample sizes. PyForecast uses a normal probability density function to describe the uncertainty about the median prediction. [citation needed]
Similar to the normal distribution, the Student's t-distribution has a symmetrical probability density function. The Student's t-distribution is used with smaller sample sizes than the normal distribution [citation needed]. Compared to the normal distribution, it has a lower peak and heavier tails.
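The lower peak and heavier tails can be seen by evaluating both density functions directly. The sketch below uses the closed-form Student's t density; the function name `student_t_pdf` and the choice of 5 degrees of freedom are illustrative assumptions.

```python
import math
from statistics import NormalDist

def student_t_pdf(t, dof):
    """Probability density of the Student's t-distribution with dof degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + t * t / dof) ** (-(dof + 1) / 2)

normal = NormalDist()  # standard normal: mean 0, standard deviation 1
for t in (0.0, 3.0):
    print(f"t={t}: normal={normal.pdf(t):.4f}  student_t(dof=5)={student_t_pdf(t, 5):.4f}")
```

As the degrees of freedom grow, the t density approaches the normal density, which is why the distinction matters mainly for small samples.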
Although linear regression assumes that errors are symmetrical about the median, forecasts can at times include values within the prediction interval that are physically impossible (e.g., negative streamflow volumes), particularly for forecasts under dry conditions. This is an indication that the assumption of symmetrical prediction intervals is not valid. NRCS (2011) describes using nonlinear transformations to avoid these conditions and create an asymmetrical prediction interval. To maintain the linear regression assumption of prediction interval symmetry, predictor data can be transformed.
One transformation of particular relevance to water resources data is the lognormal transformation, where the predictand is related to the natural logarithm of the predictor. This transformation eliminates the possibility of negative forecasts within the prediction interval.
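A minimal sketch of how a logarithmic transformation yields a strictly positive, asymmetrical interval follows. It assumes the natural logarithm is applied to the forecast volume itself: the regression is fit in log space and the bounds are exponentiated back to volume units, which is what keeps the lower bound above zero. How PyForecast applies the transformation internally is not shown here; the data, the function name, and the 80% level are hypothetical.

```python
import math
from statistics import NormalDist, mean

def log_space_prediction_interval(x, y, x_new, level=0.80):
    """Fit log(y) on x, then back-transform the interval to the original units."""
    log_y = [math.log(v) for v in y]
    n = len(x)
    x_bar, ly_bar = mean(x), mean(log_y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y)) / sxx
    intercept = ly_bar - slope * x_bar
    residuals = [li - (intercept + slope * xi) for xi, li in zip(x, log_y)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # assumption: normal quantile
    half = z * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    center = intercept + slope * x_new
    # The interval is symmetrical in log space; exponentiating makes it asymmetrical
    return math.exp(center - half), math.exp(center), math.exp(center + half)

# Hypothetical dry-basin data: predictor index and seasonal volumes
index = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
volume = [15.0, 22.0, 40.0, 55.0, 90.0, 130.0, 210.0, 300.0]
lower, median, upper = log_space_prediction_interval(index, volume, x_new=3.0)
print(f"lower={lower:.1f}  median={median:.1f}  upper={upper:.1f}")
```

Because the exponential is positive everywhere, the back-transformed lower bound cannot fall below zero, and the upper side of the interval is always wider than the lower side.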
Anecdotally, PyForecast can produce several forecast models that are virtually identical as measured by the selected metric of forecast skill. These models may nonetheless produce significantly different median forecast estimates. Because there is no reason to prefer one of these models over the others, PyForecast uses kernel density estimation (KDE; e.g., Sheather and Jones, 1991) to combine the forecast probability density estimates into a single probability density curve suitable for estimating forecast exceedance values.
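The idea of combining several model densities into one curve can be sketched as follows. This is an illustration under stated assumptions, not PyForecast's procedure: each model's forecast is assumed to be a normal distribution given as a hypothetical (median, standard deviation) pair, each distribution is represented by evenly spaced quantiles, and a simple normal-reference rule of thumb stands in for the Sheather and Jones bandwidth selector.

```python
from statistics import NormalDist, stdev

def pooled_points(models, per_model=200):
    """Represent each model's normal forecast density by evenly spaced quantiles."""
    probs = [(i + 0.5) / per_model for i in range(per_model)]
    return [NormalDist(mu, sigma).inv_cdf(p) for mu, sigma in models for p in probs]

def rule_of_thumb_bandwidth(points):
    """Normal-reference bandwidth; a stand-in for a Sheather-Jones selector."""
    return 1.06 * stdev(points) * len(points) ** (-1 / 5)

def kde_cdf(x, points, h):
    """CDF of a Gaussian KDE: the average of the kernel CDFs centered on each point."""
    std = NormalDist()
    return sum(std.cdf((x - p) / h) for p in points) / len(points)

def exceedance_value(prob_exceed, points, h):
    """Volume exceeded with probability prob_exceed, by bisection on the KDE CDF."""
    target = 1.0 - prob_exceed
    lo, hi = min(points) - 10 * h, max(points) + 10 * h
    for _ in range(60):
        mid = (lo + hi) / 2
        if kde_cdf(mid, points, h) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical (median, standard deviation) forecasts from three similar-skill models
models = [(410.0, 60.0), (450.0, 55.0), (480.0, 70.0)]
points = pooled_points(models)
h = rule_of_thumb_bandwidth(points)
q90, q50, q10 = (exceedance_value(p, points, h) for p in (0.90, 0.50, 0.10))
print(f"90% exceedance={q90:.0f}  50%={q50:.0f}  10%={q10:.0f}")
```

Each model contributes equally to the combined curve in this sketch; weighting the models differently would require a selection or weighting rule, which is outside its scope.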
It should be noted that, currently, PyForecast does not provide any guidance on model selection for inclusion in the KDE process. Models will typically only slightly vary in terms of predictors used. Additional work is required to determine best practices for selecting models to include in the KDE process.