Non-steady state (static) chambers are widely used for measuring soil
greenhouse gas (GHG) fluxes, such as CO2, CH4,
N2O, NH3, CO, and water vapor (H2O).
While linear regression (LM) is commonly used to estimate GHG fluxes,
this method tends to underestimate the pre-deployment flux
(f0). Indeed, non-linearity is expected when the gas
concentration increases or decreases inside a closed chamber, due to
changes in gas gradients between the soil and the air inside the
chamber. In addition, lateral gas flow and leakage contribute to
non-linearity. Many alternative to LM have been developed to try to
provide a more accurate estimation of f0, for instance the
method of Hutchinson and Mosier (HM)
(1981), which
was implemented in the HMR
package by Pedersen et al.,
2010. However,
non-linear models have a tendency to largely overestimate some flux
measurements, due to an exaggerated curvature. Therefore, the user is
expected to decide which method is more appropriate for each flux
estimate. With the advent of portable greenhouse gas analyzers
(e.g. Los Gatos Research (ABB) laser gas
analyzers),
soil GHG fluxes have become much easier to measure, and more efficient
ways to calculate these flux estimates are needed in order to process
large amounts of data. A recent approach was developed by Hüppi et al.,
2018 to restrict the
curvature in the HM model for a more reliable flux estimate. In the HM
model, the curvature is controlled by the kappa parameter, KAPPA.MAX
to limit the maximal
curvature allowed in the model (see their R package
gasfluxes
, available
on the CRAN). This procedure introduces less arbitrary decisions in the
flux estimation process.
Previous software developed to calculate GHG fluxes are limited in many
aspects that the goFlux package is meant to overcome. Most are limited
to the linear regression approach (e.g. Flux
Puppy,
and the R packages flux
and
FluxCalR
), others do not
include data pre-processing (e.g. the R packages
HMR
,
flux
and
gasfluxes
), or if they
do, they are compatible with only a few designated systems (e.g. Flux
Puppy
and FluxCalR
), and none
include an automatic selection of the best flux estimate (LM or HM)
based on objective criteria, except the R packages
gasfluxes
and
HMR
.
This new R package goFlux
is meant to be “student proof”, meaning that
no extensive knowledge or experience is needed to import raw data into
R, choose the best model to calculate fluxes (LM or HM), quality check
the results objectively and obtain high quality flux estimates. The
package contains a wide range of functions that allows the user to
import raw data from a variety of instruments (LI-COR, LGR, GAIA2TECH,
Gasmet, Picarro, Aeris and PP-Systems); calculate fluxes from a variety
of GHG (CO2, CH4, N2O, NH3,
CO and H2O) with both linear (LM) and non-linear (HM) flux
calculation methods; align instruments clocks after data import;
interactively identify measurements (start and end time) if there are no
automatic chamber recordings (e.g. LI-COR smart chamber); plot
measurements for easy visual inspection; and quality check the
measurements based on objective criteria.
Three R packages for the Elven-kings under the CRAN,
Seven for the Dwarf-lords in their halls of open software,
Nine for Mortal Men doomed to write scripts themselves,
One for the Dark Lady on her dark throne
In the Land of GitHub where the Shadows lie.
One Package to rule them all, One Package to find them,
One Package to bring them all and in the darkness bind them
In the Land of GitHub where the Shadows lie.
The R package goFlux
is meant to be “student proof”, meaning that no
extensive knowledge or experience is needed to import raw data into R
(except for knowing how to use R, of course), choose the best model to
calculate fluxes (LM or HM, that is the question. -Shakespeare, 1603),
quality check the results objectively (hence the no experience needed)
and obtain high quality flux estimates from static chamber measurements
(wonderful!).
Look up this webpage for a demonstration of the package usage.
The package contains a wide range of functions that let the user import raw data from a variety of instruments:
- LI-COR trace gas analyzers: LI-6400 for CO2 and H2O, LI-7810 for CO2, CH4 and H2O, LI-7820 for N2O and H2O, and LI-8100 for CO2 and H2O
- LI-COR Smart Chamber: for an easy import of data from any gas analyzer
- Los Gatos Research (ABB) laser gas analyzers: Ultra-portable Greenhouse Gas Analyzer (GLA132-UGGA) and Microportable Greenhouse Gas Analyzer (GLA131-MGGA) for CO2, CH4 and H2O, and GLA151-N2OM1 for N2O, CH4 and H2O
- Picarro G2508 Gas Concentration Analyzer: for CO2, CH4, N2O, NH3 and H2O
- Picarro GasScouterTM G4301 Mobile Gas Concentration Analyzer: for CO2, CH4 and H2O
- GAIATECH Automated ECOFlux chamber: for an easy import of data from any gas analyzer
- Gasmet DX4015 portable analyzer for humid conditions: for CO, CO2, CH4, N2O, NH3 and H2O
- PP-Systems EGM-5 Portable CO2 Gas Analyzer: for CO2, O2 and H2O
- Aeris technologies MIRA Ultra Gas Analyzers: N2O/CO2 high accuracy analyzer (+H2O) and CH4/C2H6 Natural Gas Leak Detection System (+H2O)
- Custom made multiplexer from the University of Padova, Italy, which integrates data from multiple automatic chambers linked to a Gasmet DX4015.
After import, the user can choose from two methods to define the start and end points of each measurement and assign a UniqueID:
- Manual identification of measurements - based on
start.time
, provided separately in an auxiliary file. The functionobs.win
splits the imported data into a list of data frame (divided byUniqueID
) and creates an observation window around thestart.time
to allow for a manual selection of the start and end points of each measurements, using the functionclick.peak2
. - Automatic selection of measurements - based on automatic recordings of chamber opening and closing from an instrument such as the LI-COR Smart Chamber or the GAIATECH Automated ECOFlux chamber.
Follow these links for more details and demonstrations about importing raw data from the listed instruments, or the identification of measurements.
The function goFlux
calculates fluxes from a variety of greenhouse
gases (CO2, CH4, N2O, NH3,
CO, and H2O) using both linear (LM) and non-linear (HM;
Hutchinson and Mosier,
1981) flux
calculation methods. The HM model for the chamber concentration
Where
A maximum threshold for this parameter, kappa-max (
Where goFlux
function, so that the non-linear flux estimate
cannot exceed this maximum curvature. See below for more details about
the minimal detectable flux (MDF).
All flux estimates, including the MDF, are multiplied by a
Where
More details and demonstrations about the function goFlux
can be found
here.
Following flux calculation, the user can select the best flux estimate
(LM or HM) based on objective criteria, using the best.flux
function:
- Assumed non-linearity: If all criteria are met, the best flux estimate is assumed to be the non-linear estimate from the Hutchinson and Mosier model.
- G-factor: the ratio between a non-linear flux estimate (e.g. Hutchinson and Mosier; HM) and a linear flux estimate.
-
Kappa ratio: maximal limit of the ratio between kappa and
kappa-max (
$k.max$ ). - Indices of model fit: the best model can be selected based on a selection of indices of model fit: MAE, RMSE, SE and AICc. In addition to the automatic selection of the best flux estimate based on these indices of model fit, measurements can be flagged “noisy” using these criteria.
In addition, the best.flux
function can flag measurements that are
below detection limit (MDF and p-value), out of bounds (intercept), or
too short (number of observations).
- Minimal Detectable Flux (MDF): limit under which a flux estimate is considered below the detection limit.
- Statistically significant flux (p-value): limit under which a flux estimate is considered statistically non-significant, and below the detection limit.
-
Intercept: inferior and superior limits of the intercept (initial
concentration;
$C_0$ ). - Number of observations: limit under which a measurement is flagged for being too short.
By default, all criteria are included:
criteria = c("MAE", "RMSE", "AICc", "SE", "g-factor", "kappa", "MDF", "nb.obs", "p-value", "intercept")
A demonstration of the usage of the function best.flux
can be found
here.
The g-factor is the ratio between a non-linear flux estimate (e.g. Hutchinson and Mosier; HM) and a linear flux estimate (Hüppi et al., 2018).
With the best.flux
function, one can choose a limit at which the HM
model is deemed to overestimate (f0). Recommended
thresholds for the g-factor are <4 for a flexible threshold, <2 for a
medium threshold, or <1.25 for a more conservative threshold. The
default threshold is g.limit = 2
. If the g-factor is above the
specified threshold, the best flux estimate will switch to LM instead of
HM. If HM.flux/LM.flux
is larger than g.limit
, a warning is given in
the columns HM.diagnose
and quality.check
.
The minimal detectable flux (
Where the instrument precision is in the same units as the measured gas (ppm or ppb) and the measurement time is in seconds.
Below the MDF, the flux estimate is considered under the detection
limit, but not null. Therefore, the function will not return a 0. There
will simply be a warning in the columns quality.check
, LM.diagnose
or HM.diagnose
to warn of a flux estimate under the detectable limit.
No best flux estimate is chosen based on MDF.
The parameter kappa determines the curvature of the non-linear
regression in the Hutchinson and Mosier model, as shown in equation 1.
The limit of kappa-max, as calculated in equation 2, is included in the
goFlux
function, so that the non-linear flux estimate cannot exceed
this maximum curvature.
In the function best.flux
, one can choose the linear flux estimate
over the non-linear flux estimate based on the ratio between kappa and
kappa-max (k.ratio
). A value of 1 indicates HM.k = k.max
, and a
value of e.g. 0.5 indicates HM.k = 0.5*k.max
. The default setting is
k.ratio = 1
. If HM.k/k.max
is larger than k.ratio
, a warning is
issued in the columns HM.diagnose
and quality.check
. The ratio is
expressed as a percentage of kappa-max in the plots.
In the best.flux
function, we included multiple choices of indices of
model fit, described below. One can choose to include however many of
them in the function. If multiple of them are included, the selection of
the best model will be made based on a scoring system. Both models start
with a score of 0. For each criteria, whichever model performs worst is
given +1. After all selected criteria have been evaluated, the model
with the lowest score wins. In case of a tie, the non-linear flux
estimate is always chosen by default, as non-linearity is assumed. The
score is printed in the output data frame in the columns HM.score
and
LM.score
.
The mean absolute error (MAE) is the arithmetic mean of the absolute residuals of a model, calculated as follows:
Where
The root mean square error (RMSE) is very similar to the MAE. Instead of using absolute errors, it uses squared errors, and the mean of the squared errors is then rooted as follows:
Because of the squared errors, RMSE is sensitive to outliers. Indeed, a few large errors will have a significant impact on the RMSE. Therefore, RMSE will always be larger than or equal to MAE (Pontius et al., 2008).
Mathematically, RMSE is equivalent to the standard deviation of the residuals. Indeed, for a constant gas concentration, the standard deviation is expressed as follows:
Where
Considering all of the above, MAE, RMSE and the standard deviation are all measures of how much the data points are scattered around the true mean or the regression. Therefore, MAE and RMSE can be compared to the instrument precision to determine if a measurement is noisy. For a MAE or RMSE larger than the instrument precision, the measurement is considered to have more noise than normally expected.
If MAE is chosen as a criteria in best.flux
, the model with the
smallest MAE is chosen. If both models have a MAE smaller than the
instrument precision, the non-linear flux estimate is always chosen by
default, as non-linearity is assumed. When MAE is larger than the
instrument precision, a warning is given in the columns quality.check
,
LM.diagnose
or HM.diagnose
to warn of a noisy measurement. RMSE
functions exactly the same was as MAE in the best.flux
function.
While the standard deviation describes how the data points are scattered around the true mean, the standard error of a measurement tells how accurate that measurement is compared to the true population mean (Altman and Bland, 2005). If considering the standard deviation as used to calculate instrument precision (equation 8), the instrument standard error (instrument accuracy) can be defined as the standard deviation divided by the square root of the number of observations:
Practically, this means that the accuracy increases with the sample size of a measurement. In other words, if an instrument is imprecise and the measurement has a lot of noise, it is still possible to get a more accurate estimate of the true mean by increasing the number of observations. With high-frequency GHG analyzers, that means increasing the chamber closure time.
To calculate the standard error of a regression, one can use the delta
method (deltamethod
from the msm
package), which propagates the
total error of the flux calculation for each parameter included in the
formula. The delta method approximates the standard error of a
regression
In the function best.flux
, the standard error (SE) of the measurements
can be compared to the standard error of the instrument
(best.flux
, the
model with the smallest SE is chosen. If both models have a SE smaller
than the instrument precision, the non-linear flux estimate is always
chosen by default, as non-linearity is assumed. When SE is larger than
the instrument accuracy (quality.check
, LM.diagnose
or HM.diagnose
to warn of a
noisy measurement.
The AIC estimates the relative quality of a statistical model and is used to compare the fitting of different models to a set of data (Akaike, 1974). Consider the formula for AIC:
Where
In flux calculation, the linear model contains two parameters: the slope
and the intercept (
Where
If AICc is selected as a criteria in the best.flux
function, the model
with the lowest AICc wins.
If the initial gas concentration (C0) calculated for the
flux estimates (HM.C0
and LM.C0
) deviates from the true
C0 (measured concentration at quality.check
, LM.diagnose
or HM.diagnose
to warn of an intercept out of bounds. Alternatively, one can provide
boundaries for the intercept, for example: intercept.lim = c(380, 420)
for a true C0 of 400 ppm.
This criteria is only applicable to the linear flux. Under a defined
p-value, the linear flux estimate is deemed non-significant, i. e.,
flux under the detectable limit. The default threshold is
p.val = 0.05
. No best flux estimate is chosen based on p-value. If
LM.p.val < p.val
, a warning is given in the columns quality.check
and LM.diagnose
to warn of an estimated flux under the detection
limit.
Limit under which a measurement is flagged for being too short
(nb.obs < warn.length
). With nowadays’ portable greenhouse gas
analyzers, the frequency of measurement is usually one observation per
second. Therefore, for the default setting of warn.length = 60
, the
chamber closure time should be approximately one minute (60 seconds). If
the number of observations is smaller than the threshold, a warning is
issued in the column quality.check
.
Finally, after finding the best flux estimates, one can plot the results
and visually inspect the measurements using the function flux.plot
and
save the plots as pdf using flux2pdf
.
Find out more about these functions here.
To install a package from GitHub, one can use the package devtools
(or
remotes
) from the CRAN:
if (!require("devtools", quietly = TRUE)) install.packages("devtools")
Then, install the goFlux
package from GitHub. If it is not the first
time you install the package, it must first be detached before being
updated.
The package is constantly being updated with new functions or de-bugging. To make sure that you are using the latest version, re-install the package every time you use it.
try(detach("package:goFlux", unload = TRUE), silent = TRUE)
devtools::install_github("Qepanna/goFlux")
If prompted, it is recommended to update any pre-installed packages.
The functioning of the package depends on many other packages
(AICcmodavg
, data.table
, dplyr
, ggnewscale
, ggplot2
, ggstar
,
graphics
, grDevices
, grid
, gridExtra
, lubridate
, minpack.lm
,
msm
, pbapply
, plyr
, purrr
, rjson
, rlist
, SimDesign
,
stats
, stringr
, tibble
, tidyr
, utils
), which will be installed
when installing goFlux
.
If you get this warning while trying to install an R package from GitHub:
## Warning: package ‘goFlux’ is in use and will not be installed
This error means that the package is loaded. Before re-installing the package, you must first detach it:
detach("package:goFlux", unload = TRUE)
If this does not solve the problem, restart your session and try again.
If you get this error while trying to install an R package from GitHub:
## Error: Failed to install 'goFlux' from GitHub:
## HTTP error 403.
## API rate limit exceeded for xxx.xxx.xxx.x (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
##
## Rate limit remaining: 0/60
## Rate limit reset at: 2023-10-16 09:08:07 UTC
##
## To increase your GitHub API rate limit
## - Use `usethis::create_github_token()` to create a Personal Access Token.
## - Use `usethis::edit_r_environ()` and add the token as `GITHUB_PAT`.
This error can occur while using the package remotes
to install an R
package from GitHub. Try using the package devtools
instead. If the
error still occurs, follow the instructions below.
Go to https://github.com/join .
-
Type a user name, your email address, and a password.
-
Choose Sign up for GitHub, and then follow the instructions.
Run in R:
usethis::create_github_token()
On pop-up website, generate and copy your token.
Run in R with your own token generated in step 2:
credentials::set_github_pat("YourTokeninStep2")
Authors: Karelle Rheault and Klaus Steenberg Larsen
The package is ready to use and fully functional, but errors may still occur. To report any issues, suggest improvements or ask for new features, open an issue on GitHub. Alternatively, contact directly the maintainer, Karelle Rheault ([email protected]), including script and raw data if necessary. Thank you for helping me in the development of this tool! 🙏
Please cite this R package using this publication in JOSS:
Rheault et al., (2024). goFlux: A user-friendly way to calculate GHG fluxes yourself, regardless of user experience. Journal of Open Source Software, 9(96), 6393, https://doi.org/10.21105/joss.06393
This software development was supported by the SilvaNova project funded by the NOVO Nordisk Foundation (grant no. NNF20OC0059948).