Skip to content

Commit

Permalink
documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
ChiragKumar9 committed Dec 18, 2024
1 parent 2174966 commit 15272d2
Show file tree
Hide file tree
Showing 5 changed files with 97 additions and 78 deletions.
51 changes: 32 additions & 19 deletions docs/natural-history-inputs.md
Original file line number Diff line number Diff line change
@@ -1,28 +1,41 @@
# Natural history model inputs

## Infectiousness over time
## Overview
We provide a way for reading in a user-specified natural history parameters (infectiousness over time
or the generation interval, viral load over time, etc.). This CSV can be expanded to also include
symptom onset and improvement times for a given natural history parameter set.

We provide a way for reading in a user-specified infectiousness over time distribution (generation interval)
and appropriately scheduling infection attempts based on the distribution. The user provides an input file
that contains samples from the cumulative distribution function (CDF) of the generation interval (GI) over
time at a specified $\Delta t$, describing the fraction of an individual's infectiousness that has passed
by a given time. The input data are assumed to have a format where the columns represent the times since
the infection attempt (so starting at $t = 0$) and the entries in each row describe the value of the GI
CDF. Each row represents a potential trajectory of the GI CDF.
By specifying all of these parameters from a CSV file, the user can provide any natural history parameters
they want in a very flexible fashion. For instance, if natural history parameters are correlated (i.e.,
generation interval and symptom improvement), this can be modeled by providing a joint parameter set
in the CSV. Comparatively, if the parameters are uncorrelated, that can also be modeled by just having
the CSV inputs be independent draws from a distribution.

## Data input format

Data are input in a long format. Columns include `id`, `time`, and `gi_cdf`. Future work includes expanding
to include `viral_load`, `symptom_onset_time`, and `symptom_improvement_time`. Each `id` refers to a distinct
sample from the natural history parameters at some `time` since the person is first infected. The `gi_cdf`
column describes the fraction of infectiousness that has occured at a given `time` for a given parameter set.

## Implementation

People are assigned a trajectory number (row number) when they are infected. This allows for each person
to have a different GI CDF if each of the trajectories are different. However, that trajectory number will
to have a different GI CDF if each of the trajectories are different. That trajectory number will
be used for also drawing the person's other natural history characteristics, such as their symptom onset
and improvement times or viral load trajectory. This allows easily encoding correlation between natural
history parameters (the user provides input CSVs where the first row in each CSV is from a joint sample
of GI, symptom onset, symptom improvement, etc.) or allowing each of the parameters to be independent.
history parameters (the user provides input CSVs where the various values are all a joint sample
of natural history parameters.) or allowing each of the parameters to be independent.

## Overall Assumptions
## Assumptions
1. There are no requirements on the number of trajectories fed to the model. Trajectory numbers are assigned
to people uniformly and randomly. However, this means that an individual who is reinfected could have the exact
same infectiousness trajectory as their last infection.
2. There must be the same number of parameter sets for each parameter provided as an input CSV. For now, we are focusing
only on GI, but we will soon expand our work to also include symptom onset and symptom improvement times.
3. We have not yet crossed the barrier of how to separately treat individuals who are asymptomatic only. Are their
GIs drawn from a separate CSV? Should their $R_i$ just be multiplied by a scalar? Part of the reason we are deferring
this decision is because our previous isolation guidance work focused only on symptomatic individuals.
to people uniformly and randomly. A user must provide enough trajectories that they provide a representative
sample of the underlying natural history parameters.
2. There must be the same number of values for each parameter provided in the input CSV. In other words, a user
cannot provide 1000 GI trajectories but only 10 symptom improvement times. The user must ensure that all parameter
sets are complete and do that either via assuming independent draws between parameter values or imposing a correlation.
3. The current input structure lends itself to basically encoding the agent's history of disease as an input parameter.
Is this a good idea? Is this putting too much burden on the user when there are things that could be done in Rust instead?
Imagine an agent is asymptomatic and they have a different GI, the natural history CSV file needs to include both of those
pieces of information to ensure the agent is properly modeled in the simulation. So, the user needs to figure out how to tie
together clinical symptoms with natural history, and the model just simulates whatever correlations a user describes.
1 change: 0 additions & 1 deletion input/input.json
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,6 @@
"max_time": 200.0,
"seed": 123,
"r_0": 2.5,
"gi_trajectories_dt": 0.02,
"report_period": 1.0,
"synth_population_file": "input/people_test.csv",
"natural_history_inputs": "input/natural_history.csv"
Expand Down
120 changes: 65 additions & 55 deletions src/natural_history_manager.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ define_rng!(NaturalHistorySamplerRng);

#[derive(Default, Debug)]
struct NaturalHistoryParameters {
times: Vec<Vec<f64>>,
gi_trajectories: Vec<Vec<f64>>,
}

Expand All @@ -22,82 +23,109 @@ define_data_plugin!(
NaturalHistoryParameters::default()
);

/// Read in input natural history parameters from CSVs, validate them as valid inputs,
/// and store them for later querying through the `ContextNaturalHistoryExt` trait.
/// Read in input natural history parameters from CSVs, validate them as valid natural
/// history parameters, and store them for later querying through the
/// `ContextNaturalHistoryExt` trait.
pub fn init(context: &mut Context) -> Result<(), IxaError> {
// Read in the generation interval trajectories from a CSV file.
// Read in the natural history parameter distributions from a CSV file.
let path = &context
.get_global_property_value(Parameters)
.unwrap()
.natural_history_inputs;
let inputs = read_natural_history_inputs(path)?;
// Check that the natural history inputs are valid.
// Are all the times positive? (They should be times *since* infection).
check_valid_times(&inputs.times)?;
// Do each of the GI trajectories make up a valid CDF?
check_valid_cdf(&inputs.gi_trajectories, "GI")?;

// If all checks have passed, store the natural history parameters in context.
let natural_history_container = context.get_data_container_mut(NaturalHistory);
natural_history_container.gi_trajectories = inputs.gi_trajectories;
natural_history_container.times = inputs.times;

Ok(())
}

#[derive(Deserialize, Debug)]
struct NaturalHistoryRecord {
id: usize,
#[allow(dead_code)]
time: f64,
gi_cdf: f64,
}

/// Read in a CSV file with an arbitrary number of columns that presumably represent a series.
/// File should have a header, but the header is ignored. Returns a vector of the series (vectors)
/// of type `T`.
/// Read in a natural history input CSV. The CSV should be in long format and describe an
/// individual's natural history over time since they are infected. The CSV should have
/// the following columns (for now, more may be added later as viral load and other natural
/// history parameters are added):
/// - `id`: An identifier demarking that the current parameter set refers to a new sample.
/// - `time`: The time since infection at which the natural history parameters were recorded.
/// - `gi_cdf`: The cumulative distribution function (CDF) of the generation interval at the given time.
fn read_natural_history_inputs(path: &PathBuf) -> Result<NaturalHistoryParameters, IxaError> {
let mut trajectories: Vec<Vec<f64>> = Vec::new();
let mut times: Vec<Vec<f64>> = Vec::new();
let mut gi_trajectories: Vec<Vec<f64>> = Vec::new();
let mut reader = csv::Reader::from_path(path)?;
// Use changes in the `id` column to demark the start of a new parameter set.
let mut current_id: Option<usize> = None;
for result in reader.deserialize() {
let record: NaturalHistoryRecord = result?;
if Some(record.id) == current_id {
// We're still on the same trajectory.
trajectories.last_mut().unwrap().push(record.gi_cdf);
// We're still on the same parameter set.
times.last_mut().unwrap().push(record.time);
gi_trajectories.last_mut().unwrap().push(record.gi_cdf);
} else {
// We're starting a new trajectory.
trajectories.push(vec![record.gi_cdf]);
// We're starting a new parameter set.
times.push(vec![record.time]);
gi_trajectories.push(vec![record.gi_cdf]);
// Set the new id.
current_id = Some(record.id);
}
}
// The way that we've configured the CSV reader means that it will raise errors for an empty row, given that
// the preceding rows had data (it also raises an error for a row with a different number of columns).
// But, if all the rows are empty, it won't raise an error automatically.
if trajectories.is_empty() {
if gi_trajectories.is_empty() {
return Err(IxaError::IxaError(format!(
"No data found in file: {}",
path.display()
)));
}
Ok(NaturalHistoryParameters {
gi_trajectories: trajectories,
times,
gi_trajectories,
})
}

/// A series of checks that ensure that trajectory in a vector of trajectories are a valid CSV.
/// Check that all times in the natural history parameters are non-negative.
fn check_valid_times(times: &[Vec<f64>]) -> Result<(), IxaError> {
if times.iter().any(|x| x.iter().any(|&y| y < 0.0)) {
return Err(IxaError::IxaError(
"Natural history times must be non-negative.".to_string(),
));
}
Ok(())
}

/// A series of checks that ensure that a set of values make a valid CDF. This includes
/// ensuring that the values are in the range [0, 1], that they are strictly increasing,
/// that they do not start at 1.0, and that there are at least two points in the CDF for
/// linear interpolation. If any of these checks fail, an error is raised.
/// The `debug_parameter_name` is used to identify the parameter set in the error message.
/// Expects an array of vectors of potential CDF samples.
fn check_valid_cdf(trajectories: &[Vec<f64>], debug_parameter_name: &str) -> Result<(), IxaError> {
trajectories
.iter()
.enumerate()
.enumerate() // So we can identify which trajectory is bad.
.try_for_each(|(i, x)| -> Result<(), IxaError> {
if x.iter().any(|&y| !(0.0..=1.0).contains(&y)) {
return Err(IxaError::IxaError(format!(
"{debug_parameter_name} CDF trajectory {} contains values not in the range [0, 1].",
"{debug_parameter_name} CDF trajectory {} contains values outside the range [0, 1].",
i + 1
)));
}
if x.windows(2).any(|w| w[0] > w[1]) {
return Err(IxaError::IxaError(format!(
"{debug_parameter_name} CDF trajectory {} is not strictly increasing.",
"{debug_parameter_name} CDF trajectory {} is not increasing.",
i + 1
)));
}
Expand All @@ -110,7 +138,7 @@ fn check_valid_cdf(trajectories: &[Vec<f64>], debug_parameter_name: &str) -> Res
)));
}
// There must be at least two points in the CDF for linear interpolation.
if x.len() < 2 {
if x.get(1).is_none() {
return Err(IxaError::IxaError(format!(
"{debug_parameter_name} CDF trajectory {} has fewer than two points.",
i + 1
Expand Down Expand Up @@ -146,11 +174,11 @@ impl ContextNaturalHistoryExt for Context {
fn set_natural_history_idx(&mut self, person_id: PersonId) {
let num_trajectories = self
.get_data_container(NaturalHistory)
.expect("Natural history data container not initialized.")
.expect("Natural history manager not initialized.")
.gi_trajectories
.len();
let gi_index = self.sample_range(NaturalHistorySamplerRng, 0..num_trajectories);
self.set_person_property(person_id, NaturalHistoryIdx, Some(gi_index));
let index = self.sample_range(NaturalHistorySamplerRng, 0..num_trajectories);
self.set_person_property(person_id, NaturalHistoryIdx, Some(index));
}

fn evaluate_inverse_generation_interval(
Expand All @@ -160,7 +188,7 @@ impl ContextNaturalHistoryExt for Context {
) -> f64 {
// Let's first deal with the corner case -- the person is experiencing their first infection attempt.
// In this case, gi_cdf_value_unif will be 0.0. There are no points below 0.0 in a CDF, so interpolation
// numerically will fairl. Instead, we return 0.0. This is the obvious value because it means that the person
// will fail. Instead, we return 0.0. This is the obvious value because it means that the person
// has experienced none of their infectiousness at the start of their infection. It also ensures that if
// GI CDF is 0.0 for some time after the start of infection, inverse_gi(\epsilon) - inverse_gi(0) = c > 0
// even as \epsilon -> 0, which properly reproduces a CDF where an individual is not infectious immediately.
Expand All @@ -174,18 +202,10 @@ impl ContextNaturalHistoryExt for Context {
let natural_history_container = self
.get_data_container(NaturalHistory)
.expect("Natural history data container not initialized.");
let times = &natural_history_container.times[gi_index];
let gi_trajectory = &natural_history_container.gi_trajectories[gi_index];
// Set up what we need for interpolation.
let dt = self
.get_global_property_value(Parameters)
.unwrap()
.gi_trajectories_dt;
#[allow(clippy::cast_precision_loss)]
let times = (0..gi_trajectory.len())
.map(|x| (x as f64) * dt)
.collect::<Vec<f64>>();
// Because we want to interpolate the *inverse* CDF, the CDF values are "x" and the time values are "y".
interpolate(gi_trajectory, &times, gi_cdf_value_unif)
interpolate(gi_trajectory, times, gi_cdf_value_unif)
}
}

Expand All @@ -199,8 +219,8 @@ fn interpolate(xs: &[f64], ys: &[f64], xp: f64) -> f64 {
// in `xs` are less than `xp`. We have to use an alternative extrapolation strategy.
let upper_window_index = match upper_window_index_option {
None => {
// We are above the range of the `xs` samples, so we must extrapolate. We default to linear extrapolation
// using the last two values in `xs`.
// We are above the range of the `xs` samples, so we must extrapolate. We default to
// linear extrapolation using the last two values in `xs`.
let traj_len = xs.len();
return linear_interpolation(
xs[traj_len - 2],
Expand Down Expand Up @@ -291,7 +311,6 @@ mod test {
max_time: 10.0,
seed: 42,
r_0: 2.5,
gi_trajectories_dt: 0.2,
report_period: 1.0,
synth_population_file: PathBuf::from("."),
natural_history_inputs: PathBuf::from("./tests/data/natural_history.csv"),
Expand Down Expand Up @@ -334,10 +353,10 @@ mod test {
let e = check_valid_cdf(&bad_cdf, "test").err();
match e {
Some(IxaError::IxaError(msg)) => {
assert_eq!(msg, "test CDF trajectory 1 contains values not in the range [0, 1].".to_string());
assert_eq!(msg, "test CDF trajectory 1 contains values outside the range [0, 1].".to_string());
}
Some(ue) => panic!(
"Expected an error that CDF values are not in the range [0, 1]. Instead got {ue}."
"Expected an error that CDF values are outside the range [0, 1]. Instead got {ue}."
),
None => panic!("Expected an error that CDF values are out of range. Instead, CDF validation passed with no error."),
}
Expand All @@ -349,12 +368,12 @@ mod test {
let e = check_valid_cdf(&bad_cdf, "test").err();
match e {
Some(IxaError::IxaError(msg)) => {
assert_eq!(msg, "test CDF trajectory 1 is not strictly increasing.".to_string());
assert_eq!(msg, "test CDF trajectory 1 is not increasing.".to_string());
}
Some(ue) => panic!(
"Expected an error that CDF values are not strictly increasing. Instead got {ue}."
"Expected an error that CDF values are not increasing. Instead got {ue}."
),
None => panic!("Expected an error that CDF values are not strictly increasing. Instead, CDF validation passed with no error."),
None => panic!("Expected an error that CDF values are not increasing. Instead, CDF validation passed with no error."),
}
}

Expand Down Expand Up @@ -407,14 +426,8 @@ mod test {
.gi_trajectories;
assert_eq!(gi_trajectory.len(), 1);
let cdf = |x| Exp::new(1.0).unwrap().cdf(x);
let dt = context
.get_global_property_value(Parameters)
.unwrap()
.gi_trajectories_dt;
#[allow(clippy::cast_precision_loss)]
let expected_trajectory = (0..gi_trajectory[0].len())
.map(|x| cdf(x as f64 * dt))
.collect::<Vec<f64>>();
let times = &context.get_data_container(NaturalHistory).unwrap().times[0];
let expected_trajectory = times.iter().map(|&x| cdf(x)).collect::<Vec<f64>>();
let diff = gi_trajectory[0]
.iter()
.zip(expected_trajectory.iter())
Expand Down Expand Up @@ -520,10 +533,6 @@ mod test {
fn test_evaluate_inverse_generation_interval() {
let mut context = setup();
init(&mut context).unwrap();
let dt = context
.get_global_property_value(Parameters)
.unwrap()
.gi_trajectories_dt;
let person_id = context.add_person(()).unwrap();
context.set_natural_history_idx(person_id);
// Check that a CDF value of 0.0 returns a time of 0.0.
Expand All @@ -535,9 +544,10 @@ mod test {
let cdf = |x| Exp::new(1.0).unwrap().cdf(x);
// No interpolation required because we pick an integer increment of dt.
// But, because the interpolation routine still runs, we can't check for exact equality.
let times = &context.get_data_container(NaturalHistory).unwrap().times[0];
assert_almost_eq!(
context.evaluate_inverse_generation_interval(person_id, cdf(10.0 * dt)),
10.0 * dt,
context.evaluate_inverse_generation_interval(person_id, cdf(times[10])),
times[10],
f64::EPSILON
);
}
Expand Down
2 changes: 0 additions & 2 deletions src/parameters.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ pub struct ParametersValues {
pub max_time: f64,
pub seed: u64,
pub r_0: f64,
pub gi_trajectories_dt: f64,
pub report_period: f64,
pub synth_population_file: PathBuf,
pub natural_history_inputs: PathBuf,
Expand Down Expand Up @@ -40,7 +39,6 @@ mod test {
max_time: 100.0,
seed: 0,
r_0: -1.0,
gi_trajectories_dt: 0.1,
report_period: 1.0,
synth_population_file: PathBuf::from("."),
natural_history_inputs: PathBuf::from("."),
Expand Down
1 change: 0 additions & 1 deletion src/transmission_manager.rs
Original file line number Diff line number Diff line change
Expand Up @@ -233,7 +233,6 @@ mod test {
max_time: 10.0,
seed: 42,
r_0,
gi_trajectories_dt: 0.2,
report_period: 1.0,
synth_population_file: PathBuf::from("."),
natural_history_inputs: PathBuf::from("./tests/data/natural_history.csv"),
Expand Down

0 comments on commit 15272d2

Please sign in to comment.