documentation

CDCgov · Dec 18, 2024 · 15272d2 · 15272d2
1 parent 2174966
commit 15272d2
Show file tree

Hide file tree

Showing 5 changed files with 97 additions and 78 deletions.
diff --git a/docs/natural-history-inputs.md b/docs/natural-history-inputs.md
@@ -1,28 +1,41 @@
 # Natural history model inputs
 
-## Infectiousness over time
+## Overview
+We provide a way for reading in a user-specified natural history parameters (infectiousness over time
+or the generation interval, viral load over time, etc.). This CSV can be expanded to also include
+symptom onset and improvement times for a given natural history parameter set.
 
-We provide a way for reading in a user-specified infectiousness over time distribution (generation interval)
-and appropriately scheduling infection attempts based on the distribution. The user provides an input file
-that contains samples from the cumulative distribution function (CDF) of the generation interval (GI) over
-time at a specified $\Delta t$, describing the fraction of an individual's infectiousness that has passed
-by a given time. The input data are assumed to have a format where the columns represent the times since
-the infection attempt (so starting at $t = 0$) and the entries in each row describe the value of the GI
-CDF. Each row represents a potential trajectory of the GI CDF.
+By specifying all of these parameters from a CSV file, the user can provide any natural history parameters
+they want in a very flexible fashion. For instance, if natural history parameters are correlated (i.e.,
+generation interval and symptom improvement), this can be modeled by providing a joint parameter set
+in the CSV. Comparatively, if the parameters are uncorrelated, that can also be modeled by just having
+the CSV inputs be independent draws from a distribution.
+
+## Data input format
+
+Data are input in a long format. Columns include `id`, `time`, and `gi_cdf`. Future work includes expanding
+to include `viral_load`, `symptom_onset_time`, and `symptom_improvement_time`. Each `id` refers to a distinct
+sample from the natural history parameters at some `time` since the person is first infected. The `gi_cdf`
+column describes the fraction of infectiousness that has occured at a given `time` for a given parameter set.
+
+## Implementation
 
 People are assigned a trajectory number (row number) when they are infected. This allows for each person
-to have a different GI CDF if each of the trajectories are different. However, that trajectory number will
+to have a different GI CDF if each of the trajectories are different. That trajectory number will
 be used for also drawing the person's other natural history characteristics, such as their symptom onset
 and improvement times or viral load trajectory. This allows easily encoding correlation between natural
-history parameters (the user provides input CSVs where the first row in each CSV is from a joint sample
-of GI, symptom onset, symptom improvement, etc.) or allowing each of the parameters to be independent.
+history parameters (the user provides input CSVs where the various values are all a joint sample
+of natural history parameters.) or allowing each of the parameters to be independent.
 
-## Overall Assumptions
+## Assumptions
 1. There are no requirements on the number of trajectories fed to the model. Trajectory numbers are assigned
-to people uniformly and randomly. However, this means that an individual who is reinfected could have the exact
-same infectiousness trajectory as their last infection.
-2. There must be the same number of parameter sets for each parameter provided as an input CSV. For now, we are focusing
-only on GI, but we will soon expand our work to also include symptom onset and symptom improvement times.
-3. We have not yet crossed the barrier of how to separately treat individuals who are asymptomatic only. Are their
-GIs drawn from a separate CSV? Should their $R_i$ just be multiplied by a scalar? Part of the reason we are deferring
-this decision is because our previous isolation guidance work focused only on symptomatic individuals.
+to people uniformly and randomly. A user must provide enough trajectories that they provide a representative
+sample of the underlying natural history parameters.
+2. There must be the same number of values for each parameter provided in the input CSV. In other words, a user
+cannot provide 1000 GI trajectories but only 10 symptom improvement times. The user must ensure that all parameter
+sets are complete and do that either via assuming independent draws between parameter values or imposing a correlation.
+3. The current input structure lends itself to basically encoding the agent's history of disease as an input parameter.
+Is this a good idea? Is this putting too much burden on the user when there are things that could be done in Rust instead?
+Imagine an agent is asymptomatic and they have a different GI, the natural history CSV file needs to include both of those
+pieces of information to ensure the agent is properly modeled in the simulation. So, the user needs to figure out how to tie
+together clinical symptoms with natural history, and the model just simulates whatever correlations a user describes.
diff --git a/input/input.json b/input/input.json
@@ -3,7 +3,6 @@
       "max_time": 200.0,
       "seed": 123,
       "r_0": 2.5,
-      "gi_trajectories_dt": 0.02,
       "report_period": 1.0,
       "synth_population_file": "input/people_test.csv",
       "natural_history_inputs": "input/natural_history.csv"

diff --git a/src/natural_history_manager.rs b/src/natural_history_manager.rs
@@ -13,6 +13,7 @@ define_rng!(NaturalHistorySamplerRng);
 
 #[derive(Default, Debug)]
 struct NaturalHistoryParameters {
+    times: Vec<Vec<f64>>,
     gi_trajectories: Vec<Vec<f64>>,
 }
 
@@ -22,82 +23,109 @@ define_data_plugin!(
     NaturalHistoryParameters::default()
 );
 
-/// Read in input natural history parameters from CSVs, validate them as valid inputs,
-/// and store them for later querying through the `ContextNaturalHistoryExt` trait.
+/// Read in input natural history parameters from CSVs, validate them as valid natural
+/// history parameters, and store them for later querying through the
+/// `ContextNaturalHistoryExt` trait.
 pub fn init(context: &mut Context) -> Result<(), IxaError> {
-    // Read in the generation interval trajectories from a CSV file.
+    // Read in the natural history parameter distributions from a CSV file.
     let path = &context
         .get_global_property_value(Parameters)
         .unwrap()
         .natural_history_inputs;
     let inputs = read_natural_history_inputs(path)?;
     // Check that the natural history inputs are valid.
+    // Are all the times positive? (They should be times *since* infection).
+    check_valid_times(&inputs.times)?;
     // Do each of the GI trajectories make up a valid CDF?
     check_valid_cdf(&inputs.gi_trajectories, "GI")?;
 
     // If all checks have passed, store the natural history parameters in context.
     let natural_history_container = context.get_data_container_mut(NaturalHistory);
     natural_history_container.gi_trajectories = inputs.gi_trajectories;
+    natural_history_container.times = inputs.times;
 
     Ok(())
 }
 
 #[derive(Deserialize, Debug)]
 struct NaturalHistoryRecord {
     id: usize,
-    #[allow(dead_code)]
     time: f64,
     gi_cdf: f64,
 }
 
-/// Read in a CSV file with an arbitrary number of columns that presumably represent a series.
-/// File should have a header, but the header is ignored. Returns a vector of the series (vectors)
-/// of type `T`.
+/// Read in a natural history input CSV. The CSV should be in long format and describe an
+/// individual's natural history over time since they are infected. The CSV should have
+/// the following columns (for now, more may be added later as viral load and other natural
+/// history parameters are added):
+/// - `id`: An identifier demarking that the current parameter set refers to a new sample.
+/// - `time`: The time since infection at which the natural history parameters were recorded.
+/// - `gi_cdf`: The cumulative distribution function (CDF) of the generation interval at the given time.
 fn read_natural_history_inputs(path: &PathBuf) -> Result<NaturalHistoryParameters, IxaError> {
-    let mut trajectories: Vec<Vec<f64>> = Vec::new();
+    let mut times: Vec<Vec<f64>> = Vec::new();
+    let mut gi_trajectories: Vec<Vec<f64>> = Vec::new();
     let mut reader = csv::Reader::from_path(path)?;
+    // Use changes in the `id` column to demark the start of a new parameter set.
     let mut current_id: Option<usize> = None;
     for result in reader.deserialize() {
         let record: NaturalHistoryRecord = result?;
         if Some(record.id) == current_id {
-            // We're still on the same trajectory.
-            trajectories.last_mut().unwrap().push(record.gi_cdf);
+            // We're still on the same parameter set.
+            times.last_mut().unwrap().push(record.time);
+            gi_trajectories.last_mut().unwrap().push(record.gi_cdf);
         } else {
-            // We're starting a new trajectory.
-            trajectories.push(vec![record.gi_cdf]);
+            // We're starting a new parameter set.
+            times.push(vec![record.time]);
+            gi_trajectories.push(vec![record.gi_cdf]);
             // Set the new id.
             current_id = Some(record.id);
         }
     }
     // The way that we've configured the CSV reader means that it will raise errors for an empty row, given that
     // the preceding rows had data (it also raises an error for a row with a different number of columns).
     // But, if all the rows are empty, it won't raise an error automatically.
-    if trajectories.is_empty() {
+    if gi_trajectories.is_empty() {
         return Err(IxaError::IxaError(format!(
             "No data found in file: {}",
             path.display()
         )));
     }
     Ok(NaturalHistoryParameters {
-        gi_trajectories: trajectories,
+        times,
+        gi_trajectories,
     })
 }
 
-/// A series of checks that ensure that trajectory in a vector of trajectories are a valid CSV.
+/// Check that all times in the natural history parameters are non-negative.
+fn check_valid_times(times: &[Vec<f64>]) -> Result<(), IxaError> {
+    if times.iter().any(|x| x.iter().any(|&y| y < 0.0)) {
+        return Err(IxaError::IxaError(
+            "Natural history times must be non-negative.".to_string(),
+        ));
+    }
+    Ok(())
+}
+
+/// A series of checks that ensure that a set of values make a valid CDF. This includes
+/// ensuring that the values are in the range [0, 1], that they are strictly increasing,
+/// that they do not start at 1.0, and that there are at least two points in the CDF for
+/// linear interpolation. If any of these checks fail, an error is raised.
+/// The `debug_parameter_name` is used to identify the parameter set in the error message.
+/// Expects an array of vectors of potential CDF samples.
 fn check_valid_cdf(trajectories: &[Vec<f64>], debug_parameter_name: &str) -> Result<(), IxaError> {
     trajectories
         .iter()
-        .enumerate()
+        .enumerate() // So we can identify which trajectory is bad.
         .try_for_each(|(i, x)| -> Result<(), IxaError> {
             if x.iter().any(|&y| !(0.0..=1.0).contains(&y)) {
                 return Err(IxaError::IxaError(format!(
-                    "{debug_parameter_name} CDF trajectory {} contains values not in the range [0, 1].",
+                    "{debug_parameter_name} CDF trajectory {} contains values outside the range [0, 1].",
                     i + 1
                 )));
             }
             if x.windows(2).any(|w| w[0] > w[1]) {
                 return Err(IxaError::IxaError(format!(
-                    "{debug_parameter_name} CDF trajectory {} is not strictly increasing.",
+                    "{debug_parameter_name} CDF trajectory {} is not increasing.",
                     i + 1
                 )));
             }
@@ -110,7 +138,7 @@ fn check_valid_cdf(trajectories: &[Vec<f64>], debug_parameter_name: &str) -> Res
                 )));
             }
             // There must be at least two points in the CDF for linear interpolation.
-            if x.len() < 2 {
+            if x.get(1).is_none() {
                 return Err(IxaError::IxaError(format!(
                     "{debug_parameter_name} CDF trajectory {} has fewer than two points.",
                     i + 1
@@ -146,11 +174,11 @@ impl ContextNaturalHistoryExt for Context {
     fn set_natural_history_idx(&mut self, person_id: PersonId) {
         let num_trajectories = self
             .get_data_container(NaturalHistory)
-            .expect("Natural history data container not initialized.")
+            .expect("Natural history manager not initialized.")
             .gi_trajectories
             .len();
-        let gi_index = self.sample_range(NaturalHistorySamplerRng, 0..num_trajectories);
-        self.set_person_property(person_id, NaturalHistoryIdx, Some(gi_index));
+        let index = self.sample_range(NaturalHistorySamplerRng, 0..num_trajectories);
+        self.set_person_property(person_id, NaturalHistoryIdx, Some(index));
     }
 
     fn evaluate_inverse_generation_interval(
@@ -160,7 +188,7 @@ impl ContextNaturalHistoryExt for Context {
     ) -> f64 {
         // Let's first deal with the corner case -- the person is experiencing their first infection attempt.
         // In this case, gi_cdf_value_unif will be 0.0. There are no points below 0.0 in a CDF, so interpolation
-        // numerically will fairl. Instead, we return 0.0. This is the obvious value because it means that the person
+        // will fail. Instead, we return 0.0. This is the obvious value because it means that the person
         // has experienced none of their infectiousness at the start of their infection. It also ensures that if
         // GI CDF is 0.0 for some time after the start of infection, inverse_gi(\epsilon) - inverse_gi(0) = c > 0
         // even as \epsilon -> 0, which properly reproduces a CDF where an individual is not infectious immediately.
@@ -174,18 +202,10 @@ impl ContextNaturalHistoryExt for Context {
         let natural_history_container = self
             .get_data_container(NaturalHistory)
             .expect("Natural history data container not initialized.");
+        let times = &natural_history_container.times[gi_index];
         let gi_trajectory = &natural_history_container.gi_trajectories[gi_index];
-        // Set up what we need for interpolation.
-        let dt = self
-            .get_global_property_value(Parameters)
-            .unwrap()
-            .gi_trajectories_dt;
-        #[allow(clippy::cast_precision_loss)]
-        let times = (0..gi_trajectory.len())
-            .map(|x| (x as f64) * dt)
-            .collect::<Vec<f64>>();
         // Because we want to interpolate the *inverse* CDF, the CDF values are "x" and the time values are "y".
-        interpolate(gi_trajectory, &times, gi_cdf_value_unif)
+        interpolate(gi_trajectory, times, gi_cdf_value_unif)
     }
 }
 
@@ -199,8 +219,8 @@ fn interpolate(xs: &[f64], ys: &[f64], xp: f64) -> f64 {
     // in `xs` are less than `xp`. We have to use an alternative extrapolation strategy.
     let upper_window_index = match upper_window_index_option {
         None => {
-            // We are above the range of the `xs` samples, so we must extrapolate. We default to linear extrapolation
-            // using the last two values in `xs`.
+            // We are above the range of the `xs` samples, so we must extrapolate. We default to
+            // linear extrapolation using the last two values in `xs`.
             let traj_len = xs.len();
             return linear_interpolation(
                 xs[traj_len - 2],
@@ -291,7 +311,6 @@ mod test {
             max_time: 10.0,
             seed: 42,
             r_0: 2.5,
-            gi_trajectories_dt: 0.2,
             report_period: 1.0,
             synth_population_file: PathBuf::from("."),
             natural_history_inputs: PathBuf::from("./tests/data/natural_history.csv"),
@@ -334,10 +353,10 @@ mod test {
         let e = check_valid_cdf(&bad_cdf, "test").err();
         match e {
             Some(IxaError::IxaError(msg)) => {
-                assert_eq!(msg, "test CDF trajectory 1 contains values not in the range [0, 1].".to_string());
+                assert_eq!(msg, "test CDF trajectory 1 contains values outside the range [0, 1].".to_string());
             }
             Some(ue) => panic!(
-                "Expected an error that CDF values are not in the range [0, 1]. Instead got {ue}."
+                "Expected an error that CDF values are outside the range [0, 1]. Instead got {ue}."
             ),
             None => panic!("Expected an error that CDF values are out of range. Instead, CDF validation passed with no error."),
         }
@@ -349,12 +368,12 @@ mod test {
         let e = check_valid_cdf(&bad_cdf, "test").err();
         match e {
             Some(IxaError::IxaError(msg)) => {
-                assert_eq!(msg, "test CDF trajectory 1 is not strictly increasing.".to_string());
+                assert_eq!(msg, "test CDF trajectory 1 is not increasing.".to_string());
             }
             Some(ue) => panic!(
-                "Expected an error that CDF values are not strictly increasing. Instead got {ue}."
+                "Expected an error that CDF values are not increasing. Instead got {ue}."
             ),
-            None => panic!("Expected an error that CDF values are not strictly increasing. Instead, CDF validation passed with no error."),
+            None => panic!("Expected an error that CDF values are not increasing. Instead, CDF validation passed with no error."),
         }
     }
 
@@ -407,14 +426,8 @@ mod test {
             .gi_trajectories;
         assert_eq!(gi_trajectory.len(), 1);
         let cdf = |x| Exp::new(1.0).unwrap().cdf(x);
-        let dt = context
-            .get_global_property_value(Parameters)
-            .unwrap()
-            .gi_trajectories_dt;
-        #[allow(clippy::cast_precision_loss)]
-        let expected_trajectory = (0..gi_trajectory[0].len())
-            .map(|x| cdf(x as f64 * dt))
-            .collect::<Vec<f64>>();
+        let times = &context.get_data_container(NaturalHistory).unwrap().times[0];
+        let expected_trajectory = times.iter().map(|&x| cdf(x)).collect::<Vec<f64>>();
         let diff = gi_trajectory[0]
             .iter()
             .zip(expected_trajectory.iter())
@@ -520,10 +533,6 @@ mod test {
     fn test_evaluate_inverse_generation_interval() {
         let mut context = setup();
         init(&mut context).unwrap();
-        let dt = context
-            .get_global_property_value(Parameters)
-            .unwrap()
-            .gi_trajectories_dt;
         let person_id = context.add_person(()).unwrap();
         context.set_natural_history_idx(person_id);
         // Check that a CDF value of 0.0 returns a time of 0.0.
@@ -535,9 +544,10 @@ mod test {
         let cdf = |x| Exp::new(1.0).unwrap().cdf(x);
         // No interpolation required because we pick an integer increment of dt.
         // But, because the interpolation routine still runs, we can't check for exact equality.
+        let times = &context.get_data_container(NaturalHistory).unwrap().times[0];
         assert_almost_eq!(
-            context.evaluate_inverse_generation_interval(person_id, cdf(10.0 * dt)),
-            10.0 * dt,
+            context.evaluate_inverse_generation_interval(person_id, cdf(times[10])),
+            times[10],
             f64::EPSILON
         );
     }

diff --git a/src/parameters.rs b/src/parameters.rs
@@ -8,7 +8,6 @@ pub struct ParametersValues {
     pub max_time: f64,
     pub seed: u64,
     pub r_0: f64,
-    pub gi_trajectories_dt: f64,
     pub report_period: f64,
     pub synth_population_file: PathBuf,
     pub natural_history_inputs: PathBuf,
@@ -40,7 +39,6 @@ mod test {
             max_time: 100.0,
             seed: 0,
             r_0: -1.0,
-            gi_trajectories_dt: 0.1,
             report_period: 1.0,
             synth_population_file: PathBuf::from("."),
             natural_history_inputs: PathBuf::from("."),

diff --git a/src/transmission_manager.rs b/src/transmission_manager.rs
@@ -233,7 +233,6 @@ mod test {
             max_time: 10.0,
             seed: 42,
             r_0,
-            gi_trajectories_dt: 0.2,
             report_period: 1.0,
             synth_population_file: PathBuf::from("."),
             natural_history_inputs: PathBuf::from("./tests/data/natural_history.csv"),