switch to long input format
ChiragKumar9 committed Dec 18, 2024
1 parent 88c6b04 commit db76734
Showing 14 changed files with 4,386 additions and 172 deletions.
6 changes: 2 additions & 4 deletions .gitignore
@@ -38,11 +38,9 @@
# !your_data_file.csv
# !your_data_directory/
!input/people_test.csv
!input/gi_trajectories.csv
!tests/data/gi_trajectory.csv
!tests/data/three_columns.csv
!input/natural_history.csv
!tests/data/natural_history.csv
!tests/data/empty.csv
!tests/data/one_column.csv
!tests/data/column_size_changes.csv

#####
51 changes: 32 additions & 19 deletions docs/natural-history-inputs.md
@@ -1,28 +1,41 @@
# Natural history model inputs

## Infectiousness over time
## Overview
We provide a way to read in user-specified natural history parameters (infectiousness over time,
i.e., the generation interval, viral load over time, etc.) from a CSV file. This CSV can be expanded to also include
symptom onset and improvement times for a given natural history parameter set.

We provide a way for reading in a user-specified infectiousness over time distribution (generation interval)
and appropriately scheduling infection attempts based on the distribution. The user provides an input file
that contains samples from the cumulative distribution function (CDF) of the generation interval (GI) over
time at a specified $\Delta t$, describing the fraction of an individual's infectiousness that has passed
by a given time. The input data are assumed to have a format where the columns represent the times since
the infection attempt (so starting at $t = 0$) and the entries in each row describe the value of the GI
CDF. Each row represents a potential trajectory of the GI CDF.
By specifying all of these parameters in a CSV file, the user can provide any natural history parameters
they want in a very flexible fashion. For instance, if natural history parameters are correlated (e.g.,
generation interval and symptom improvement), this can be modeled by providing a joint parameter set
in the CSV. Conversely, if the parameters are uncorrelated, that can also be modeled by having
the CSV inputs be independent draws from a distribution.

## Data input format

Data are input in a long format. Columns include `id`, `time`, and `gi_cdf`. Future work will add
`viral_load`, `symptom_onset_time`, and `symptom_improvement_time`. Each `id` refers to a distinct
sample from the natural history parameters at some `time` since the person is first infected. The `gi_cdf`
column describes the fraction of infectiousness that has occurred at a given `time` for a given parameter set.
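
As a rough illustration of the long format, the sketch below groups rows by `id` so that each `id` yields one GI CDF trajectory. It is not the repository's loader: `GiPoint` and `parse_natural_history` are hypothetical names, and the parser uses only the standard library and assumes a plain `id,time,gi_cdf` header with no quoted fields.

```rust
use std::collections::BTreeMap;

/// One sampled point on a GI CDF trajectory (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct GiPoint {
    pub time: f64,
    pub gi_cdf: f64,
}

/// Parse long-format CSV text with the header `id,time,gi_cdf`
/// and group the rows by `id`, preserving row order within each `id`.
pub fn parse_natural_history(csv_text: &str) -> Result<BTreeMap<u32, Vec<GiPoint>>, String> {
    let mut lines = csv_text.lines().map(str::trim).filter(|l| !l.is_empty());
    let header = lines.next().ok_or("empty input")?;
    if header != "id,time,gi_cdf" {
        return Err(format!("unexpected header: {}", header));
    }

    let mut trajectories: BTreeMap<u32, Vec<GiPoint>> = BTreeMap::new();
    for (i, line) in lines.enumerate() {
        let row = i + 1; // 1-based index of the data row, for error messages
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        if fields.len() != 3 {
            return Err(format!("row {}: expected 3 fields, found {}", row, fields.len()));
        }
        let id = fields[0]
            .parse::<u32>()
            .map_err(|e| format!("row {}: bad id: {}", row, e))?;
        let time = fields[1]
            .parse::<f64>()
            .map_err(|e| format!("row {}: bad time: {}", row, e))?;
        let gi_cdf = fields[2]
            .parse::<f64>()
            .map_err(|e| format!("row {}: bad gi_cdf: {}", row, e))?;
        trajectories.entry(id).or_default().push(GiPoint { time, gi_cdf });
    }
    Ok(trajectories)
}
```

Grouping by `id` up front means later lookups only need the trajectory number; a real loader would likely also validate that `gi_cdf` is non-decreasing in `time`.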

## Implementation

People are assigned a trajectory number (row number) when they are infected. This allows for each person
to have a different GI CDF if each of the trajectories are different. However, that trajectory number will
to have a different GI CDF if each of the trajectories are different. That trajectory number will
be used for also drawing the person's other natural history characteristics, such as their symptom onset
and improvement times or viral load trajectory. This allows easily encoding correlation between natural
history parameters (the user provides input CSVs where the first row in each CSV is from a joint sample
of GI, symptom onset, symptom improvement, etc.) or allowing each of the parameters to be independent.
history parameters (the user provides input CSVs where the various values are all a joint sample
of natural history parameters) or allowing each of the parameters to be independent.
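
The idea of one trajectory number driving every characteristic could be sketched as follows. This is an assumption-laden illustration, not the model's code: `assign_trajectory` and `joint_sample` are hypothetical names, the uniform draw is passed in by the caller rather than generated here, and `symptom_improvement_times` stands in for any second natural history parameter.

```rust
/// Map a uniform draw in [0, 1) to a trajectory index in 0..n_trajectories.
/// Returns None if there are no trajectories or the draw is out of range (or NaN).
pub fn assign_trajectory(uniform_draw: f64, n_trajectories: usize) -> Option<usize> {
    if n_trajectories == 0 || !(0.0..1.0).contains(&uniform_draw) {
        return None;
    }
    let idx = (uniform_draw * n_trajectories as f64) as usize;
    // Guard against floating-point rounding pushing the index to n_trajectories.
    Some(idx.min(n_trajectories - 1))
}

/// Look up every characteristic with the same index, so values that were
/// sampled jointly by the user stay together for a given person.
pub fn joint_sample<'a>(
    idx: usize,
    gi_cdfs: &'a [Vec<f64>],
    symptom_improvement_times: &[f64],
) -> Option<(&'a [f64], f64)> {
    let gi_cdf = gi_cdfs.get(idx)?;
    let improvement_time = symptom_improvement_times.get(idx)?;
    Some((gi_cdf.as_slice(), *improvement_time))
}
```

Because both lookups share `idx`, any correlation present in the user's parameter sets carries through unchanged; independent parameters simply appear as independently drawn values at the same index.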

## Overall Assumptions
## Assumptions
1. There are no requirements on the number of trajectories fed to the model. Trajectory numbers are assigned
to people uniformly and randomly. However, this means that an individual who is reinfected could have the exact
same infectiousness trajectory as their last infection.
2. There must be the same number of parameter sets for each parameter provided as an input CSV. For now, we are focusing
only on GI, but we will soon expand our work to also include symptom onset and symptom improvement times.
3. We have not yet crossed the barrier of how to separately treat individuals who are asymptomatic only. Are their
GIs drawn from a separate CSV? Should their $R_i$ just be multiplied by a scalar? Part of the reason we are deferring
this decision is because our previous isolation guidance work focused only on symptomatic individuals.
to people uniformly at random. The user must provide enough trajectories to form a representative
sample of the underlying natural history parameters.
2. There must be the same number of values for each parameter provided in the input CSV. In other words, a user
cannot provide 1000 GI trajectories but only 10 symptom improvement times. The user must ensure that all parameter
sets are complete, either by assuming independent draws between parameter values or by imposing a correlation.
3. The current input structure effectively encodes the agent's history of disease as an input parameter.
Is this a good idea? Does it put too much burden on the user when some of this could be done in Rust instead?
For example, if an asymptomatic agent has a different GI, the natural history CSV file needs to include both
pieces of information to ensure the agent is properly modeled in the simulation. So, the user needs to figure out how to tie
together clinical symptoms with natural history, and the model just simulates whatever correlations the user describes.
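
Assumption 2 above lends itself to a simple load-time check. The sketch below is only one way to express it; `check_equal_counts` is a hypothetical helper that takes the number of values found for each parameter and reports the first mismatch.

```rust
/// Verify that every parameter has the same number of values.
/// `counts` pairs a parameter name with how many values were provided for it.
/// Returns the common count, or an error naming the first mismatched parameter.
pub fn check_equal_counts(counts: &[(&str, usize)]) -> Result<usize, String> {
    let (first_name, expected) = match counts.first() {
        Some(&(name, n)) => (name, n),
        None => return Err("no parameters provided".to_string()),
    };
    for &(name, n) in &counts[1..] {
        if n != expected {
            return Err(format!(
                "`{}` has {} values but `{}` has {}",
                name, n, first_name, expected
            ));
        }
    }
    Ok(expected)
}
```

For example, passing `[("gi_cdf", 1000), ("symptom_improvement_time", 10)]` would be rejected, matching the case the assumption rules out.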
21 changes: 0 additions & 21 deletions input/gi_trajectories.csv

This file was deleted.

3 changes: 1 addition & 2 deletions input/input.json
@@ -3,9 +3,8 @@
"max_time": 200.0,
"seed": 123,
"r_0": 2.5,
"gi_trajectories_dt": 0.02,
"report_period": 1.0,
"synth_population_file": "input/people_test.csv",
"gi_trajectories_file": "input/gi_trajectories.csv"
"natural_history_inputs": "input/natural_history.csv"
}
}