-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Describing our approach to modeling an arbitrary infectiousness distribution #19
Merged
Merged
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
ca39441
motivation section
ChiragKumar9 2b5fbef
checkpoint because things crash
ChiragKumar9 ae863e5
full draft
ChiragKumar9 114f567
revised for clarity and typos
ChiragKumar9 784ccfe
emphasize sequential sampling more
ChiragKumar9 b47a22a
add uniform shifting math
ChiragKumar9 dd0c628
clarify text
ChiragKumar9 1e3d5ea
make clear that you need to schedule in elpased time forward
ChiragKumar9 5109918
use programmer's notation to trick github md
ChiragKumar9 8a91d09
added to contents
ChiragKumar9 99d7f37
notation
ChiragKumar9 f6d9e94
review comments
ChiragKumar9 a0f7af9
make more concise
ChiragKumar9 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Table of contents | ||
|
||
1. Synthetic population | ||
2. [Transmission](transmission.md) | ||
- [Time-varying infectiousness](time-varying-infectiousness.md) | ||
- Estimating infectiousness from viral load | ||
3. Contact | ||
4. Natural immunity | ||
5. Interventions | ||
- Mask wearing | ||
- Isolation | ||
- Vaccines |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
# Time-varying infectiousness | ||
|
||
## Motivation | ||
|
||
Many epidemiological studies have shown that infectiousness varies widely over the course | ||
of the individual's infection. We detail the process for sampling an arbitrary number of | ||
infection attempts appropriately from an arbitrary infectiousness curve in an agent-based model (ABM). | ||
We describe an approach that provides great generalizability, enabling sampling when both the | ||
underlying contact rate and infectiousness distribution change over the course of an infection. | ||
We motivate our discussion with a simple example of an infectious person who takes an | ||
antiviral partially through their infection and is hospitalized. | ||
|
||
## Assumptions | ||
|
||
1. We describe the approach for sampling from an arbitrary infectiousness dsitribution for the uniform | ||
distribution from zero to one -- $\mathcal{U}(0, 1)$. | ||
- Using [inverse transform sampling](https://en.wikipedia.org/wiki/inverse_transform_sampling), a | ||
uniformly-distributed random variable can be transformed to any distribution by passing samples through | ||
the inverse cumulative distribution function (CDF). | ||
- This requires that the infectiousness over time distribution has an inverse CDF or we are | ||
able to approximate it (i.e., from empirical data). | ||
2. Infectiousness over time is a relative quantity, so it is completely separate from the absolute degree | ||
to which someone is infectious, represented by the total number of secondary infection attempts they have. | ||
That quantity, denoted $C_i$ in the generality, is directly related to $R_0$. | ||
3. We require $C_i$ ordered draws from the infectiousness over time/generation interval distribution (time | ||
from an individual becoming infected to their secondary contacts becoming infected). We schedule infection | ||
attempts at these drawn times. | ||
|
||
## Overall Methodology | ||
|
||
We could just take $C_i$ draws from the generation interval (GI) and order them, scheduling an infection | ||
attempt at each time drawn. However, we argue (in detail [below](#why-do-we-need-order-statistics)) that | ||
it is more robust to draw the minimum of our $C_i$ draws, evaluate an infection attempt at that time, | ||
and _then_ schedule the next infection attempt -- drawing the second smallest of our $C_i$ ordered draws | ||
from the GI. | ||
|
||
Drawing the smallest, the second smallest, the third smallest, etc. of a set number of random draws | ||
from a distribution is the problem of having the distribution's | ||
[order statistics](https://en.wikipedia.org/wiki/Order_statistic)", and it is readily solvable for the | ||
case of the uniform distribution. | ||
|
||
## Drawing ordered samples from a uniform distribution | ||
|
||
Let us begin by considering the minimum of $n$ draws from a uniform distribution. We denote the sorted | ||
first of these draws as $X_{(1)}$, the sorted second as $X_{(2)}$, etc. Note that this is different from $X_1$ | ||
which is the unsorted first random draw from the distribution, unsorted. We are interested in the distribution of | ||
$X_{(1)}$. Let us consider the cumulative distribution function of $X_{(1)}$ since it is a continuous random variable. | ||
|
||
$$\mathbb{P} \\{X_{(1)} \leq x \\}$$ | ||
|
||
We know that if the minimum is below some value, $x$, that could mean that just the minimum is below $x$ or that | ||
all $n$ of our random samples are below the minimum. There are too many values to enumerate, so let us consider | ||
the opposite instead -- that $X_{(1)}$ is greater than some value $x$. | ||
|
||
$$\mathbb{P} \\{X_{(1)} \leq x \\} = 1 - \mathbb{P} \\{X_{(1)} > x \\}$$ | ||
|
||
In this case, if $X_{(1)} > x$, we know that all of sampled values are at least above $x$. Recall that | ||
each of the samples is independent and identically-distributed. | ||
|
||
$$= 1 - \mathbb{P} \\{X_i > x \\}^n$$ | ||
|
||
$$= 1 - (1 - F_X(x))^n$$ | ||
|
||
For a uniform distribution: | ||
|
||
$$= 1 - (1 - x)^n$$ | ||
|
||
This is the CDF for a Beta distribution with $alpha = 1$ and $beta = n$. More generally, the distribution | ||
of the $k$th infection attempt from $n$ total infection attempts is $\beta(k, n - 1 + k)$. | ||
|
||
However, we cannot just independently draw the time of a next infection attempt from the corresponding Beta | ||
distribution. Consider that we draw the first infection attempt time from $\beta(1, 5)$ and happen to get a large value. | ||
We need to take that into account when drawing the second infection attempt time. The second infection | ||
attempt time must be greater than the first, so we cannot just draw from $\beta(2, 6)$ but rather $\beta(2, 6) | x_{(1)}$. | ||
|
||
We can set $x_{(1)}$ as the minimum of a new uniform | ||
distribution, $\mathcal{U}(x_{(1)}, 1)$, from which we need to draw an infection attempt. Because this is a new distribution, | ||
we want the first of $n - 1$ infection attempt times on this distribution. We can do that by drawing the | ||
minimum of $n - 1$ infection attempts from $\mathcal{U}(0, 1)$, and scaling that value to be on $(x_{(1)}, 1)$. | ||
In other words, we shrink the available uniform distribution with each infection attempt and ask what | ||
would be the _next_ infection attempt on that distribution. Concretely, if $x_{(1)}$ is | ||
on $(0, 1)$, and $x_{(2)}$ is also on $(0, 1)$, we use the following equation: | ||
|
||
$$x_{(2)} := x_{(1)} + x_{(2)} * (1 - x_{(1)})$$ | ||
|
||
For further detail, the Wikipedia [page on order statistics](https://en.wikipedia.org/wiki/Order_statistic) is a | ||
great reference as are various notes on the subject from | ||
[advanced statistics courses](https://colorado.edu/amath/sites/default/files/attached-files/order_stats.pdf). | ||
|
||
## Workflow | ||
|
||
1. Draw a number of secondary infection attempts for the agent, $C_i$. This can be equal to $R_i$ if the sampling times | ||
are from the generation interval exactly, or $C_i$ can be greater than $R_i$ if sampling from a | ||
[proposal distribution](#rejection-sampling-from-an-arbitrary-generation-interval) and rejecting some attempts to leave | ||
a total of $R_i$ infection attempts. | ||
2. Draw the time for the first of $n$ remaining infection attempts of $\mathcal{U}(0, 1)$ by taking a random value from | ||
$\beta(1, m)$. $m = C_i - $ (the number of infection attempts that have occured). | ||
3. Scale the value on $\mathcal{U}(0, 1)$ to be on $\mathcal{U}(x_{(i)}, 1)$ where $x_{(i)}$ is the previous draw. | ||
4. Convert the uniform value to generation interval space by passing it through the inverse CDF of the generation interval, | ||
and schedule the next infection attempt at the specified time. Wait until that time has occured in the simulation before | ||
proceeding. | ||
|
||
The result of passing the uniform time through the GI's inverse CDF is the time _since_ the agent first become | ||
infectious at which the given $n$th infection attempt occurs. | ||
|
||
5. Repeat from step two until $m = 0$. | ||
|
||
## Why do we need order statistics? | ||
|
||
We provide a detailed justification of why scheduling the next infection attempt after the last (i.e., sequentially | ||
drawing infection attempt times from the GI) is the better approach compared to taking all the draws | ||
from the GI at the beginning of an individual's infectiousness period and at once scheduling the infection | ||
attempts based on those times. | ||
|
||
### Changes in the number of infection attempts in the middle of an agent's infectious course | ||
|
||
Imagine an agent dies while they are still infectious. They can no longer infect others. (Or, if the | ||
disease in question is Ebola, there are well-defined processes by which they may still be infectious and | ||
those processes should not be lumped with the infection generation interval.) | ||
|
||
To ensure a dead person does not infect others, we would like to remove the plans where they infect others. | ||
If we had pre-scheduled all infections, we would have had to store the plan IDs for all the infections in | ||
`HashMap<PersonId, Vec<PlanId>>`. Then, each time we had executed one of these plans, we would have to | ||
remove it from the vector, so that the entry for each `PersonId` tells us the plans we have _left_ for a given | ||
person. Then, when the agent dies, we could iterate through the remaining plans and cancel them. | ||
|
||
Clearly, this is cumbersome, requiring us to store a `HashMap` and iterate | ||
through a vector every time an infection attempt occurs. Alternatively, if we scheduled infection events | ||
sequentially, we would only have to store the next infection attempt time. We could instead use a | ||
`HashMap<PersonId, PlanId>`, removing the need to iterate through a vector and instead enabling us to | ||
directly use the `.insert` method on the `HashMap`. | ||
|
||
More generally, this idea applies to any case where we need to change the number of infection attempts | ||
partway through an infection course. Sequentially scheduling the attempts makes it possible to accomodate | ||
changes to the number of infection attempts that may happen in the middle of an infection course. | ||
|
||
Why not just check whether the agent is alive or not at the beginning of the infection attempt? If they are | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I honestly think we could remove this entire paragraph for conciseness. |
||
not alive, skip the infection attempt. There are two reasons this is not the desired solution. First, | ||
it is cleaner to handle removing an agent at the point at which they become no longer relevant rather than | ||
continuously checking on whether they are alive. Secondly, there are other cases when the number | ||
of infection attempts may change partially through an infection, such as being hospitalized and having | ||
a lower contact rate. Sequentially scheduling infection attempts enables arbitrary changes to the | ||
effective contact rate while an agent is infectious in the simplest way. | ||
|
||
### Rejection sampling from an arbitrary generation interval | ||
|
||
Let us consider a case where the infectiousness distribution changes over time. Concretely, imagine that | ||
infectiousness is a function of the viral load, and an agent gets an antiviral partially through their | ||
infection course. Depending on when the agent gets the antiviral, they have a different reduction in viral | ||
load. Therefore, even if we know that an agent will get an antiviral, we don't know their infectiousness | ||
distribution in advance. | ||
|
||
It is still straightforward to sample the appropriate infectiousness distribution without needing to change | ||
the generation interval. We can use [rejection sampling](https://en.wikipedia.org/wiki/Rejection_sampling). | ||
In this case, we still sample infection attempt times from the pre-antiviral infectiousness distribution, and | ||
we accept those sampled times as actual infection attempts with probability $a(t) / g(t)$ where $a(t)$ is the | ||
post-antiviral infectiousness distribution and $g(t)$ is the underlying pre-antiviral infectiousness | ||
distribution. | ||
|
||
However, rejection sampling is inherently inefficient. It requires that we draw plans at a faster rate than events | ||
actually happen, and then we need to evaluate at the time of the plan whether we mean for something to actually | ||
happen. Nevertheless, there are ways to improve efficiency. The trivial case of drawing plans at a faster rate | ||
than events actually happen is to use the maximum event rate as the sampling rate for all $t$. But, if the | ||
actual event rate is much smaller than the maximum for nearly all $t$, (i.e., a disease where infectiousness is highly | ||
concentrated around a specific time or $a(t) << g(t)$), this is particularly inefficient because we are rejecting the majority of samples. Instead, we may try making our proposal distribution better fit our underlying distribution by making it, say, a linear approximation of $a(t)$. | ||
|
||
This approach is only possible if we sequentially sample infection attempts. If we sample all | ||
infection attempts at the beginning of an agent's infection, we are locked into using a singular sampling rate. Thus, | ||
sequential sampling enables us to most generally model a generation interval or infectiousness distribution | ||
that may change over the course of an agent's infection in a non pre-defined way. | ||
|
||
These examples underscore that sequentially sampling infection attempts rather than pre-scheduling enables | ||
the modeler to consider changes in both the contact rate and the generation interval without needing to change | ||
the transmission model. |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this changes your basic argument, but this isn't the only implementation choice.
Instead, you can store the times of the next infection, rather than have them as plans, and then just plan the next one. This removes the need to cancel, and if you keep them in a sorted list, then you also don't need to iterate over to plan the next one.
As for the storage, that depends very much on the data structure. For instance, Person Properties are stored as a HashMap of Vecs of PersonProperty, so you actually need an item for each person, whether they have scheduled plans or not, so the only real additional cost is the Vec of times itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I had not thought of this approach and appreciate this insight for now revising this section to drive home the main points concisely.