diff --git a/docs/0 contents.md b/docs/0 contents.md new file mode 100644 index 0000000..4a512ef --- /dev/null +++ b/docs/0 contents.md @@ -0,0 +1,12 @@ +# Table of contents + +1. Synthetic population +2. [Transmission](transmission.md) + - [Time-varying infectiousness](time-varying-infectiousness.md) + - Estimating infectiousness from viral load +3. Contact +4. Natural immunity +5. Interventions + - Mask wearing + - Isolation + - Vaccines diff --git a/docs/time-varying-infectiousness.md b/docs/time-varying-infectiousness.md new file mode 100644 index 0000000..eabad66 --- /dev/null +++ b/docs/time-varying-infectiousness.md @@ -0,0 +1,174 @@ +# Time-varying infectiousness + +## Motivation + +Many epidemiological studies have shown that infectiousness varies widely over the course +of the individual's infection. We detail the process for sampling an arbitrary number of +infection attempts appropriately from an arbitrary infectiousness curve in an agent-based model (ABM). +We describe an approach that provides great generalizability, enabling sampling when both the +underlying contact rate and infectiousness distribution change over the course of an infection. +We motivate our discussion with a simple example of an infectious person who takes an +antiviral partially through their infection and is hospitalized. + +## Assumptions + +1. We describe the approach for sampling from an arbitrary infectiousness dsitribution for the uniform +distribution from zero to one -- $\mathcal{U}(0, 1)$. + - Using [inverse transform sampling](https://en.wikipedia.org/wiki/inverse_transform_sampling), a +uniformly-distributed random variable can be transformed to any distribution by passing samples through +the inverse cumulative distribution function (CDF). + - This requires that the infectiousness over time distribution has an inverse CDF or we are + able to approximate it (i.e., from empirical data). +2. Infectiousness over time is a relative quantity, so it is completely separate from the absolute degree +to which someone is infectious, represented by the total number of secondary infection attempts they have. +That quantity, denoted $C_i$ in the generality, is directly related to $R_0$. +3. We require $C_i$ ordered draws from the infectiousness over time/generation interval distribution (time +from an individual becoming infected to their secondary contacts becoming infected). We schedule infection +attempts at these drawn times. + +## Overall Methodology + +We could just take $C_i$ draws from the generation interval (GI) and order them, scheduling an infection +attempt at each time drawn. However, we argue (in detail [below](#why-do-we-need-order-statistics)) that +it is more robust to draw the minimum of our $C_i$ draws, evaluate an infection attempt at that time, +and _then_ schedule the next infection attempt -- drawing the second smallest of our $C_i$ ordered draws +from the GI. + +Drawing the smallest, the second smallest, the third smallest, etc. of a set number of random draws +from a distribution is the problem of having the distribution's +[order statistics](https://en.wikipedia.org/wiki/Order_statistic)", and it is readily solvable for the +case of the uniform distribution. + +## Drawing ordered samples from a uniform distribution + +Let us begin by considering the minimum of $n$ draws from a uniform distribution. We denote the sorted +first of these draws as $X_{(1)}$, the sorted second as $X_{(2)}$, etc. Note that this is different from $X_1$ +which is the unsorted first random draw from the distribution, unsorted. We are interested in the distribution of +$X_{(1)}$. Let us consider the cumulative distribution function of $X_{(1)}$ since it is a continuous random variable. + +$$\mathbb{P} \\{X_{(1)} \leq x \\}$$ + +We know that if the minimum is below some value, $x$, that could mean that just the minimum is below $x$ or that +all $n$ of our random samples are below the minimum. There are too many values to enumerate, so let us consider +the opposite instead -- that $X_{(1)}$ is greater than some value $x$. + +$$\mathbb{P} \\{X_{(1)} \leq x \\} = 1 - \mathbb{P} \\{X_{(1)} > x \\}$$ + +In this case, if $X_{(1)} > x$, we know that all of sampled values are at least above $x$. Recall that +each of the samples is independent and identically-distributed. + +$$= 1 - \mathbb{P} \\{X_i > x \\}^n$$ + +$$= 1 - (1 - F_X(x))^n$$ + +For a uniform distribution: + +$$= 1 - (1 - x)^n$$ + +This is the CDF for a Beta distribution with $alpha = 1$ and $beta = n$. More generally, the distribution +of the $k$th infection attempt from $n$ total infection attempts is $\beta(k, n - 1 + k)$. + +However, we cannot just independently draw the time of a next infection attempt from the corresponding Beta +distribution. Consider that we draw the first infection attempt time from $\beta(1, 5)$ and happen to get a large value. +We need to take that into account when drawing the second infection attempt time. The second infection +attempt time must be greater than the first, so we cannot just draw from $\beta(2, 6)$ but rather $\beta(2, 6) | x_{(1)}$. + +We can set $x_{(1)}$ as the minimum of a new uniform +distribution, $\mathcal{U}(x_{(1)}, 1)$, from which we need to draw an infection attempt. Because this is a new distribution, +we want the first of $n - 1$ infection attempt times on this distribution. We can do that by drawing the +minimum of $n - 1$ infection attempts from $\mathcal{U}(0, 1)$, and scaling that value to be on $(x_{(1)}, 1)$. +In other words, we shrink the available uniform distribution with each infection attempt and ask what +would be the _next_ infection attempt on that distribution. Concretely, if $x_{(1)}$ is +on $(0, 1)$, and $x_{(2)}$ is also on $(0, 1)$, we use the following equation: + +$$x_{(2)} := x_{(1)} + x_{(2)} * (1 - x_{(1)})$$ + +For further detail, the Wikipedia [page on order statistics](https://en.wikipedia.org/wiki/Order_statistic) is a +great reference as are various notes on the subject from +[advanced statistics courses](https://colorado.edu/amath/sites/default/files/attached-files/order_stats.pdf). + +## Workflow + +1. Draw a number of secondary infection attempts for the agent, $C_i$. This can be equal to $R_i$ if the sampling times +are from the generation interval exactly, or $C_i$ can be greater than $R_i$ if sampling from a +[proposal distribution](#rejection-sampling-from-an-arbitrary-generation-interval) and rejecting some attempts to leave +a total of $R_i$ infection attempts. +2. Draw the time for the first of $n$ remaining infection attempts of $\mathcal{U}(0, 1)$ by taking a random value from +$\beta(1, m)$. $m = C_i - $ (the number of infection attempts that have occured). +3. Scale the value on $\mathcal{U}(0, 1)$ to be on $\mathcal{U}(x_{(i)}, 1)$ where $x_{(i)}$ is the previous draw. +4. Convert the uniform value to generation interval space by passing it through the inverse CDF of the generation interval, +and schedule the next infection attempt at the specified time. Wait until that time has occured in the simulation before +proceeding. + + The result of passing the uniform time through the GI's inverse CDF is the time _since_ the agent first become +infectious at which the given $n$th infection attempt occurs. + +5. Repeat from step two until $m = 0$. + +## Why do we need order statistics? + +We provide a detailed justification of why scheduling the next infection attempt after the last (i.e., sequentially +drawing infection attempt times from the GI) is the better approach compared to taking all the draws +from the GI at the beginning of an individual's infectiousness period and at once scheduling the infection +attempts based on those times. + +### Changes in the number of infection attempts in the middle of an agent's infectious course + +Imagine an agent dies while they are still infectious. They can no longer infect others. (Or, if the +disease in question is Ebola, there are well-defined processes by which they may still be infectious and +those processes should not be lumped with the infection generation interval.) + +To ensure a dead person does not infect others, we would like to remove the plans where they infect others. +If we had pre-scheduled all infections, we would have had to store the plan IDs for all the infections in +`HashMap>`. Then, each time we had executed one of these plans, we would have to +remove it from the vector, so that the entry for each `PersonId` tells us the plans we have _left_ for a given +person. Then, when the agent dies, we could iterate through the remaining plans and cancel them. + +Clearly, this is cumbersome, requiring us to store a `HashMap` and iterate +through a vector every time an infection attempt occurs. Alternatively, if we scheduled infection events +sequentially, we would only have to store the next infection attempt time. We could instead use a +`HashMap`, removing the need to iterate through a vector and instead enabling us to +directly use the `.insert` method on the `HashMap`. + +More generally, this idea applies to any case where we need to change the number of infection attempts +partway through an infection course. Sequentially scheduling the attempts makes it possible to accomodate +changes to the number of infection attempts that may happen in the middle of an infection course. + +Why not just check whether the agent is alive or not at the beginning of the infection attempt? If they are +not alive, skip the infection attempt. There are two reasons this is not the desired solution. First, +it is cleaner to handle removing an agent at the point at which they become no longer relevant rather than +continuously checking on whether they are alive. Secondly, there are other cases when the number +of infection attempts may change partially through an infection, such as being hospitalized and having +a lower contact rate. Sequentially scheduling infection attempts enables arbitrary changes to the +effective contact rate while an agent is infectious in the simplest way. + +### Rejection sampling from an arbitrary generation interval + +Let us consider a case where the infectiousness distribution changes over time. Concretely, imagine that +infectiousness is a function of the viral load, and an agent gets an antiviral partially through their +infection course. Depending on when the agent gets the antiviral, they have a different reduction in viral +load. Therefore, even if we know that an agent will get an antiviral, we don't know their infectiousness +distribution in advance. + +It is still straightforward to sample the appropriate infectiousness distribution without needing to change +the generation interval. We can use [rejection sampling](https://en.wikipedia.org/wiki/Rejection_sampling). +In this case, we still sample infection attempt times from the pre-antiviral infectiousness distribution, and +we accept those sampled times as actual infection attempts with probability $a(t) / g(t)$ where $a(t)$ is the +post-antiviral infectiousness distribution and $g(t)$ is the underlying pre-antiviral infectiousness +distribution. + +However, rejection sampling is inherently inefficient. It requires that we draw plans at a faster rate than events +actually happen, and then we need to evaluate at the time of the plan whether we mean for something to actually +happen. Nevertheless, there are ways to improve efficiency. The trivial case of drawing plans at a faster rate +than events actually happen is to use the maximum event rate as the sampling rate for all $t$. But, if the +actual event rate is much smaller than the maximum for nearly all $t$, (i.e., a disease where infectiousness is highly +concentrated around a specific time or $a(t) << g(t)$), this is particularly inefficient because we are rejecting the majority of samples. Instead, we may try making our proposal distribution better fit our underlying distribution by making it, say, a linear approximation of $a(t)$. + +This approach is only possible if we sequentially sample infection attempts. If we sample all +infection attempts at the beginning of an agent's infection, we are locked into using a singular sampling rate. Thus, +sequential sampling enables us to most generally model a generation interval or infectiousness distribution +that may change over the course of an agent's infection in a non pre-defined way. + +These examples underscore that sequentially sampling infection attempts rather than pre-scheduling enables +the modeler to consider changes in both the contact rate and the generation interval without needing to change +the transmission model.