Describing our approach to modeling an arbitrary infectiousness distribution #19

Merged: 13 commits merged into main from the ckk_docs_arbitrary_infectiousness branch on Dec 10, 2024.

Conversation

ChiragKumar9 (Collaborator):
This PR adds just a README that describes an approach for modeling an arbitrary infectious period using order statistics. The focus of this document is explaining the need for this particular approach and briefly describing the math, not code.

Any and all feedback is welcome. I would particularly welcome feedback on the explanations of rejection sampling. Please also let me know which sections you think need to be fleshed out further.

This PR will not be merged until all relevant parties have had time to review.

Cargo build/test will fail because the version of our code on main does not compile against the latest updates to ixa, in particular the need for `IxaError` in `define_global_properties!` now that it accepts a validator. However, there is a PR in place to fix that, and I can make a dummy commit to rerun the tests once that PR is merged.

Looking forward to hearing thoughts!

ChiragKumar9 changed the title from "Describing our approach to modeling an arbitrary infectious period" to "Describing our approach to modeling an arbitrary infectiousness distribution" on Nov 29, 2024.

> If we had pre-scheduled all infections, we would have had to store the plan IDs for all the infections in
> `HashMap<PersonId, Vec<PlanId>>`. Then, each time one of these plans executed, we would have had to remove
> it from the vector, so that the entry for each `PersonId` tells us the plans we have _left_ for a given
> person. Then, when the agent dies, we could iterate through the remaining plans and cancel them.

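For concreteness, a minimal sketch (in Rust) of the bookkeeping the quoted passage describes. The types and method names here are hypothetical stand-ins, not the model's actual API:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the simulation's identifier types.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct PersonId(usize);
#[derive(Clone, Copy, PartialEq)]
struct PlanId(usize);

/// Per-person record of every pre-scheduled infection attempt, kept so the
/// remaining plans can be cancelled if the person dies mid-course.
#[derive(Default)]
struct ScheduledInfections {
    plans: HashMap<PersonId, Vec<PlanId>>,
}

impl ScheduledInfections {
    /// Remember a newly scheduled infection attempt for this person.
    fn record(&mut self, person: PersonId, plan: PlanId) {
        self.plans.entry(person).or_default().push(plan);
    }

    /// Called when a plan executes: drop it so only the plans *left* remain.
    fn mark_executed(&mut self, person: PersonId, plan: PlanId) {
        if let Some(remaining) = self.plans.get_mut(&person) {
            remaining.retain(|p| *p != plan);
        }
    }

    /// Called when the person dies: hand back the plans still left to cancel.
    fn take_remaining(&mut self, person: PersonId) -> Vec<PlanId> {
        self.plans.remove(&person).unwrap_or_default()
    }
}
```
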
Collaborator:

I don't think that this changes your basic argument, but this isn't the only implementation choice.

Instead, you can store the times of the upcoming infection attempts, rather than holding them as plans, and just plan the next one. This removes the need to cancel, and if you keep the times in a sorted list, you also don't need to iterate over them to find the next one.

As for the storage, that depends very much on the data structure. For instance, Person Properties are stored as a `HashMap` of `Vec`s of `PersonProperty`, so you already need an item for each person whether they have scheduled plans or not; the only real additional cost is the `Vec` of times itself.
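
A sketch of the alternative this comment describes, again with hypothetical types: keep only the future attempt times, sorted ascending, and schedule just the earliest one, so nothing ever needs cancelling.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct PersonId(usize);

/// Future infection-attempt times per person, kept sorted ascending.
/// Only the earliest time is ever turned into a plan; when it fires, the
/// next time is scheduled, and a death simply drops the person's entry.
#[derive(Default)]
struct PendingAttemptTimes {
    times: HashMap<PersonId, Vec<f64>>,
}

impl PendingAttemptTimes {
    /// Insert a new attempt time, keeping the per-person vector sorted.
    fn insert(&mut self, person: PersonId, t: f64) {
        let v = self.times.entry(person).or_default();
        let idx = v.partition_point(|&x| x < t);
        v.insert(idx, t);
    }

    /// The next attempt time to schedule, if any.
    fn next_time(&self, person: PersonId) -> Option<f64> {
        self.times.get(&person).and_then(|v| v.first().copied())
    }

    /// Remove the attempt that just ran so the following one can be scheduled.
    fn pop_next(&mut self, person: PersonId) -> Option<f64> {
        let v = self.times.get_mut(&person)?;
        if v.is_empty() { None } else { Some(v.remove(0)) }
    }
}
```

Dropping a person's entry from `times` when they die discards all of their remaining attempts in one step, with no plans to cancel.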

Collaborator (author):

Thanks! I had not thought of this approach and appreciate the insight; I am now revising this section to drive home the main points concisely.

ChiragKumar9 force-pushed the ckk_docs_arbitrary_infectiousness branch from 7b035d7 to 1e3d5ea on December 4, 2024.
@confunguido (Collaborator) left a comment:

It looks great! I think we can improve conciseness a bit more if we focus more on what's implemented in our model.

> distribution, $\mathcal{U}(x_{(1)}, 1)$, from which we need to draw an infection attempt. Because this is a new distribution,
> we want the first of $n - 1$ infection attempt times on this distribution. We can do that by drawing the
> minimum of $n - 1$ infection attempts from $\mathcal{U}(0, 1)$, and scaling that value to be on $(x_{(1)}, 1)$.
> In other words, we are using a trick where we shrink the available uniform distribution with each infection

Collaborator:

I would delete this "we are using a trick where", and just say what we are doing.
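
A minimal sketch of the sampling step the quoted passage describes, assuming the `rand` crate; the function names are illustrative, not from the model:

```rust
use rand::Rng;

/// Draw the minimum of `remaining` i.i.d. Uniform(0, 1) samples by inverse
/// transform: the minimum has CDF F(x) = 1 - (1 - x)^remaining.
fn draw_min_of_uniforms<R: Rng>(rng: &mut R, remaining: u32) -> f64 {
    debug_assert!(remaining >= 1, "at least one attempt must remain");
    let u: f64 = rng.gen();
    1.0 - (1.0 - u).powf(1.0 / remaining as f64)
}

/// Next attempt time on (0, 1): draw the minimum of the remaining attempts
/// and rescale it onto the interval (previous_attempt, 1).
fn next_attempt_time<R: Rng>(rng: &mut R, previous_attempt: f64, remaining: u32) -> f64 {
    let m = draw_min_of_uniforms(rng, remaining);
    previous_attempt + m * (1.0 - previous_attempt)
}
```

Calling `next_attempt_time` repeatedly, decrementing `remaining` by one each time, yields the ordered attempt times on $(0, 1)$.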

> This is the CDF for a Beta distribution with $\alpha = 1$ and $\beta = n$. More generally, the distribution
> of the $k$th infection attempt from $n$ total infection attempts is $\mathrm{Beta}(k, n - k + 1)$.
>
> However, we cannot just independently sample from these Beta distributions. Instead, we must update the distributions

Collaborator:

This paragraph is a bit confusing. Could you try to make it a bit more concise?
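
For reference, the standard order-statistic results the quoted passage relies on, with $X_{(k)}$ the $k$th smallest of $n$ i.i.d. $\mathcal{U}(0, 1)$ draws (these are textbook facts, not text from the README):

$$
P\bigl(X_{(1)} \le x\bigr) = 1 - (1 - x)^{n}, \qquad \text{so } X_{(1)} \sim \mathrm{Beta}(1,\ n),
$$

$$
X_{(k)} \sim \mathrm{Beta}(k,\ n - k + 1) \quad \text{in general.}
$$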


> The result of passing the uniform time through the GI's inverse CDF is the time _since_ the agent first became
> infectious at which the given $n$th infection attempt occurs. To determine the amount of time _elapsed_ until the next
> infection attempt, given that the agent is currently at their $n-1$th infection attempt, schedule the next infection

Collaborator:

should this be $(n-1)$th for the rendering?
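
A compact sketch of the elapsed-time calculation described in the quoted passage; `gi_inverse_cdf` is a hypothetical placeholder for the generation-interval distribution's inverse CDF, shown here as an exponential with a mean of 5 days purely for illustration:

```rust
/// Hypothetical inverse CDF of the generation-interval distribution:
/// maps a uniform attempt time on (0, 1) to an absolute time (in days)
/// since the agent first became infectious.
fn gi_inverse_cdf(u: f64) -> f64 {
    // Placeholder only: an exponential GI with a mean of 5 days.
    -5.0 * (1.0 - u).ln()
}

/// Delay to wait before the next infection attempt, given the absolute
/// time (since infectiousness onset) of the attempt that just occurred.
fn delay_until_next_attempt(u_next: f64, time_of_current_attempt: f64) -> f64 {
    let absolute_time_of_next = gi_inverse_cdf(u_next);
    absolute_time_of_next - time_of_current_attempt
}
```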


> ### Changes in the number of infection attempts in the middle of an agent's infectious course
>
> Imagine an agent dies while they are still infectious. Clearly, they cannot be infecting others. (Or, if the

Collaborator:

Clearly they cannot infect others.

> part way through an infection course. Sequentially scheduling the attempts makes it possible to accommodate
> changes to the number of infection attempts that may happen in the middle of an infection course.
>
> Why not just check whether the agent is alive or not at the beginning of the infection attempt? If they are

Collaborator:

I honestly think we could remove this entire paragraph for conciseness.

> distribution. Note that $a(t)$ and $g(t)$ must be on an absolute scale in this example and not scaled to have
> a unit integral. In the case where they are scaled, $g(t)$ can be rescaled to be $Mg(t)$ where $M = \max a(t)$.
>
> This general idea of rejection sampling is useful for other applications. Consider the case

Collaborator:

Not quite sure what the purpose of this paragraph is. I think the previous paragraph drives the message home just fine.
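
A generic sketch of the rejection-sampling idea discussed above, assuming the `rand` crate. For simplicity it uses a flat (uniform) proposal with a constant bound `max_target` in place of the $Mg(t)$ envelope the README describes:

```rust
use rand::Rng;

/// Rejection sampling on [0, t_max] with a flat proposal: draw a candidate
/// time t uniformly, draw a uniform height under the constant bound
/// `max_target` (which must satisfy max_target >= target(t) for all t),
/// and keep t only if the height falls under the target curve. Accepted
/// draws have density proportional to `target`.
fn rejection_sample<R: Rng>(
    rng: &mut R,
    t_max: f64,
    max_target: f64,
    target: impl Fn(f64) -> f64,
) -> f64 {
    loop {
        let t = rng.gen::<f64>() * t_max;          // proposed time
        let height = rng.gen::<f64>() * max_target; // uniform height under the bound
        if height <= target(t) {
            return t; // accepted
        }
        // otherwise reject and propose again
    }
}
```

With a flat proposal, many candidates are rejected when the target is sharply peaked, which is the inefficiency the next quoted passage addresses by fitting the proposal more closely to the target.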

> particularly inefficient because we are rejecting the majority of samples. Instead, we may try making our proposal
> distribution better fit our underlying distribution. We may make $s(t)$ a similar linear approximation for $g(t)$.
>
> However, this approximation is only possible if we sequentially sample infection attempts. If we sample all

Collaborator:

I think "However," isn't necessary here.

@confunguido (Collaborator) left a comment:

lgtm

ChiragKumar9 merged commit b804259 into main on Dec 10, 2024 (3 checks passed).
ChiragKumar9 deleted the ckk_docs_arbitrary_infectiousness branch on December 10, 2024.