Describing our approach to modeling an arbitrary infectiousness distribution #19
If we had pre-scheduled all infections, we would have had to store the plan IDs for all the infections in
`HashMap<PersonId, Vec<PlanId>>`. Then, each time we executed one of these plans, we would have had to
remove it from the vector, so that the entry for each `PersonId` tells us the plans we have _left_ for a given
person. Then, when the agent dies, we could iterate through the remaining plans and cancel them.
I don't think that this changes your basic argument, but this isn't the only implementation choice.
Instead, you can store the times of the upcoming infections, rather than have them as plans, and then just plan the next one. This removes the need to cancel, and if you keep them in a sorted list, then you also don't need to iterate over it to plan the next one.
As for the storage, that depends very much on the data structure. For instance, Person Properties are stored as a HashMap of Vecs of PersonProperty, so you actually need an item for each person, whether they have scheduled plans or not, so the only real additional cost is the Vec of times itself.
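The alternative suggested in this comment could look roughly like the sketch below. It is an illustration only: `AttemptTimes`, its methods, and the `usize` alias for `PersonId` are hypothetical names, not part of ixa or the model, and the actual scheduling of a plan is left to the caller.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the model's person identifier.
type PersonId = usize;

/// Per-person queue of future infection attempt times, kept in ascending order.
#[derive(Default)]
struct AttemptTimes {
    times: HashMap<PersonId, VecDeque<f64>>,
}

impl AttemptTimes {
    /// Store a person's attempt times, sorted so the earliest sits at the front.
    fn set(&mut self, person: PersonId, mut times: Vec<f64>) {
        times.sort_by(|a, b| a.total_cmp(b));
        self.times.insert(person, times.into());
    }

    /// Pop the earliest remaining time; the caller would schedule one plan for it.
    fn next(&mut self, person: PersonId) -> Option<f64> {
        self.times.get_mut(&person)?.pop_front()
    }

    /// When the agent dies, drop the stored times so no further attempts are planned.
    fn clear(&mut self, person: PersonId) {
        self.times.remove(&person);
    }
}
```

Because the queue is sorted once on insertion, finding the next attempt is a constant-time pop from the front rather than a scan.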
Thanks! I had not thought of this approach and appreciate the insight. I am now revising this section to drive home the main points concisely.
7b035d7 to 1e3d5ea
It looks great! I think we can improve conciseness a bit more if we focus more on what's implemented in our model.
docs/time-varying-infectiousness.md
Outdated
distribution, $\mathcal{U}(x_{(1)}, 1)$, from which we need to draw an infection attempt. Because this is a new distribution,
we want the first of $n - 1$ infection attempt times on this distribution. We can do that by drawing the
minimum of $n - 1$ infection attempts from $\mathcal{U}(0, 1)$, and scaling that value to be on $(x_{(1)}, 1)$.
In other words, we are using a trick where we shrink the available uniform distribution with each infection
I would delete this "we are using a trick where", and just say what we are doing.
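For readers following the thread, the sequential draw described in the quoted passage can be sketched as follows. This is a minimal illustration, not the model's code: the function names are hypothetical, and the uniform draws are passed in by the caller so that the sketch does not assume any particular random number generator. It relies on the standard fact that the minimum of $m$ draws from $\mathcal{U}(0, 1)$ has inverse CDF $1 - (1 - u)^{1/m}$.

```rust
/// Next ordered uniform time, given the previous one and how many attempts remain
/// (counting the one being drawn). `u` is a fresh Uniform(0, 1) draw from the caller.
fn next_ordered_uniform(prev: f64, remaining: u32, u: f64) -> f64 {
    assert!(remaining > 0, "no attempts remain");
    // Inverse CDF of the minimum of `remaining` Uniform(0, 1) draws.
    let min_on_unit = 1.0 - (1.0 - u).powf(1.0 / f64::from(remaining));
    // Rescale from (0, 1) onto (prev, 1).
    prev + (1.0 - prev) * min_on_unit
}

/// Build all ordered times one at a time, using one caller-supplied draw per attempt.
fn ordered_uniforms(draws: &[f64]) -> Vec<f64> {
    let n = draws.len() as u32;
    let mut prev = 0.0;
    let mut out = Vec::with_capacity(draws.len());
    for (i, &u) in draws.iter().enumerate() {
        // `n - i` attempts are still to be placed, including this one.
        prev = next_ordered_uniform(prev, n - i as u32, u);
        out.push(prev);
    }
    out
}
```

Each call needs only the previous time and the count of remaining attempts, so the sequence can be cut short at any point without cancelling anything.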
docs/time-varying-infectiousness.md
Outdated
This is the CDF for a Beta distribution with $\alpha = 1$ and $\beta = n$. More generally, the distribution
of the $k$th infection attempt from $n$ total infection attempts is $\text{Beta}(k, n + 1 - k)$.
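As a reference for the parameters quoted here, the standard order-statistics results for $n$ independent draws from $\mathcal{U}(0, 1)$ are written out below. They are textbook identities rather than text taken from the PR; $X_{(k)}$ denotes the $k$th smallest draw.

```latex
% CDF of the minimum of n i.i.d. Uniform(0, 1) draws:
P\left(X_{(1)} \le x\right) = 1 - (1 - x)^n, \qquad 0 \le x \le 1

% Density of the k-th smallest of the n draws:
f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, x^{k-1} (1 - x)^{n-k}
```

The density has the Beta form $x^{\alpha - 1}(1 - x)^{\beta - 1}$ with $\alpha = k$ and $\beta = n + 1 - k$, which reduces to $\alpha = 1$, $\beta = n$ for the minimum.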

However, we cannot just independently sample from these Beta distributions. Instead, we must update the distributions
This paragraph is a bit confusing. Could you try make it a bit more concise?
docs/time-varying-infectiousness.md
Outdated
The result of passing the uniform time through the GI's inverse CDF is the time _since_ the agent first became
infectious at which the given $n$th infection attempt occurs. To determine the amount of time _elapsed_ until the next
infection attempt, given that the agent is currently at their $n-1$th infection attempt, schedule the next infection
should this be $(n-1)$th for the rendering?
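The conversion discussed in the quoted passage, from ordered uniform values to a delay between consecutive attempts, can be sketched as below. The quoted sentence is cut off, so this assumes the delay is the difference between the two inverse-CDF values. The exponential generation interval and both function names are assumptions chosen only to make the example concrete.

```rust
/// Inverse CDF of an exponential distribution with the given rate.
/// Used here purely as an example generation interval (an assumption).
fn exponential_inverse_cdf(u: f64, rate: f64) -> f64 {
    -(1.0 - u).ln() / rate
}

/// Delay between the current attempt and the next one.
/// `u_current` and `u_next` are consecutive ordered uniform values, and
/// `inverse_cdf` maps a uniform value to time since the agent became infectious.
fn time_until_next_attempt<F>(inverse_cdf: F, u_current: f64, u_next: f64) -> f64
where
    F: Fn(f64) -> f64,
{
    assert!(u_current <= u_next, "uniform values must be ordered");
    inverse_cdf(u_next) - inverse_cdf(u_current)
}
```

For the first attempt, `u_current` would be `0.0`, so the delay is measured from the start of the infectious period.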
docs/time-varying-infectiousness.md
Outdated
### Changes in the number of infection attempts in the middle of an agent's infectious course

Imagine an agent dies while they are still infectious. Clearly, they cannot be infecting others. (Or, if the
Clearly they cannot infect others.
part way through an infection course. Sequentially scheduling the attempts makes it possible to accommodate
changes to the number of infection attempts that may happen in the middle of an infection course.

Why not just check whether the agent is alive or not at the beginning of the infection attempt? If they are
I honestly think we could remove this entire paragraph for conciseness.
docs/time-varying-infectiousness.md
Outdated
distribution. Note that $a(t)$ and $g(t)$ must be on an absolute scale in this example and not scaled to have
a unit integral. In the case where they are scaled, $g(t)$ can be rescaled to be $Mg(t)$ where $M = \max a(t)$.
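The acceptance step of generic rejection sampling can be sketched as below. Only part of the paragraph is visible in this thread, so the sketch does not assume which of $a(t)$ and $g(t)$ plays which role: `target` and `envelope` are hypothetical closures, and the one requirement is that the envelope is at least as large as the target at every time. The uniform draw `u` is supplied by the caller.

```rust
/// Decide whether to keep a proposed attempt time `t`.
/// `target(t)` is the curve we want to sample under, and `envelope(t)` is the
/// (possibly rescaled) curve the proposal was drawn under. Both are hypothetical
/// closures; `0 <= target(t) <= envelope(t)` must hold for the method to be valid.
fn accept_proposal<F, G>(t: f64, target: F, envelope: G, u: f64) -> bool
where
    F: Fn(f64) -> f64,
    G: Fn(f64) -> f64,
{
    let ceiling = envelope(t);
    let height = target(t);
    assert!(ceiling > 0.0 && height >= 0.0 && height <= ceiling);
    // Accept with probability target(t) / envelope(t).
    u * ceiling < height
}
```

The closer the envelope sits to the target, the fewer proposals are thrown away, which is why both curves need to be on a comparable scale.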

This general idea of rejection sampling is useful for other applications. Consider the case
Not quite sure what the purpose of this paragraph is. I think the previous paragraph drives the message home just fine.
docs/time-varying-infectiousness.md
Outdated
particularly inefficient because we are rejecting the majority of samples. Instead, we may try making our proposal
distribution better fit our underlying distribution. We may make $s(t)$ a similar linear approximation for $g(t)$.

However, this approximation is only possible if we sequentially sample infection attempts. If we sample all
I think "However," isn't necessary here.
lgtm
This PR adds just a readme that describes an approach for modeling an arbitrary infectious period using order statistics. The focus of this document is explaining the need for this particular approach and a brief description of the math, not code.
Any and all feedback is welcome. I welcome particular feedback on the explanations of rejection sampling. Please also let me know any sections you think need to be more fleshed out.
This PR will not be merged in until all relevant parties have had sufficient time to review.
Cargo build/test will fail because the version of our code on main does not compile with the latest updates to ixa, in particular the need for `IxaError` in `define_global_properties!` with the new addition of a validator. However, there is a PR in place to fix that, and I can make a dummy commit to get the tests to rerun once that PR is merged in.
Looking forward to hearing thoughts!