Cut up root node (not just ultimate ancestor) #850
Comments
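And here are the correlations between the known lengths of root nodes and what we infer (it's a pretty poor correlation, though!):
import numpy as np
import matplotlib.pyplot as plt

# root_breaks: known root breakpoints; r2 and r3: inferred breakpoints
# (split ultimate ancestor only, and extra root splitting, respectively)
rb = np.array(root_breaks)
mid_root_pos = rb[:-1] + np.diff(rb) / 2  # midpoint of each known root interval
# Known lengths against themselves, as a reference line
ss = np.searchsorted(rb, mid_root_pos)
plt.scatter(np.diff(root_breaks), rb[ss] - rb[ss - 1])
# Length of the inferred interval containing each midpoint (split ultimate)
rb = np.array(r2)
ss = np.searchsorted(rb, mid_root_pos)
plt.scatter(np.diff(root_breaks), rb[ss] - rb[ss - 1], alpha=0.1)
print(
    "corr coeff: known root lengths vs lengths with split ultimate:\n ",
    np.corrcoef(np.diff(root_breaks), rb[ss] - rb[ss - 1])[0, 1])
# Same comparison with the extra root splitting
rb = np.array(r3)
ss = np.searchsorted(rb, mid_root_pos)
plt.scatter(np.diff(root_breaks), rb[ss] - rb[ss - 1], alpha=0.1)
print(
    "corr coeff: known root lengths vs lengths with extra split root:\n ",
    np.corrcoef(np.diff(root_breaks), rb[ss] - rb[ss - 1])[0, 1])
plt.xscale('log')
plt.yscale('log')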
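To make the comparison in the snippet above easier to follow: for each known root interval, it takes the midpoint and looks up the length of the inferred interval that contains that midpoint. Here is a minimal, dependency-free sketch of that lookup. The function name `containing_interval_length` and the breakpoint values are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def containing_interval_length(breaks, pos):
    """Length of the interval in the sorted list `breaks` that contains `pos`."""
    # bisect_left follows the same rule as np.searchsorted(..., side="left")
    i = bisect_left(breaks, pos)
    if i == 0 or i == len(breaks):
        raise ValueError("pos lies outside the range covered by breaks")
    return breaks[i] - breaks[i - 1]


# Hypothetical breakpoints, for illustration only
known = [0, 10, 30, 100]
inferred = [0, 25, 100]

midpoints = [(a + b) / 2 for a, b in zip(known, known[1:])]
known_lengths = [b - a for a, b in zip(known, known[1:])]
inferred_lengths = [containing_interval_length(inferred, m) for m in midpoints]

print(known_lengths)
print(inferred_lengths)
```

Pairing `known_lengths` with `inferred_lengths` gives the two axes of the scatter plot. Note that several short known intervals can map to the same long inferred interval, which is one reason the correlation can look poor.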
Extra splitting of the root certainly improves the n=10 plot from @a-ignatieva's preprint, especially when combined with @nspope's variational gamma method.
@jeromekelleher and I decided this should be implemented at a minimum for
A more justified, model-based approach to cutting up the root nodes is to implement the PSMC-on-the-tree idea for the root. If that is implemented, we should possibly use it to cut up the root nodes instead. So there's an argument for making the version above available only as a non-default post-processing option.
We are (I would say) fully justified in cutting up the ultimate root, as we know that an all-zeros ancestor is a simplification. Cutting up root nodes in general is more heuristic, so I suggest this should be part of the tsdate
On the basis that the ultimate ancestor is not biologically very plausible, in recent versions of tsinfer we now cut up edges that lead directly to the ultimate ancestor, by running the new post_process routine.
However, I suspect (and tests show) that we still make root ancestors that are too long. Therefore we could think about cutting up not just the ultimate ancestor, but also any root in which the edges-in or the edges-out change.
Here's some example code, with a histogram of actual edge spans of the root node. Note that this code may result in nodes that are not ordered strictly by time.
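As a rough sketch of the splitting rule described above (not tsinfer's implementation), the places where a root could be cut are the coordinates at which any edge into or out of that node starts or ends. The plain `(left, right, parent, child)` edge tuples, the function name `split_intervals`, and the node IDs below are all hypothetical and chosen for illustration.

```python
def split_intervals(edges, node):
    """Intervals over which the set of edges touching `node` is constant.

    `edges` is a list of (left, right, parent, child) tuples. Stretches of
    the genome where `node` has no edges at all are skipped.
    """
    # Spans of every edge that has `node` as parent (edges-out) or child (edges-in)
    touching = [(l, r) for l, r, p, c in edges if node in (p, c)]
    # Every start or end of such an edge is a candidate cut point
    points = sorted({x for span in touching for x in span})
    return [
        (a, b)
        for a, b in zip(points, points[1:])
        if any(l <= a and b <= r for l, r in touching)
    ]


# Hypothetical example: node 5 is a root ancestor below ultimate ancestor 6
edges = [
    (0, 70, 6, 5),   # edge into node 5
    (0, 40, 5, 1),   # edges out of node 5
    (0, 100, 5, 2),
    (40, 100, 5, 3),
]
print(split_intervals(edges, 5))
```

In a real tree sequence each returned interval would become its own node, with the edges above and below reassigned to the matching piece. This sketch only finds the cut points and does nothing about node times or node ordering.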