fix indel inference in ancestral reconstructions #131
Not totally agreed. I agree that it's an issue, but the problem is hard. I think the easiest way to get a result is to use gap coding, which might work well. Here is a paper which appears to do something related but more sophisticated.
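For reference, here is a minimal Python sketch of one common gap-coding scheme, simple indel coding. It is not necessarily what the linked paper does. Every distinct gap (a maximal run of `-` with given start and end columns) becomes one binary character. A sequence scores `1` if it has exactly that gap, `?` if the gap lies inside a longer gap in that sequence, and `0` otherwise. The function names and the dict-of-strings alignment format are hypothetical, and how these characters would be combined with the nucleotide data for tree inference is left open.

```python
from typing import Dict, List, Tuple


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of maximal '-' runs."""
    runs, start = [], None
    for i, c in enumerate(seq):
        if c == "-" and start is None:
            start = i
        elif c != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs


def simple_indel_code(aln: Dict[str, str]) -> Tuple[List[Tuple[int, int]], Dict[str, str]]:
    """Score each distinct gap as a character: '1' present, '0' absent, '?' nested."""
    runs = {name: gap_runs(seq) for name, seq in aln.items()}
    # Every distinct (start, end) gap seen in any sequence becomes one character.
    spans = sorted({r for rs in runs.values() for r in rs})
    coded = {}
    for name, rs in runs.items():
        chars = []
        for start, end in spans:
            if (start, end) in rs:
                chars.append("1")  # this sequence has exactly this gap
            elif any(s <= start and end <= e for s, e in rs):
                chars.append("?")  # gap lies inside a longer gap: state unknown
            else:
                chars.append("0")  # residues present, or a non-enclosing gap
        coded[name] = "".join(chars)
    return spans, coded


aln = {"naive": "ACGT--AC", "x": "AC----AC", "y": "ACGTTTAC"}
spans, coded = simple_indel_code(aln)
print(spans)  # [(2, 6), (4, 6)]
print(coded)  # {'naive': '01', 'x': '1?', 'y': '00'}
```

Scoring nested gaps as `?` avoids counting a short gap as "absent" in a sequence where the whole region is deleted anyway.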
I think we should move to PRANK.
@metasoarous sorry, but I'm going to hand this off to you to try this out!
Sorry!? Hah! Begone, demon Phylip! I'm on it.
Er, sorry, but PRANK isn't going to infer the trees for you... Now that I think about it, @cswarth had some funny experiences with PRANK's ancestral sequence reconstruction. Is that right, Chris?
We used PRANK to infer ancestral sequences and trees for PREAST (https://github.com/matsengrp/PREAST/blob/master/bin/infer.sh#L157). I don't recall the specifics of how it came up with a tree.
Thanks @cswarth @matsen! Looks like it can infer and spit out its own guide trees via the … In any case, if we supply our own trees, we can use PRANK just for the ancestral reconstruction and for cleaning up the alignment (obviously we'd already need an alignment to produce the input tree), and this would free us to choose something saner for the tree construction, yes?
@matsen So how should we do this? If we want to use the ancestral sequences, we need to have the final tree already, so that the ancestral sequences correspond to that topology. But then what do we use for the alignment going into that tree? Do you think it's fine to just subset the big MUSCLE alignment and feed that into dnaml? Or is it worth aligning those sequences with a preliminary run of PRANK first, to get a better final tree (and then doing a second round of PRANK afterwards to get the ancestral sequences)?
We could do either. IIUC the alignment problem isn't especially hard, right? The challenge here is getting ancestral sequences on the tree in the presence of indels.
Ugh... well, here are some sour apples. PRANK complains with the likes of
👍 for continuing with ML. |
As discussed elsewhere, PRANK's ancestral state reconstruction appears to be a joke (harhar...). As @krdav and I thoroughly demonstrated to ourselves, the internal node sequences are all mismatched with the tree. @krdav has opened an issue for this here: ariloytynoja/prank-msa#16. For now, I'm going to put this issue on ice, in case they fix things or we find another way around the problem. I will, however, take it off the MB release milestone. PS Thanks again for all your work on this, @krdav!
DNAML assumes that any insertion in the tip alignment originated at the root. Since the naive sequence is technically a tip rather than the root, our lineage alignments (where we manually put the naive at the root) lead it to infer that an insertion happened shortly after the naive sequence (the root) and then disappeared again in most of the other tips (all but the tip with the actual insertion). This threw @lauranoges for a loop, and is likely to do the same to others.
I think we can fix this pretty simply: for each gap in the naive sequence, we can see which tip sequences don't share that gap (there must be some, or that column would have been filtered out by this point), and place gaps in all internal node sequences except those descending from the MRCA of the tips carrying the insertion. This would be easy enough to code up and would generally solve the problem. Thoughts, @matsen?
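A minimal Python sketch of the proposed fix, assuming a hypothetical representation in which the tree is a `parent` map (node name to parent name, `None` at the root) and every node, tip or internal, has an aligned sequence string. None of these names come from the project's code. For simplicity, each maximal gap run in the naive sequence is treated as a single insertion, and a tip counts as carrying the insertion if it has any residue in that run.

```python
from typing import Dict, List, Optional, Set, Tuple


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of maximal '-' runs."""
    runs, start = [], None
    for i, c in enumerate(seq):
        if c == "-" and start is None:
            start = i
        elif c != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Path from node up to the root, including node itself."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def mrca(nodes: List[str], parent: Dict[str, Optional[str]]) -> str:
    """Most recent common ancestor: the lowest node shared by all root paths."""
    paths = [ancestors(n, parent) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in common)


def fix_indels(seqs: Dict[str, str], parent: Dict[str, Optional[str]],
               naive: str, tips: Set[str]) -> Dict[str, str]:
    """Gap out inferred insertions in internal nodes outside the insertion's subtree."""
    fixed = dict(seqs)
    internal = [n for n in seqs if n not in tips]
    for start, end in gap_runs(seqs[naive]):
        # Tips with residues where the naive has a gap are taken to carry the insertion.
        carriers = [t for t in tips if t != naive
                    and any(c != "-" for c in seqs[t][start:end])]
        if not carriers:
            continue  # all-gap columns; normally filtered out before this step
        top = mrca(carriers, parent)
        for node in internal:
            # Internal nodes at or below the MRCA keep the insertion; all others get gaps.
            if top not in ancestors(node, parent):
                s = fixed[node]
                fixed[node] = s[:start] + "-" * (end - start) + s[end:]
    return fixed


# Toy example: the naive sequence hangs off the root as a tip.
parent = {"root": None, "naive": "root", "n1": "root", "A": "n1",
          "n2": "n1", "B": "n2", "C": "n2"}
seqs = {"naive": "AC--GT", "A": "AC--GT", "B": "ACTTGT", "C": "ACTTGT",
        "root": "ACTTGT", "n1": "ACTTGT", "n2": "ACTTGT"}
fixed = fix_indels(seqs, parent, "naive", {"naive", "A", "B", "C"})
for name in ("root", "n1", "n2"):
    print(name, fixed[name])
```

In the toy example the insertion `TT` is shared by tips `B` and `C`, so only their MRCA `n2` keeps it, while `root` and `n1` get gaps to match the naive sequence. A naive gap run that spans several distinct insertions would need extra bookkeeping to split.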