Current Behaviour
The JSON produced by augur translate contains the following data (which are AA translations):
1. <JSON>.reference.<gene>
2. <JSON>.node.<rootNodeName>.aa_sequences.<gene> (not currently used for VCF inputs, but I have a WIP commit which adds this)
When using a JSON input to --ancestral-sequences, (1) is simply the AA sequence from the root node - i.e. (1) == (2). It is not the "reference" AA sequence, because the command has no knowledge of what the reference sequence is¹.
When using VCF inputs, a corresponding nucleotide FASTA reference input is required, and (1) is a translation of the gene's region from this sequence.
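To check which situation a given output file is in, something like the following could compare (1) against (2) per gene. This is only a sketch: the function name, the example node names and the example sequences are made up for illustration, and the top-level key holding per-node data is passed as a parameter (defaulting to "nodes") because I'm assuming rather than asserting what it is called in a given augur version.

```python
def reference_matches_root(translate_json, root_name, nodes_key="nodes"):
    """Return {gene: bool}: does the "reference" translation equal the root node's AA sequence?

    `nodes_key` is an assumption about where per-node data lives in the JSON.
    """
    reference = translate_json.get("reference", {})
    root_seqs = (
        translate_json.get(nodes_key, {})
        .get(root_name, {})
        .get("aa_sequences", {})
    )
    # A gene missing from the root's aa_sequences compares as "not equal".
    return {gene: seq == root_seqs.get(gene) for gene, seq in reference.items()}


# Hypothetical, hand-written data shaped like the structure described above.
example = {
    "reference": {"gene1": "MKT*"},
    "nodes": {
        "NODE_0000000": {"aa_sequences": {"gene1": "MKT*"}},
        "sampleA": {"aa_muts": {"gene1": ["K2R"]}},
    },
}

matches = reference_matches_root(example, "NODE_0000000")
```

With JSON inputs I'd expect every gene to come back True, since (1) is copied from the root; with VCF inputs a False would indicate a mutation between the translated reference and the root.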
There are two salient points:
The "reference" key is a misnomer when JSON inputs are used. (For VCF inputs this is fine.) If there are mutations in ~all nodes relative to the reference then (1) will be different depending on the choice of JSON vs VCF inputs.
Mutations are not inferred on the root node when JSON inputs are used, because there's nothing to compare the root node to². (For VCF files mutations are inferred against the translated reference.)
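The second point can be illustrated with a toy comparison. This is not augur's implementation, just a minimal sketch assuming equal-length AA sequences and mutation strings of the form <from><1-based position><to>; the sequences are invented.

```python
def aa_mutations(parent_aa, child_aa):
    """List AA differences between two equal-length sequences, e.g. ["K2R"]."""
    if len(parent_aa) != len(child_aa):
        raise ValueError("sequences must have the same length")
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(parent_aa, child_aa), start=1)
        if a != b
    ]


root_aa = "MRT*"                 # hypothetical root-node translation
translated_reference = "MKT*"    # hypothetical translation of the nucleotide reference

# VCF-style: the root is compared against the translated reference.
vcf_style_root_muts = aa_mutations(translated_reference, root_aa)

# JSON-style: "reference" is the root sequence itself, so nothing can differ.
json_style_root_muts = aa_mutations(root_aa, root_aa)
```

In this toy case the VCF-style comparison yields one root mutation and the JSON-style comparison yields none, which is the discrepancy described above.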
Expected behavior
VCF input or FASTA/JSON input to ancestral/translate should not result in changes in inference or mutation annotations on the tree. Ideally the outputs would also be the same, but for file size reasons this is not always possible.
How to reproduce
TODO: create a test to demonstrate this. Unfortunately the "simple-genome" tests I've recently added in PRs don't have an AA mutation shared across all sequences, which is what's needed here.
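When building such a test dataset, a small helper could confirm that it actually contains the required feature. The sketch below is hypothetical (it is not part of augur or its test suite) and assumes all translations are aligned and the same length; it reports 1-based positions where every sample carries the same residue and that residue differs from the reference.

```python
def shared_mutation_positions(reference_aa, sample_aas):
    """Return 1-based positions where all samples share one residue that differs from the reference."""
    positions = []
    for pos, ref_residue in enumerate(reference_aa, start=1):
        residues = {sample[pos - 1] for sample in sample_aas}
        # Exactly one residue across samples, and it is not the reference residue.
        if len(residues) == 1 and ref_residue not in residues:
            positions.append(pos)
    return positions


# Invented sequences: position 2 is K in the reference and R in every sample.
shared = shared_mutation_positions("MKT*", ["MRT*", "MRS*"])
```

An empty result would mean the dataset cannot demonstrate the problem, which is the situation with the current "simple-genome" data.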
Possible solution
Allow JSON inputs to have a corresponding (nuc) reference sequence, and use this for the "reference" translations. (This is what VCF inputs do.) In this case we can also infer mutations on the root node. We could add an extra argument (mirroring VCF input) or use the reference.nuc key in the input JSON¹.
Remove the "reference" key when using JSON inputs without a provided reference sequence. This will be problematic for augur export v2 as it uses this to export root-sequences. (The names here get very confusing very fast.)
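A rough sketch of what the first option involves: translate the gene's region from a supplied nucleotide reference, then compare the result with the root node's AA sequence. Everything here is illustrative and assumed rather than taken from augur: the function name, 1-based inclusive coordinates, forward-strand-only genes, the standard genetic code, and the toy sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_region(nuc_reference, start, end):
    """Translate a forward-strand region given 1-based inclusive coordinates.

    Codons with ambiguous or unknown bases become "X"; trailing partial codons are dropped.
    """
    region = nuc_reference[start - 1:end].upper()
    usable = len(region) - len(region) % 3
    return "".join(
        CODON_TABLE.get(region[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Toy data: a 2-base leader followed by a 4-codon gene at positions 3..14.
reference_nuc = "CCATGAAAACTTAA"
reference_aa = translate_region(reference_nuc, 3, 14)

# Hypothetical root-node translation taken from the input JSON.
root_aa = "MRT*"
root_aa_muts = [
    f"{a}{pos}{b}"
    for pos, (a, b) in enumerate(zip(reference_aa, root_aa), start=1)
    if a != b
]
```

With a reference translation available, the "reference" key would mean the same thing for JSON and VCF inputs, and root mutations such as the one in this toy example could be annotated.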
Your environment: if running Nextstrain locally
augur 23.1.1
Footnotes
¹ Ok this isn't quite true. The JSON (produced by augur ancestral) will have a json.reference.nuc sequence, but augur translate never reads it. Depending on how augur ancestral was run, this may be a reference sequence or the inferred sequence at the tree root.
² I think certain invocations of augur ancestral will produce nuc mutations on the root node. I don't know what augur translate will do in this case.