Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[augur tree] input alignment must end in .fasta #769

Closed
jameshadfield opened this issue Sep 20, 2021 · 7 comments
Closed

[augur tree] input alignment must end in .fasta #769

jameshadfield opened this issue Sep 20, 2021 · 7 comments
Assignees
Labels
bug Something isn't working easy problem Requires less work than most issues

Comments

@jameshadfield
Copy link
Member

Current Behavior

For IQ-TREE, the input alignment must end in .fasta as we replace this string with others to create intermediate files (code here). If this is not the case, the input alignment ends up being overwritten with the (failing) log files of IQ-TREE.

Expected behavior

The input alignment suffix should not matter - alignments are commonly named *.mfa or *.aln.

How to reproduce
Steps to reproduce the current behavior:

  1. Create a FASTA alignment which doesn't end in .fasta, e.g. nextalign.mfa
  2. augur tree --alignment nextalign.mfa --output tree.nwk
  3. The alignment will be overwritten with an IQ-TREE log file, and the command to run IQ-TREE printed to screen will be
Building a tree via:
        iqtree -ninit 2 -n 2 -me 0.05 -nt AUTO -s nextalign.mfa -m GTR  > nextalign.mfa

I have not tested for other tree building methods or VCF inputs. However looking at the code this seems to be limited to IQ-TREE with non-VCF inputs.

Possible solution

  • Validate the input filename and exit if it doesn't end in .fasta
  • Create temporary files needed for IQ-TREE without requiring the input name to end in .fasta (preferred solution)

Your environment: if running Nextstrain locally

  • augur 13.0.0 (current master)
@jameshadfield jameshadfield added the bug Something isn't working label Sep 20, 2021
@jameshadfield jameshadfield changed the title [augur tree] input alignment suffix must end in .fasta [augur tree] input alignment must end in .fasta Sep 20, 2021
@jameshadfield jameshadfield added the easy problem Requires less work than most issues label Sep 20, 2021
@genehack genehack self-assigned this Jan 21, 2025
@genehack
Copy link
Contributor

I can no longer reproduce this bug:

> augur tree --alignment ./align.mfa --output tree.nwk                                                                               
Building a tree via:
        iqtree2 --threads-max 1 -s align-delim.fasta -m GTR --ninit 2 -n 2 --epsilon 0.05 -T AUTO --redo > align-delim.iqtree.log
        Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.
        Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300


Building original tree took 9.32535982131958 seconds

Running the command generates two log files, with identical content:

  • align-delim.fasta.log
  • align-delim.iqtree.log

I think the reason I'm not seeing the bug is because we rewrite the input alignment file into a temporary file that will always have a .fasta extension, as can be seen in the command output (-s align-delim.fasta) — which is also where the -delim bits in the log file names are coming from.

One thing I'm wondering about is whether we should add ".log" to the list of extensions to clean up, which would remove the redundant align-delim.fasta.log (but leave align-delim.iqtree.log).

Another enhancement would be to change how the log file name is built to use the original input alignment file name instead of the version renamed with the -delim.fasta extension. In this example, if we also added ".log" to the list of extensions getting cleaned up, the remaining log file would then be named align.iqtree.log (which would also be shown in the "Building a tree via:" line), which is arguably less confusing, I think?

@jameshadfield
Copy link
Member Author

The bug described in this issue was closed by 39299c7 (release 22.0.0). I'll think about the additional comments re: log file names and post another comment shortly.

@jameshadfield
Copy link
Member Author

Re: log files, I'm not sure I have a strong opinion; removing align-delim.fasta.log as part of clean-up is probably a good idea.

But looking at why -delim.* exists, I think this is entirely #1084 and would prefer we don't do this at all!

@genehack
Copy link
Contributor

I think this is entirely #1084 and would prefer we don't do this at all!

I'm not sure I understand what the antecedent of "this" in "prefer we don't do this" is — don't munge the strain names and let iqtree? Have augur tree throw an error (or a warning) if a name that will be munged is seen? Munge the strain names differently? (From reviewing #1084 it's not clear what the "fix" is there either.)

@jameshadfield
Copy link
Member Author

I'm not sure I understand what the antecedent of "this" in "prefer we don't do this" is

This: the -delim.* file(s). The reason we create them is to munge strain names. (Right? Is there something else they're for?) #1084 argues that this should happen upstream of augur tree and that augur tree should not change strain names, and error if it detects strain names which it believes IQ-TREE will change.

@jameshadfield
Copy link
Member Author

jameshadfield commented Jan 30, 2025

(I'd support closing this issue and moving discussion to #1084 )

@genehack
Copy link
Contributor

Taking the suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working easy problem Requires less work than most issues
Projects
None yet
Development

No branches or pull requests

2 participants