
More large sequence collection problems #13

Open
pwlabamherst opened this issue Jul 6, 2016 · 0 comments

Comments

@pwlabamherst

We're trying to use Prank to align large numbers (thousands) of sequences. Our current target is a dataset of about 2000 sequences. Using the 2015 Mac version with the -uselogs switch, we have successfully completed a 5-iteration alignment of about 1000 sequences, and managed 2-4 iterations with somewhat larger subsets. Those larger runs failed mid-iteration with "hirschberg initialization: impossible bwd state (-1)" errors, followed by a stack dump.

The question is whether we can create alignments of subsets of the sequences (perhaps in parallel on the cores of a cluster) and then use Prank's merge option to combine them into alignments of larger numbers of sequences.
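A minimal Python sketch of the splitting step, assuming plain FASTA input: it reads the records and cuts them into k contiguous chunks whose sizes differ by at most one (for example, two halves of 1000 sequences each), which could then be aligned separately. The function names are hypothetical helpers, not part of Prank. A contiguous split ignores how the sequences are related; grouping related sequences into the same chunk would likely matter for the final merge, but that is not attempted here.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header text without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (name, sequence) records, joining wrapped lines."""
    records: List[Record] = []
    name, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq)))
            name, seq = line[1:].strip(), []
        else:
            seq.append(line)
    if name is not None:
        records.append((name, "".join(seq)))
    return records


def split_records(records: List[Record], k: int) -> List[List[Record]]:
    """Split records into k contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(records), k)
    chunks, start = [], 0
    for i in range(k):
        size = q + (1 if i < r else 0)  # the first r chunks get one extra record
        chunks.append(records[start:start + size])
        start += size
    return chunks


def to_fasta(records: List[Record]) -> str:
    """Write records back out as unwrapped FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Each chunk can be written with to_fasta to its own file (e.g. filename1.fas and filename2.fas) before building its guide tree.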

As a first attempt, we generated a guide tree for each of the two halves of the 2000-sequence collection with prank -treeonly -d=filename.fas -o=filename.dnd, and then attempted to merge them with prank -d1=filename1.fas -d2=filename2.fas -t1=filename1.dnd -t2=filename2.dnd. That produced the following error message:

Correcting (arbitrarily) for multifurcating nodes.
Correcting (arbitrarily) for multifurcating nodes.
Correcting (arbitrarily) for multifurcating nodes.
Names in sequence file  and guide tree  do not match!
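One way to narrow down the mismatch is to list the names that appear in one file but not the other. The sketch below is a hypothetical helper, not part of Prank, and rests on two assumptions: a sequence's name is the first whitespace-delimited word of its FASTA header, and the Newick tree uses unquoted leaf labels without spaces or the characters (),:;. Whether Prank itself matches names this way is not known from this issue.

```python
import re
from collections import Counter
from typing import Dict, List


def fasta_names(text: str) -> List[str]:
    """Names from FASTA headers (assumed: first word after '>')."""
    names = []
    for line in text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.append(words[0])
    return names


def newick_leaf_names(newick: str) -> List[str]:
    """Leaf labels: tokens right after '(' or ','. Internal labels follow ')'."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def compare_names(fasta_text: str, newick_text: str) -> Dict[str, List[str]]:
    """Report names missing from either file, plus names used more than once."""
    seq = fasta_names(fasta_text)
    tree = newick_leaf_names(newick_text)
    return {
        "only_in_sequences": sorted(set(seq) - set(tree)),
        "only_in_tree": sorted(set(tree) - set(seq)),
        "duplicate_sequence_names": sorted(n for n, c in Counter(seq).items() if c > 1),
        "duplicate_tree_names": sorted(n for n, c in Counter(tree).items() if c > 1),
    }
```

Running compare_names on each .fas/.dnd pair before merging would show whether the mismatch comes from missing, renamed, or repeated names.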

Suspecting that the problem might come from duplicate sequences, we pruned the sequence files using -prunedata, but merging the pruned files produced the same error.
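To check independently whether duplicate sequences are present, a small sketch can group records whose sequences are identical. Treating sequences as duplicates when they match after uppercasing and removing '-' gap characters is an assumption, and the function names are hypothetical; this is not how -prunedata works, which this issue does not describe.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (name, sequence)


def _key(seq: str) -> str:
    # Assumed notion of "identical": same residues ignoring case and '-' gaps.
    return seq.upper().replace("-", "")


def duplicate_sequence_groups(records: List[Record]) -> List[List[str]]:
    """Return groups of names whose sequences are identical (groups of size > 1)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, seq in records:
        groups[_key(seq)].append(name)
    return [names for names in groups.values() if len(names) > 1]


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Keep only the first record seen for each distinct sequence."""
    seen = set()
    kept = []
    for name, seq in records:
        k = _key(seq)
        if k not in seen:
            seen.add(k)
            kept.append((name, seq))
    return kept
```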

Questions:

1. How can the names in the sequence file and guide tree be made to match?
2. Is it possible to align large sequence collections that cannot be aligned directly by merging smaller pieces?
3. Is merging smaller pieces of the sequence collection likely to be faster than aligning the entire collection at once (although the latter does not seem to be possible)?
4. Is it possible to merge more than two subalignments in a single command, and if so, is that likely to be faster than merging them pairwise?
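For comparison with a multi-way merge, a pairwise strategy can be planned as rounds of a balanced binary tree. The sketch below assumes that only two subalignments can be merged at a time. It pairs neighbours in each round and carries an odd one over, so N pieces need N - 1 merges spread over about log2(N) rounds. Merges within a round are independent and could run on separate cluster cores. The merged labels such as "merge1_0" are hypothetical placeholders, and pairing neighbours rather than related subsets is a simplification.

```python
from typing import List, Tuple

Merge = Tuple[str, str, str]  # (left input, right input, merged output label)


def pairwise_merge_rounds(parts: List[str]) -> List[List[Merge]]:
    """Plan two-at-a-time merges as rounds of a balanced binary tree.

    Merges in the same round do not depend on each other. When a round has an
    odd number of inputs, the last one is carried over unchanged.
    """
    rounds: List[List[Merge]] = []
    current = list(parts)
    step = 0
    while len(current) > 1:
        step += 1
        merges: List[Merge] = []
        nxt: List[str] = []
        for i in range(0, len(current) - 1, 2):
            merged = f"merge{step}_{i // 2}"  # hypothetical output label
            merges.append((current[i], current[i + 1], merged))
            nxt.append(merged)
        if len(current) % 2 == 1:
            nxt.append(current[-1])  # odd piece waits for the next round
        rounds.append(merges)
        current = nxt
    return rounds
```

With two halves this reduces to the single merge attempted above; with 16 pieces it plans 15 merges in 4 rounds.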