More large sequence collection problems #13

pwlabamherst · 2016-07-06T22:22:25Z

We're trying to use Prank to align large numbers (thousands) of sequences. Our current target is a dataset of about 2000 sequences. Using the 2015 Mac version and the -uselogs switch, we have successfully completed a 5-iteration alignment of about 1000 sequences, and managed 2-4 iterations with somewhat larger subsets. Those runs ended mid-iteration with "hirschberg initialization: impossible bwd state (-1)" errors, followed by a stack dump.

The question is whether we can create alignments of subsets of the sequences (perhaps using parallel cores in a cluster) and then use merge to reach alignments of larger numbers of sequences.
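For illustration, here is a minimal sketch of how a FASTA collection might be split into near-equal subsets before aligning each one separately. It assumes plain FASTA text with ">" header lines; the function names (read_fasta, split_records, to_fasta) are hypothetical helpers and not part of Prank.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records, n_parts):
    """Split records into n_parts contiguous groups of near-equal size."""
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` groups each take one additional record.
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def to_fasta(records):
    """Serialize (header, sequence) pairs back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

A contiguous split like this ignores how similar the sequences are to each other; whether grouping related sequences together would suit the merge step better is an open question here.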

As a first attempt, we generated a tree for each of the two halves of the 2000-sequence collection with prank -treeonly -d=filename.fas -o=filename.dnd, and then attempted to merge them with prank -d1=filename1.fas -d2=filename2.fas -t1=filename1.dnd -t2=filename2.dnd. That resulted in the following error message:

Correcting (arbitrarily) for multifurcating nodes.
Correcting (arbitrarily) for multifurcating nodes.
Correcting (arbitrarily) for multifurcating nodes.
Names in sequence file  and guide tree  do not match!

Thinking that the problem might have arisen from duplicate sequences, we pruned the sequence files using -prunedata, but attempting to merge the pruned files produced the same error.
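One way to narrow down a mismatch like this is to compare the names in each sequence file with the leaf labels in its tree. The following is a rough sketch, not Prank's own matching logic: it assumes the sequence name is the full header text after ">" and that the tree is Newick with unquoted leaf labels. Both assumptions may differ from what Prank actually does, and the function names are hypothetical.

```python
import re


def fasta_names(fasta_text):
    """Return the header text (after '>') of every FASTA record."""
    return [line[1:].strip()
            for line in fasta_text.splitlines()
            if line.startswith(">")]


def newick_leaf_names(newick_text):
    """Return unquoted leaf labels from a Newick string.

    A leaf label is taken to be the token that directly follows '(' or ','.
    Labels that follow ')' (internal nodes) are not matched.
    Quoted labels are NOT handled (assumption: names are unquoted).
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick_text)


def compare_names(fasta_text, newick_text):
    """Return (names only in the FASTA, names only in the tree), sorted."""
    seq = set(fasta_names(fasta_text))
    tree = set(newick_leaf_names(newick_text))
    return sorted(seq - tree), sorted(tree - seq)
```

If both returned lists are empty, the two name sets agree under these assumptions, and the cause of the error presumably lies elsewhere. Names containing spaces or punctuation such as ":" or "(" would be worth checking first, since those characters have a special meaning in Newick.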

Questions:

How can the names in the sequence file and guide tree be made to match?
By merging smaller pieces, is it possible to align large sequence collections that cannot be aligned directly?
Is the strategy of merging smaller pieces of the alignment collection likely to be faster than trying to align the entire collection at once (although the latter approach does not seem to be possible)?
Is it possible to merge more than two subalignments in a single command, and if so, is that likely to be faster than aligning pairwise?
