capturing deletions #1

Open
zeeev opened this issue Nov 21, 2016 · 6 comments


@zeeev

zeeev commented Nov 21, 2016

Greetings,

I've noticed that alignments break over small deletions. Is there a way to control the size of deletion an alignment can contain?

Thank you,

Zev

@ocxtal
Owner

ocxtal commented Nov 22, 2016

Hi, Zev

In short, there is no way to control the acceptable insertion/deletion size. The X-drop threshold could be used for this purpose by setting it to a very small value, but that does not seem to be a good choice because it also splits alignments at low-identity regions.
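To illustrate why a small X-drop threshold also splits alignments at low-identity regions, here is a minimal sketch of ungapped X-drop extension. It is not minialign's actual routine; the function name `xdrop_extend` and the scoring values are assumptions made for illustration only.

```python
def xdrop_extend(query, ref, x_drop, match=1, mismatch=-1):
    """Ungapped extension from position 0 (hypothetical helper, not minialign code).

    Extension stops as soon as the running score falls more than
    x_drop below the best score seen so far.
    Returns (length of the best-scoring prefix, best score).
    """
    score = 0
    best = 0
    best_len = 0
    for i, (q, r) in enumerate(zip(query, ref), start=1):
        score += match if q == r else mismatch
        if score > best:
            best, best_len = score, i
        elif best - score > x_drop:
            break  # X-drop condition met: terminate the extension
    return best_len, best


# Two identical 20-base flanks separated by a 4-base low-identity stretch.
ref = "A" * 20 + "C" * 4 + "A" * 20
query = "A" * 20 + "G" * 4 + "A" * 20

small = xdrop_extend(query, ref, x_drop=2)   # stops inside the mismatch run
large = xdrop_extend(query, ref, x_drop=10)  # extends through it
```

With the small threshold the extension ends at the first flank, while the larger threshold carries it through the mismatches, which is the trade-off described above.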

Actually, the detailed answer depends on the length of the deletion.

  1. If it is shorter than 25 bases, the behavior might come from a bug in the alignment routine. I would appreciate it if you could show the actual sequence pair that reproduces the case (and the options provided to the program).

  2. Unfortunately, if it is longer than 25 bases, the behavior is a limitation of the alignment routine. The program uses a fixed-width, 32-cell banded alignment with an adaptive steering technique. Experiments confirm that this algorithm drops indels longer than 25 bases while capturing shorter ones perfectly. (The BW = 32 line, the fourth line from the left in Figure 2(d), shows the trend: https://github.com/ocxtal/adaptivebandbench ) Since the adaptive band algorithm was chosen for its good performance and efficiency, I'm sorry, but the limitation will not be alleviated in the future...😢
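As a rough intuition for why a band of bounded width limits the indel size, the sketch below uses a static band around the main diagonal. This is a deliberate simplification and an assumption on my part: minialign's band is adaptive (it is steered during extension), so the code does not reproduce its behavior or the 25-base figure. The function name `banded_edit_distance` and its parameters are hypothetical.

```python
def banded_edit_distance(a, b, half_width):
    """Edit distance restricted to cells with |i - j| <= half_width.

    Static-band simplification for illustration only (not the adaptive
    band used by minialign). Returns None when the end cell cannot be
    reached inside the band.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(n + 1):
        lo = max(0, i - half_width)
        hi = min(m, i + half_width)
        for j in range(lo, hi + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:  # match or mismatch
                best = dp[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            if i > 0:            # base of a missing from b
                best = min(best, dp[i - 1][j] + 1)
            if j > 0:            # extra base in b
                best = min(best, dp[i][j - 1] + 1)
            dp[i][j] = best
    return None if dp[n][m] == inf else dp[n][m]


seq = "ACGT" * 10
with_deletion = seq[:10] + seq[15:]  # 5 bases removed

wide = banded_edit_distance(seq, with_deletion, half_width=8)    # gap fits in the band
narrow = banded_edit_distance(seq, with_deletion, half_width=3)  # gap exceeds the band
```

In the static case a gap longer than the half-width pushes the path out of the band, so no alignment spanning the deletion is found.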

Thanks,

Hajime

@ocxtal
Owner

ocxtal commented Nov 24, 2016

Hi, Zev

Minialign is now updated to version 0.3.2. This release fixes some bugs in the chaining routine that made the chained path collapse when it reached the head of the query sequence. The chaining parameters, the side lengths of the parallelogram window, can now be modified with the '-L' and '-H' flags (and the defaults have been enlarged to 5000 so that chains are not split around low-identity regions). I would be glad if you could test this new version.

Thank you.

Hajime

@zeeev
Author

zeeev commented Nov 24, 2016

@ocxtal Sorry I didn't reply sooner. Thank you for the updates. I will re-run the alignments after the Thanksgiving holiday. What parameters would you suggest for -L and -H to maximize INDEL/SV detection?

@ocxtal
Owner

ocxtal commented Nov 27, 2016

Hi, Zev

Recommending -L and -H settings is difficult (since I'm not familiar with indel/SV calling...), hmm...

  • I confirmed by experiment that the default parameters -L5000 and -H5000 are large enough to span low-identity regions in real PacBio and ONT data.
  • Smaller values affect the contiguity of the resulting alignments, but larger values do not, nor do they affect other alignment qualities, apart from the calculation time.
  • Large indels (>= 25 bases) cannot be captured in any resulting alignment due to the limitations of the alignment routine (as described in my first reply).

Currently I believe that large indel detection should be handled in a postprocess of the local alignment, which could also serve as a preprocess of the SV detection program. However, if you say that large indels must be captured in the local alignment stage, I'll consider adding an indel detection algorithm (alignment linking and gap filling) as a postprocess of the calculation of the alignment set.
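One way such a postprocess could look is sketched below: two neighboring local alignments of the same read are linked, and the indel size is inferred from the difference between the reference gap and the query gap. This is only a sketch under stated assumptions, not a planned or existing minialign feature. The `Aln` record, the `infer_indel` function, the half-open coordinates, the same-strand requirement, and the `max_gap` cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Aln:
    """Hypothetical local-alignment record with half-open coordinates."""
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int


def infer_indel(left, right, max_gap=10000):
    """Link two same-strand alignments and infer the indel between them.

    Returns ("deletion", size), ("insertion", size), or None when the
    pair overlaps, is too far apart, or leaves no net length difference.
    """
    ref_gap = right.ref_start - left.ref_end
    query_gap = right.query_start - left.query_end
    if ref_gap < 0 or query_gap < 0:
        return None  # overlapping alignments are not handled in this sketch
    if max(ref_gap, query_gap) > max_gap:
        return None  # too distant to link
    diff = ref_gap - query_gap
    if diff > 0:
        return ("deletion", diff)    # reference bases missing from the read
    if diff < 0:
        return ("insertion", -diff)  # extra read bases absent from the reference
    return None


left = Aln(ref_start=100, ref_end=600, query_start=0, query_end=500)
after_deletion = Aln(ref_start=650, ref_end=1000, query_start=500, query_end=850)
after_insertion = Aln(ref_start=600, ref_end=900, query_start=540, query_end=840)

deletion_call = infer_indel(left, after_deletion)
insertion_call = infer_indel(left, after_insertion)
```

A real implementation would also need the gap-filling step (realigning the unaligned bases between the two alignments), which is omitted here.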

Regards,

Hajime Suzuki

@ocxtal
Owner

ocxtal commented Nov 28, 2016

Hi, Zev

I have just figured out the problem: the extension alignment terminated just before the indels, and the matching regions that followed were not reported...! (I am sorry it took me so long to understand...😢) I have confirmed the phenomenon on my simulated data, and I'll add a downstream-rescuing algorithm in the next release.

Thanks,

Hajime

@ocxtal
Owner

ocxtal commented Dec 6, 2016

Hi, Zev,

I'm sorry for my delayed reply. I've just pushed a minor update, 0.4.2, with a downstream alignment rescuing algorithm. Although the algorithm still fails to collect alignments after short indels, it performs much better than the previous release, 0.4.1. Please try it out.

Here are pileups of my test data.

[Image: pileup, minialign-0.4.1 (default params)]

[Image: pileup, minialign-0.4.2 (default params)]

[Image: pileup, bwa-mem (default params), as a reference]

Thanks,

Hajime Suzuki
