-
To facilitate reprocessing data, it would be helpful if the pairtools suite permitted filtering an existing .pairs file using the same filters available in the parse and parse2 commands, such as One way to do this might be to permit parse/parse2 to accept .pairs as input. Another way might be to split the filtering features into a separate command, and have the parse/parse2 commands bundle sam/bam -> pairs parsing with the separate filtering command. I have an alternative solution which would be to reverse-parse .pairs to a .sam format, then stream it into parse/parse2. That's pretty hackish, so please let me know if you're seriously considering implementing this feature. Thanks for making this suite of useful tools! |
Beta Was this translation helpful? Give feedback.
Replies: 4 comments
-
Can't they be implemented using pairtools select? https://pairtools.readthedocs.io/en/latest/cli_tools.html#pairtools-select |
Beta Was this translation helpful? Give feedback.
-
Well, I suppose information needed for --max-inter-align-gap is lost in the pairs format, so I guess it's not possible... |
Beta Was this translation helpful? Give feedback.
-
Hi Benjamin! Specifically, --max-insert-size and --max-molecule-size control how pairs are "rescued". "Rescue" is the merging of two inner-most alignments on the opposite sides of the read if they look like they come from a single DNA fragment. Depending of the value of these two flags, we may have either two alignments that will together form a pair, or, a single alignment that would form a pair with another adjacent alignment. These outcomes seem mutually exclusive to me, so at the moment I don't see how we could convert these options into columns and perform filtering. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the quick response! What you say makes sense. It sounds like it's best to think of parsing .sam/.bam to .pairs as more or less irreversible. |
Beta Was this translation helpful? Give feedback.
Hi Benjamin!
Thank you for the suggestion!
At the moment, I am not sure if implementing these options as a filter would be possible.
The issue is that these flags (-max-insert-size, --max-molecule-size, and --max-inter-align-gap) control how alignments are interpreted and parsed into pairs.
Specifically, --max-insert-size and --max-molecule-size control how pairs are "rescued". "Rescue" is the merging of two inner-most alignments on the opposite sides of the read if they look like they come from a single DNA fragment. Depending of the value of these two flags, we may have either two alignments that will together form a pair, or, a single alignment that would form a pair with another adjac…