Does, or can, piscem allow for mismatches between query and index? #17
Comments
Hi @jeremymsimon, Piscem does pseudoalignment (optionally with structural constraints). This means that there must be at least 1 matching k-mer in order to report a match. Beyond that, there is no restriction on the query as a whole, so there can be mismatches, gaps, etc. However, that only applies if your query is of length > k. --Rob
So - excuse the naive question - does that mean in practice that if my query is 25bp and my index is built with And relatedly, how does piscem handle a case if a given query aligns perfectly to one location but with 1 mismatch to a secondary location?
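As a toy illustration of the "at least 1 matching k-mer" rule described above, here is a minimal Python sketch. It is not piscem's implementation (piscem queries a compacted index rather than comparing raw k-mer sets), and the function names `kmers` and `has_kmer_hit` as well as the sequences are made up for the example.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_kmer_hit(query, reference, k):
    """True if query and reference share at least one exact k-mer.

    Hypothetical helper: a stand-in for "a match can be reported",
    ignoring structural constraints and any index details.
    """
    return bool(kmers(query, k) & kmers(reference, k))


reference = "ACGTTGCAAGCTTAGC"       # made-up reference
query_long = "GTAGCAAGCTTA"          # reference[2:14] with one mismatch
query_len_k = "GTAGC"                # exactly k long, contains the mismatch

print(has_kmer_hit(query_long, reference, 5))
print(has_kmer_hit(query_len_k, reference, 5))
```

Under these assumptions, the longer query still shares exact 5-mers with the reference despite its mismatch, while the query of length exactly k does not, which is why the tolerance only applies to queries longer than k.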
Hi @jeremymsimon, No need for apologies!
It depends somewhat on the details of what happens with the non-matching k-mers and where the mismatch occurs (see my answer to your other question below). But in such a case, any of the present 23-mers would be sufficient to map the read to the reference.
Currently, only co-optimal mappings are reported. That is, if there is some set of targets P that account for the maximum number (say m) of matched k-mers, then any target Q having fewer than m matches will not be reported. Note that in this case, since we're just talking about k-mer matches, even a single mismatch can cause many k-mers not to match (e.g. a mismatch in the middle of a query can knock out up to k k-mers). For example, if you have a query of length 50 and a mismatch right in the "middle" (not exactly the middle since 50 is even, but you get the idea), then only one of the 26 possible 25-mers can still match exactly; with a query of length 49, none could. It's worth mentioning that the behavior of what to do with sub-optimal mappings is something that could be adjusted / modified. That is, if there's a need to report such things it would likely be possible to add such functionality to
Thanks for that description, and I think this is the issue I'm facing with an already-short
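The k-mer arithmetic and the co-optimal rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (every k-mer of the query is looked up, and there is exactly one mismatch); `intact_kmers` and `co_optimal` are hypothetical helpers, not part of piscem.

```python
def intact_kmers(query_len, k, mismatch_pos):
    """Count the k-mers of a query that do not overlap a single
    mismatch at 0-based position mismatch_pos."""
    return sum(
        1
        for start in range(query_len - k + 1)
        if not (start <= mismatch_pos < start + k)
    )


def co_optimal(hit_counts):
    """Keep only the targets whose matched k-mer count equals the
    maximum m; targets with fewer than m matches are dropped."""
    if not hit_counts:
        return []
    m = max(hit_counts.values())
    return sorted(t for t, c in hit_counts.items() if c == m)


# One mismatch near the middle of a 50 bp query, k = 25.
print(intact_kmers(50, 25, 24))
# The same mismatch in a 49 bp query.
print(intact_kmers(49, 25, 24))
# A perfect location (all 26 k-mers) vs. a location with one mismatch.
print(co_optimal({"perfect_locus": 26, "one_mismatch_locus": 1}))
```

With these assumptions, a mismatch at position 24 of a 50 bp query leaves a single intact 25-mer, and a 49 bp query leaves none. In the last call only `perfect_locus` survives the filter, which mirrors why a secondary location carrying a mismatch would not be reported alongside a perfect one.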
Hey @rob-p - Pretty much as the title suggests.
I'm doing some testing for a non-canonical application of `piscem` and am noticing that I only seemingly get `tpm` and `ecount` > 0 from `piscem` -> `piscem-infer` when there is an exact match between query and index sequence.
Is there a parallel to `pufferfish`'s `--minScoreFraction` here? Or is there some other parameter tuning that I'm missing such that mismatches are allowable?
Thanks!