Does, or can, `piscem` allow for mismatches between query and index? #17

jeremymsimon · 2024-01-05T21:04:00Z

Hey @rob-p - Pretty much as the title suggests.

I'm doing some testing for a non-canonical application of piscem and am noticing that I only seemingly get tpm and ecount > 0 from piscem -> piscem-infer when there is an exact match between query and index sequence.

Is there a parallel to pufferfish's --minScoreFraction here? Or is there some other parameter tuning that I'm missing such that mismatches are allowable?

Thanks!


rob-p · 2024-01-06T03:50:02Z

Hi @jeremymsimon,

Piscem does pseudoalignment (optionally with structural constraints). This means that there must be at least 1 matching k-mer in order to report a match. Beyond that, there is no restriction on the query as a whole, so there can be mismatches, gaps, etc. Note that this only applies if your query is of length > k.
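
To make the "at least one matching k-mer" condition concrete, here is a minimal toy sketch. It is not piscem's implementation: the function names and sequences are made up for illustration, and it ignores reverse complements, canonical k-mers, and structural constraints.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers of seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_kmer_hit(query: str, reference: str, k: int) -> bool:
    """True if the query shares at least one exact k-mer with the reference."""
    return not kmers(query, k).isdisjoint(kmers(reference, k))


# Hypothetical toy sequences, chosen only for illustration.
reference = "ACGTTAGCCGATAGG"
query_with_mismatch = "TAGCCGTTA"  # differs from reference[4:13] at one base

print(has_kmer_hit(query_with_mismatch, reference, k=5))  # True: "TAGCC" still matches
print(has_kmer_hit("TAGC", reference, k=5))               # False: query shorter than k
```

The mismatched query still hits because one of its 5-mers lies entirely to one side of the mismatch. A query shorter than k has no k-mers at all, so it can never hit.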

—Rob

jeremymsimon · 2024-01-08T14:30:34Z

So - excuse the naive question - does that mean in practice that if my query is 25bp and my index is built with k=23, I could in effect have 0, 1, or 2 mismatches?

And relatedly, how does piscem handle a case if a given query aligns perfectly to one location but with 1 mismatch to a secondary location?

rob-p · 2024-01-09T16:58:14Z

Hi @jeremymsimon,

No need for apologies!

> does that mean in practice that if my query is 25bp and my index is built with k=23, I could in effect have 0, 1, or 2 mismatches?

It depends somewhat on the details of what happens with the non-matching k-mers and where the mismatch occurs (see my answer to your other question below). But in such a case, any of the present 23-mers would be sufficient to map the read to the reference.
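
As a rough illustration of how much the mismatch position matters, the sketch below counts how many of the query's k-mer windows avoid a single mismatched base. It assumes a single substitution and only asks whether any exact k-mer of the query survives; the function name is hypothetical and this is not piscem code.

```python
def surviving_kmers(query_len: int, k: int, mismatch_pos: int) -> int:
    """Count the k-mer windows of the query that do not cover mismatch_pos."""
    return sum(
        1
        for start in range(query_len - k + 1)
        if not (start <= mismatch_pos < start + k)
    )


# 25 bp query, k = 23: which single-mismatch positions leave at least one intact 23-mer?
tolerated = [p for p in range(25) if surviving_kmers(25, 23, p) > 0]
print(tolerated)  # [0, 1, 23, 24]
```

So for a 25 bp query and k = 23, a single mismatch leaves an intact 23-mer only when it falls within the first two or last two bases; anywhere else, every 23-mer covers it.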

> And relatedly, how does piscem handle a case if a given query aligns perfectly to one location but with 1 mismatch to a secondary location?

Currently, only co-optimal mappings are reported. That is, if there is some set of targets P that account for the maximum number (say m) of matched k-mers, then any target Q having fewer than m matches will not be reported. Note that in this case, since we're just talking about k-mer matches, even a single mismatch could cause many k-mers not to match (e.g. a mismatch in the middle of the query can knock out up to k overlapping k-mers). For example, if you have a query of length 50 and a mismatch right in the "middle" (not exactly the middle since 50 is even, but you get the idea), then no k-mer with k > 25 can match exactly, and with k = 25 only a single 25-mer on one side of the mismatch survives.
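
The co-optimal rule can be sketched as a simple filter over per-target k-mer hit counts. This is an illustrative toy, not piscem's actual code; the function name and target names are hypothetical.

```python
def co_optimal(hit_counts: dict[str, int]) -> list[str]:
    """Keep only the targets whose matched k-mer count equals the maximum."""
    if not hit_counts:
        return []
    best = max(hit_counts.values())
    if best == 0:
        return []  # no k-mer matched anywhere, so nothing is reported
    return sorted(t for t, count in hit_counts.items() if count == best)


# Hypothetical 25 bp query with k = 23: a perfect match yields 3 matched 23-mers,
# while a location with a mismatch at position 1 yields only 1.
print(co_optimal({"locus_A": 3, "locus_B": 1}))            # ['locus_A']
print(co_optimal({"tx1": 3, "tx2": 1, "tx3": 3}))          # ['tx1', 'tx3']
```

Under this rule the secondary location with one mismatch is dropped, because it accounts for fewer matched k-mers than the perfect match.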

It's worth mentioning that the behavior of what to do with sub-optimal mappings is something that could be adjusted / modified. That is, if there's a need to report such things it would likely be possible to add such functionality to piscem. However, as a k-mer based method, the constraint that at least a single k-mer must match the query is pretty fundamental.

jeremymsimon · 2024-01-09T17:37:31Z

Thanks for that description. I think this is the issue I'm facing with an already-short k in my case (k < 20) given my short queries. If I want to be tolerant of a potential mismatch in the middle of the query, then it seems I would need an exceedingly small k (k=9 or shorter), and for up to 3 mismatches that could occur anywhere, it may simply not be feasible. Am I understanding that all correctly? If so, this may just represent a fundamental limitation of a k-mer approach for the type of non-canonical application I'm looking at currently.
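
For reference, the trade-off can be stated as a pigeonhole bound: m mismatches split a query of length L into m + 1 clean stretches, so at least one exact k-mer is guaranteed to survive only when k <= floor(L / (m + 1)). The sketch below checks that bound by brute force. The query length of 20 is a hypothetical value (the actual length is not given in this thread), and the function names are made up for illustration.

```python
from itertools import combinations


def max_guaranteed_k(query_len: int, mismatches: int) -> int:
    """Largest k for which some exact k-mer survives any placement of mismatches."""
    return query_len // (mismatches + 1)


def always_has_clean_kmer(query_len: int, k: int, mismatches: int) -> bool:
    """Brute force: does every placement of mismatches leave a mismatch-free k-window?"""
    for positions in combinations(range(query_len), mismatches):
        bad = set(positions)
        windows = range(query_len - k + 1)
        if not any(bad.isdisjoint(range(s, s + k)) for s in windows):
            return False
    return True


# Hypothetical query length of 20 bp.
for m in (1, 2, 3):
    k = max_guaranteed_k(20, m)
    print(m, k, always_has_clean_kmer(20, k, m), always_has_clean_kmer(20, k + 1, m))
```

Note that this only guarantees that one k-mer matches. Whether the resulting mapping is reported still depends on the co-optimal rule described above, and very small k values make spurious k-mer hits more likely.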
