dada2 justConcatenate #129

ARW-UBT · 2020-02-23T09:03:30Z

Bug Description
I recently installed an Illumina iSeq-100 benchtop sequencer in my lab. Back in 2018 it was announced that a 2x250 bp sequencing cartridge would be available soon. Well, it is now 2020, and no new kit has appeared.

Since I can only produce 2x150 bp reads, which are too short to overlap across the 16S V3-V4 region, the wonderful joining (merging) option in dada2 cannot be applied. However, the justConcatenate option in standalone dada2 (R) could help here.
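For reference: in dada2's R function `mergePairs()`, setting `justConcatenate=TRUE` does not align the overlap at all; the forward read and the reverse-complemented reverse read are joined with a spacer of 10 `N`s. A minimal Python sketch of that string operation follows. The helper names are hypothetical and this only mimics the output, it is not dada2's code.

```python
# Hypothetical helpers mimicking what dada2's mergePairs(justConcatenate=TRUE)
# produces; not dada2's implementation.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (ACGTN only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def just_concatenate(fwd: str, rev: str, spacer_len: int = 10) -> str:
    """Forward read + N spacer + reverse complement of the reverse read."""
    return fwd + "N" * spacer_len + reverse_complement(rev)


print(just_concatenate("ACGTACGTAA", "GGCCTTAAGC"))
```

With 2x150 bp reads, every concatenated sequence is 150 + 10 + 150 = 310 characters long, regardless of the true amplicon length.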

Questions
Currently, the justConcatenate option is not built into the q2 plugin, and if I concatenate outside the q2 workflow, the great provenance chain in q2 will be interrupted.
Question to the plugin developers: would it be possible to add the justConcatenate option to q2-dada2? Or do you have any suggestions for how to use justConcatenate output in q2?

Comments
@benjjneb Thank you for directing me to this forum and for your comment that it might be possible with the existing dada2 plugin.

benjjneb · 2020-02-25T02:12:35Z

Q for the Q2 folks: Is there a concatenation option already implemented in one of the Q2 plugins that could be used prior to q2-dada2?

ARW-UBT · 2020-03-05T13:02:18Z

Hi Ben, did you receive any response to your post? I cannot see anything on GitHub, but maybe you were contacted through other channels? If not, do you see any chance of enabling the '--just-concatenate' option in the q2 plugin locally (e.g., in my own local installation)? Best regards, Alfons

nbokulich · 2020-03-05T15:23:30Z

Sorry @benjjneb and @ARW-UBT your questions came mid-release so I think got lost in everyone's pile.

> Is there a concatenation option already implemented in one of the Q2 plugins that could be used prior to q2-dada2?

No, not currently. Just concatenating causes issues all down the line, with phylogeny, taxonomic classification, etc. Nor have we received many (maybe 2?) requests for this feature. So I think exposing this option in q2-dada2 is probably a low priority, but I am curious what others think.

> Currently, the justConcatenate option is not built into the q2 plugin. If I concatenate outside the q2 workflow, the great provenance chain in q2 will be interrupted.

Here is a forum topic that describes a similar question, in which I've given steps for modifying your local branch of q2-dada2. This will allow you to expose or adjust options that are not available in the current release version, preserving the provenance chain.

RobJamesRamos · 2024-01-04T20:08:29Z

Just to add one more voice to the currently small chorus: we use justConcatenate in an AMF LSU pipeline that relies on our own downstream processing. It would greatly simplify our pipeline to have this flag exposed in QIIME 2: https://link.springer.com/article/10.1007/s00572-022-01068-3.

nbokulich · 2024-07-09T09:06:51Z

Hi @benjjneb,
I am warming up to the idea of exposing justConcatenate in q2-dada2, now that I have had some time to think about how we could handle concatenated ASVs in QIIME 2 to avoid issues with taxonomy classification, etc. So I would like to pick up this conversation.

One thing that continues to trouble me is that justConcatenate concatenates everything, including reads that do overlap. This could mess up phylogeny, taxonomy, etc. It should be less of an issue for amplicons with low length heterogeneity (e.g., 16S), provided that users use it responsibly. However, it would be a common issue with length-variable regions like ITS. This is one reason why I have been against this for many years: I am opposed to the "just" part in justConcatenate.
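To make the overlap concern concrete: when an amplicon is shorter than the combined read length, concatenation keeps the overlapping stretch twice, and the output length is fixed by the read length rather than by the molecule. A minimal sketch with simulated, error-free reads; the amplicon lengths, the 2x250 read length, and the helper names are all hypothetical.

```python
import random
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ACGT-only sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(amplicon: str, read_len: int) -> Tuple[str, str]:
    """Error-free forward/reverse reads taken from the two ends of an amplicon."""
    fwd = amplicon[:read_len]
    rev = reverse_complement(amplicon[-read_len:])
    return fwd, rev


def just_concatenate(fwd: str, rev: str, spacer_len: int = 10) -> str:
    """Forward read + N spacer + reverse complement of the reverse read."""
    return fwd + "N" * spacer_len + reverse_complement(rev)


random.seed(42)
read_len = 250
for amplicon_len in (300, 450, 600):  # hypothetical length-variable amplicons
    amplicon = "".join(random.choice("ACGT") for _ in range(amplicon_len))
    fwd, rev = simulate_pair(amplicon, read_len)
    concat = just_concatenate(fwd, rev)
    # Bases covered by both reads, which the concatenation therefore contains twice.
    overlap = max(0, 2 * read_len - amplicon_len)
    print(f"amplicon {amplicon_len} bp -> concatenated {len(concat)} bp, "
          f"duplicated overlap {overlap} bp")
```

Every concatenated sequence here is 250 + 10 + 250 = 510 characters: the 300 bp amplicon carries its 200 bp overlap twice, while for the 600 bp amplicon the middle 100 bp are never sequenced at all.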

For this reason I think that it would be useful to expose an option to merge and then concatenate reads that fail to merge (because of lack of overlap; still rejecting reads that have partial overlap with mismatches in the overlap region). Reviewing various issues in the dada2 issue tracker I see that you are concerned about biases that could be introduced by having a mix of merged and concatenated reads, and I acknowledge this, but in some cases this may be less of a bias than, e.g., when users use merged ASVs in a hypervariable region and hence systematically lose longer amplicons. So it all boils down to users needing to exercise some responsibility in their analysis (which is already the case).

If we feel that such an output should have restricted downstream uses, one option would be to introduce a new type for concatenated (+merged) ASVs. This would limit the downstream analyses that users could perform, though it might be overly restrictive, so we might consider it a last resort.

How would you feel about implementing a merge+concatenate option in q2-dada2? I see from benjjneb/dada2#279 that doing a merge + concatenate is simple; excluding reads with unacceptable mismatches and indels would take some more work, but maybe this is something that you have already worked on further?
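Along the lines proposed above, here is a minimal merge-then-concatenate sketch. It is not dada2's algorithm: `mergePairs()` uses an ends-free Needleman-Wunsch alignment, whereas this sketch only scans ungapped overlaps (so indels are not handled) and assumes the amplicon is at least as long as each read (no read-through). The defaults mirror `mergePairs()`'s `minOverlap=12` and `maxMismatch=0`; the function name and the `ambiguous_identity` threshold, which separates "no overlap" from "overlap with too many mismatches", are hypothetical.

```python
import random
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGTN only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_or_concatenate(
    fwd: str,
    rev: str,
    min_overlap: int = 12,
    max_mismatch: int = 0,
    ambiguous_identity: float = 0.8,  # hypothetical threshold
    spacer_len: int = 10,
) -> Tuple[Optional[str], str]:
    """Merge a pair that overlaps cleanly, reject a pair whose best overlap is
    close but has too many mismatches, and concatenate a pair with no
    plausible overlap. Returns (sequence or None, status)."""
    rc_rev = reverse_complement(rev)
    best = None  # (score, overlap_length, mismatches)
    # Ungapped scan: last k bases of fwd against the first k bases of rc_rev.
    for k in range(min_overlap, min(len(fwd), len(rc_rev)) + 1):
        mismatches = sum(a != b for a, b in zip(fwd[-k:], rc_rev[:k]))
        score = (k - mismatches) - mismatches  # matches minus mismatches
        if best is None or score > best[0]:
            best = (score, k, mismatches)

    if best is not None:
        _, k, mismatches = best
        if mismatches <= max_mismatch:
            # Clean overlap: forward read plus the non-overlapping tail of rc_rev.
            return fwd + rc_rev[k:], "merged"
        if (k - mismatches) / k >= ambiguous_identity:
            # Looks like a real overlap, but with too many mismatches.
            return None, "rejected"
    # No plausible overlap: fall back to justConcatenate-style joining.
    return fwd + "N" * spacer_len + rc_rev, "concatenated"


# Demo: simulated, error-free 300 bp amplicon sequenced with 2x250 reads.
random.seed(1)
amplicon = "".join(random.choice("ACGT") for _ in range(300))
fwd = amplicon[:250]
rev = reverse_complement(amplicon[-250:])
seq, status = merge_or_concatenate(fwd, rev)
print(status, len(seq))
```

Because every pair is routed through the same decision, the status label could also be reported per read, which would make the mix of merged and concatenated ASVs visible downstream.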

I found this benchmark that looked at merging vs. concat vs. both vs. single-read only:
https://link.springer.com/article/10.1186/s12859-021-04410-2

It shows marginal improvement with "both", though if I understand Fig. 1 correctly, this is done prior to passing the reads to dada2.

RobJamesRamos · 2024-07-09T21:25:10Z

For what it's worth, the merge+concatenate would be a good compromise for our use case. It would be nice to have both options, including "justConcatenate", so that we can be sure that all reads are processed the same way, but I'm coming from an LSU mindset where reads are very unlikely to overlap. I totally understand the use case for using both merge and concatenate for more variable regions like the ITS. All in all, if only a merge+concatenate option were implemented, I think our pipeline would switch to using it.

cjfields · 2024-12-03T00:55:25Z

@nbokulich we did implement a 'rescue' unmerged reads function in our Nextflow workflow a few years back, which was a drop-in variation on @benjjneb's mergePairs:

https://github.com/h3abionet/TADA/blob/1563758e96ecca23fb3c1b3b733db1cb88a41a84/templates/SeqTables.R#L6

It requires an extra flag to recover those reads and concatenate them. We don't really deal with sequences that overlap poorly (too many mismatches), so that check would need to be added.

benjjneb mentioned this issue on Dec 18, 2020, in "Thoughts on concatenation" (benjjneb/dada2#1231, closed).

benjjneb mentioned this issue on Sep 25, 2024, in "truncLenKeep" (benjjneb/dada2#2020, open).
