
dada2 justConcatenate #129

Open
ARW-UBT opened this issue Feb 23, 2020 · 7 comments

Comments

@ARW-UBT

ARW-UBT commented Feb 23, 2020

Bug Description
I recently installed an Illumina iSeq 100 benchtop sequencer in my lab. Back in 2018 it was announced that a 2x 250 bp sequencing cartridge would be available soon. It is now 2020, and no new kit has appeared.

Since I can currently produce only 2x 150 bp reads, the wonderful read-joining (merging) option in dada2 cannot be applied to the 16S V3V4 region, because the paired reads are too short to overlap. However, there is the justConcatenate option in dada2 standalone/R that could help here.
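
To make the constraint concrete, here is a small Python sketch (not dada2 code) of the overlap arithmetic and of what concatenation produces. The 460 bp amplicon length is an assumed, illustrative figure for V3V4; the real value depends on the primers and the organism. `expected_overlap` and `just_concatenate` are hypothetical helpers. The second one only mimics the behaviour documented for `mergePairs(..., justConcatenate=TRUE)`: forward sequence, a spacer of 10 Ns, then the reverse-complemented reverse sequence.

```python
# Illustrative sketch only; not part of dada2 or q2-dada2.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
SPACER = "N" * 10  # dada2 documents a 10-N spacer for justConcatenate


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(amplicon_len: int, fwd_len: int, rev_len: int) -> int:
    """Bases covered by both reads; a negative value is a gap between them."""
    return fwd_len + rev_len - amplicon_len


def just_concatenate(fwd: str, rev: str) -> str:
    """Forward sequence + 10 Ns + reverse-complemented reverse sequence."""
    return fwd + SPACER + reverse_complement(rev)


# Assumed amplicon length of 460 bp (illustrative, see above).
print(expected_overlap(460, 150, 150))  # -160: the reads cannot be merged
print(expected_overlap(460, 250, 250))  # 40: enough for merging by overlap
print(just_concatenate("ACGTAC", "TTGCA"))  # ACGTACNNNNNNNNNNTGCAA
```

With a negative overlap there is nothing to align, which is why concatenation is the only way to keep both reads of a pair in one sequence.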

Questions
Actually, the justConcatenate option is not built into the q2 plugin. If I concatenate outside the q2 workflow, the great provenance chain in q2 will be interrupted.
Question to the plugin developers: would it be possible to add the justConcatenate option to q2/dada2? Or do you have any suggestion for how to use justConcatenate data in q2?

Comments
@benjjneb Thank you for directing me to this Forum and your comment that it might be possible with the existing dada2 plugin.

@benjjneb
Collaborator

Q for the Q2 folks: Is there a concatenation option already implemented in one of the Q2 plugins that could be used prior to q2-dada2?

@ARW-UBT
Author

ARW-UBT commented Mar 5, 2020 via email

@nbokulich
Member

Sorry @benjjneb and @ARW-UBT, your questions came mid-release, so I think they got lost in everyone's pile.

> Is there a concatenation option already implemented in one of the Q2 plugins that could be used prior to q2-dada2?

No, not currently. Just concatenating causes issues all down the line, with phylogeny, taxonomic classification, etc. Nor have we received many (maybe 2?) requests for this feature. So I think exposing this option in q2-dada2 is probably a low priority, but I am curious what others think.

> Actually, the justConcatenate option is not built into the q2 plugin. If I concatenate outside the q2 workflow, the great provenance chain in q2 will be interrupted.

Here is a forum topic that describes a similar question, in which I've given steps for modifying your local branch of q2-dada2. This will allow you to expose or adjust options that are not available in the current release version, preserving the provenance chain.

@RobJamesRamos

Just to add one more voice to the currently small chorus: we use justConcatenate in an AMF LSU pipeline that has its own downstream processing. It would greatly simplify our pipeline to have this flag exposed in QIIME 2. See https://link.springer.com/article/10.1007/s00572-022-01068-3

@nbokulich
Member

nbokulich commented Jul 9, 2024

Hi @benjjneb,
I am warming up to the idea of exposing justConcatenate in q2-dada2, as I have now had some time to think about how we could handle concatenated ASVs in QIIME 2 to avoid issues with taxonomy classification etc. So I would like to pick up this conversation.

One thing that continues to trouble me is that justConcatenate will concatenate everything, including reads that do overlap. This could mess up phylogeny, taxonomy, etc. It should be less of an issue for amplicons with low length heterogeneity (e.g., 16S), provided that users apply it responsibly. However, it would be a common issue with length-variable regions like ITS. This is one reason why I have been against this for many years: I am opposed to the "just" part in justConcatenate.
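
A toy Python illustration of this concern, assuming a made-up 20 bp amplicon sequenced with 12 bp reads, so the two reads share 4 bp. Nothing here is dada2 code; the 10-N spacer follows the dada2 documentation for justConcatenate, and every name is hypothetical.

```python
# Illustrative sketch only: concatenating a read pair that actually overlaps.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


amplicon = "ACGTTGCAAGGCTTAACCGT"  # made-up 20 bp template
read_len = 12

fwd = amplicon[:read_len]                       # forward read
rev = reverse_complement(amplicon[-read_len:])  # reverse read as sequenced

# What a blind concatenation yields for this pair.
concatenated = fwd + "N" * 10 + reverse_complement(rev)

# The stretch of the template covered by both reads.
duplicated = amplicon[len(amplicon) - read_len:read_len]

print(len(amplicon), len(concatenated))  # 20 34
print(duplicated)                        # AGGC
print(concatenated.count(duplicated))    # 2: the shared bases appear twice
```

The concatenated sequence carries 24 real bases for a 20 bp template, so the shared region is represented twice on either side of the spacer. Duplication of this kind can distort alignment-based steps downstream.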

For this reason I think that it would be useful to expose an option to merge and then concatenate reads that fail to merge (because of lack of overlap; still rejecting reads that have partial overlap with mismatches in the overlap region). Reviewing various issues in the dada2 issue tracker I see that you are concerned about biases that could be introduced by having a mix of merged and concatenated reads, and I acknowledge this, but in some cases this may be less of a bias than, e.g., when users use merged ASVs in a hypervariable region and hence systematically lose longer amplicons. So it all boils down to users needing to exercise some responsibility in their analysis (which is already the case).
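
A rough Python sketch of that merge-then-concatenate decision, for illustration only. It is not how dada2's mergePairs is implemented: it uses a simple ungapped scan of end overlaps, and the `near_identity` cutoff for "partial overlap with mismatches" is an arbitrary assumption, as are all function and parameter names. `rev_rc` is the reverse sequence after reverse-complementing.

```python
# Illustrative sketch only; names, thresholds and the ungapped scan are assumptions.
from typing import Tuple

SPACER = "N" * 10  # same spacer length dada2 documents for justConcatenate


def mismatches(a: str, b: str) -> int:
    """Count differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_or_concatenate(
    fwd: str,
    rev_rc: str,
    min_overlap: int = 12,
    max_mismatch: int = 0,
    near_identity: float = 0.8,  # hypothetical cutoff for a "real but bad" overlap
) -> Tuple[str, str]:
    """Return (status, sequence); status is merged, concatenated or rejected."""
    longest = min(len(fwd), len(rev_rc))
    looks_overlapping = False
    for k in range(longest, min_overlap - 1, -1):
        mm = mismatches(fwd[-k:], rev_rc[:k])
        if mm <= max_mismatch:
            # Acceptable overlap: merge, keeping the shared bases once.
            return "merged", fwd + rev_rc[k:]
        if 1 - mm / k >= near_identity:
            # Overlap is evident but has too many mismatches.
            looks_overlapping = True
    if looks_overlapping:
        return "rejected", ""
    # No sign of overlap at all: fall back to concatenation.
    return "concatenated", fwd + SPACER + rev_rc


print(merge_or_concatenate("AAAAAAAACGTACGGTCATG", "CGTACGGTCATGTTTTTTTT")[0])  # merged
print(merge_or_concatenate("AAAAAAAACGTACGGTCATG", "CGTACGTTCATGTTTTTTTT")[0])  # rejected
print(merge_or_concatenate("ACCACAACCAACACCA", "GTTGTGGTTGGTGTTG")[0])          # concatenated
```

The three statuses make the mixed origin of the output explicit, so a downstream step could treat merged and concatenated sequences differently if needed.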

If we feel that such an output should have restricted uses downstream, one option would be to introduce a new type for concatenated (+merged) ASVs. This would limit the downstream analyses that users could perform, though this might be overly restrictive so we might consider this a last resort.

How would you feel about implementing a merge+concatenate option in q2-dada2? I see from benjjneb/dada2#279 that doing a merge + concatenate is simple; excluding reads with unacceptable mismatches and indels would take some more work, but maybe this is something that you have already worked on further?

I found this benchmark that looked at merging vs. concat vs. both vs. single-read only:
https://link.springer.com/article/10.1186/s12859-021-04410-2

It shows marginal improvement with "both", though, if I understand Fig. 1 correctly, the merging and concatenation are done before the reads are passed to dada2.

@RobJamesRamos

RobJamesRamos commented Jul 9, 2024

For what it's worth, the merge+concatenate would be a good compromise for our use case. It would be nice to have both options, including "justConcatenate", so that we can be sure that all reads are processed the same way, but I'm coming from an LSU mindset where reads are very unlikely to overlap. I totally understand the use case for using both merge and concatenate for more variable regions like the ITS. All in all, if only a merge+concatenate option was implemented I think our pipeline would switch to using it.

@cjfields

cjfields commented Dec 3, 2024

@nbokulich we did implement a 'rescue' unmerged reads function in our Nextflow workflow a few years back, which was a drop-in variation on @benjjneb's mergePairs:

https://github.com/h3abionet/TADA/blob/1563758e96ecca23fb3c1b3b733db1cb88a41a84/templates/SeqTables.R#L6

It requires an extra flag to capture those reads back and concatenate them. We don't really deal with sequences that don't overlap well (too many mismatches), so this would need to be added.
