Non-artic Primer Set #216

Closed
whottel opened this issue Aug 21, 2023 · 7 comments

@whottel

whottel commented Aug 21, 2023

Hello,

I am trying to run Cecret on some older fastqs that were generated with a modified version of the Midnight primer set.
I am currently using this command to run Cecret v.3.7.20230725 in this case:
/Shared/SHL-BUG/software/nextflow/nextflow run UPHL-BioNGS/Cecret -profile singularity --samtools_amplicon_stats_options '--max-amplicons 2000' --single_reads . --primer_bed ./CL_Modified_Midnight_Primerersion220901.scheme.bed --aci false --freyja false --freyja_aggregate false --samtools_plot_ampliconstats false -c /Shared/SHL-BUG/singularity/Cecret/Cecret_custom.config

After attempts to modify the primer bed file to more closely resemble the formatting of the artic primer bed files provided by Cecret, I consistently get an error message stating that 0 primers were found in the bed file.
I am thinking that the primer bed file I am trying to use is in the wrong format in some way, but am unsure what the exact issue is.
Please find attached the primer bed file (converted to txt so it could be uploaded to GitHub) and the generated error file.
CL_Modified_Midnight_Primers_SARS-CoV-2_version220901.primer.bed.txt
Error.txt

Thanks,
Wes

@erinyoung
Member

Oh no! I'm at a conference for the first part of this week, so it may be a few days before I can really look at this. At initial glance, though, it looks like some of your primers are duplicates (they have the same start and end). Does it work if you remove those?
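One way to check for the duplicate primers mentioned above is a short script that groups entries by chromosome, start, and end. This is only a sketch: the function name `find_duplicate_primers` and the primer names in the usage note are hypothetical, and it assumes a whitespace-delimited BED file with the primer name in the fourth column.

```python
from collections import defaultdict


def find_duplicate_primers(bed_lines):
    """Group primer names by (chrom, start, end); return groups with more than one entry."""
    seen = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # too short to carry coordinates; not handled here
        chrom, start, end = fields[0], fields[1], fields[2]
        # Assumption: the primer name, if present, is the fourth column.
        name = fields[3] if len(fields) > 3 else ""
        seen[(chrom, start, end)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

Passing the lines of a bed file (for example `open("primers.bed").readlines()`, with a placeholder file name) returns a dictionary whose keys are the repeated coordinates and whose values are the primer names that share them. An empty dictionary means no two primers have the same start and end.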

@whottel
Author

whottel commented Aug 22, 2023

Hi Erin,
Thanks for the response. I noticed I had a typo in the name of the bed file in the original command example. I corrected that and updated the bed file following your suggestion, but now I am seeing this error:

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  [amplicon] error: bad bed file format in line 1 of CL_Modified_Midnight_Primers_SARS-CoV-2_version220901_edit.primer.bed.
  (N.B. ref/chrom name limited to 1023 characters.)
  samtools ampliconstats: Could not read file "CL_Modified_Midnight_Primers_SARS-CoV-2_version220901_edit.primer.bed"
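The thread does not say what was wrong with line 1 of the bed file, so the following is only a generic sanity check, not a diagnosis. It assumes the six-column, tab-separated layout (chrom, start, end, name, score, strand) used by ARTIC-style scheme files; the function name `check_bed_lines` is hypothetical.

```python
def check_bed_lines(bed_lines):
    """Return (line_number, message) pairs for lines that break the assumed six-column layout."""
    problems = []
    for number, raw in enumerate(bed_lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if raw.endswith("\r\n"):
            problems.append((number, "Windows line ending"))
        fields = line.split("\t")
        if len(fields) < 6:
            # Header rows and space-delimited lines end up here.
            problems.append(
                (number, f"expected at least 6 tab-separated fields, found {len(fields)}")
            )
            continue
        start, end, strand = fields[1], fields[2], fields[5]
        if not (start.isdigit() and end.isdigit()):
            problems.append((number, "start and end must be non-negative integers"))
        elif int(start) >= int(end):
            problems.append((number, "start must be smaller than end"))
        if strand not in ("+", "-"):
            problems.append((number, "strand must be '+' or '-'"))
    return problems
```

A header row, spaces instead of tabs, or a missing strand column would each be reported with its line number, which makes it easier to compare a custom file against one of the bed files that ships with the workflow.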

@whottel
Author

whottel commented Aug 22, 2023

Hi Erin,
I was able to edit the primer bed file to work. However, when the pipeline completed I noticed that over 90% of reads were removed during the ivar_trim step. Example:

Trimmed primers from 13.37% (21397) of reads.
93.75% (150067) of reads were quality trimmed below the minimum length of 74 bp and were not written to file.
5.55% (8889) of reads started outside of primer regions. Since the -e flag was given, these reads were written to file.
100% (160066) of reads had their insert size smaller than their read length

I should have mentioned earlier that I am using single reads generated from ONT. Can Cecret be used to analyze ONT reads? If so, do some quality cutoffs need to be adjusted to accommodate the higher error rate of this platform?

@erinyoung
Member

I was curious as to what happens if ONT files are put into the workflow, and now I know.

> Can Cecret be used to analyze ONT reads?

No. I've had the goal of adding ONT support for a while (#147), but I don't have sample files to test processes on.

Do you have some files that you could share with me?

@whottel
Author

whottel commented Aug 23, 2023

Hi Erin,

I was able to get a more reasonable result by skipping the primer trim step. It turns out the fastqs generated by the platform we were using are not actually "raw", but are trimmed for barcodes and primers and aligned to the SC2 reference sequence.
That said, it still seems like it may be too much of a stretch to use Cecret for these at this time.
As for example data, we have submitted to SRA a few thousand sequences generated by the ClearLabs DX instrument, which uses ONT. Attached is a list of accession IDs from a random run.
ONT_SRR_IDs.txt

@erinyoung
Member

You can use the fasta files generated by your workflows as input into Cecret, if that would be helpful to you (they will be run through Pangolin and vadr). More information can be found at https://github.com/UPHL-BioNGS/Cecret#using-a-sample-sheet

@erinyoung
Member

Forgive me for taking so long to address this. I have a PR that's almost ready to get released (#221) which might address this issue. This will use artic's pipeline for nanopore reads.

Nanopore reads will be read in from a directory

nextflow run UPHL-BioNGS/Cecret --nanopore <directory with nanopore reads>

Or in a sample sheet

sample,fastq_1,fastq_2
example,example.fastq.gz,nanopore

And then the sample sheet is read in with the sample_sheet param.

nextflow run UPHL-BioNGS/Cecret --sample_sheet samplesheet.csv
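For a directory with many nanopore fastq files, a sample sheet in the layout shown above could be generated with a few lines of Python. This is a sketch under assumptions: the helper `build_sample_sheet` is hypothetical, it derives each sample name by stripping the fastq suffix from the file name, and it follows the example row above by writing `nanopore` in the `fastq_2` column. The final format may differ once the PR is released.

```python
import csv
import io


def build_sample_sheet(fastq_names):
    """Return CSV text with one row per nanopore fastq file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    for fastq in sorted(fastq_names):
        # Assumption: the sample name is the file name without its fastq suffix.
        sample = fastq
        for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
            if sample.endswith(suffix):
                sample = sample[: -len(suffix)]
                break
        # "nanopore" in the fastq_2 column mirrors the example row in the comment above.
        writer.writerow([sample, fastq, "nanopore"])
    return buffer.getvalue()
```

Writing the returned text to `samplesheet.csv` gives a file that can be passed with `--sample_sheet` as in the command above.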
