FAQ
TELL US ABOUT IT!!!
- Github issue
- Email Erin
- Send Erin a message on slack
Be sure to include the command used, what config file was used, and what the nextflow error was.
There is a template file with all the variables in this repo at configs/grandeur_template.config that the End User can copy and edit. All of the parameters are included in that file.
There's also a config file that we use here at UPHL, UPHL.config.
To get a copy of this config file (will not run workflow)
nextflow run UPHL-BioNGS/Grandeur --config_file true
To use the config file created by the End User, simply specify the path with -c
nextflow run UPHL-BioNGS/Grandeur -profile singularity -c <path to user edited config file>
There are three test profiles for "Grandeur". They download reads from the ENA.
- test0 downloads six samples from the SRA to run through the workflow with default settings
- test1 uses those same samples, but does not download genomes from NCBI
- test2 downloads some CRPA and creates a multiple sequence alignment
nextflow run UPHL-BioNGS/Grandeur -profile test,singularity
Prior versions allowed more flexibility about which analyses were run. This was difficult to maintain.
There are some processes that can be turned off:
- params.msa = false is the default, and it skips multiple sequence alignment.
- params.current_datasets = false is the default, and it skips downloading genomes from NCBI to use as additional references.
- params.skip_extras = true will skip the information subworkflow and anything outside of the core workflow.
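As a minimal sketch, the three parameters above could be collected in a small config file and passed with -c. The file name my_grandeur.config is a hypothetical placeholder, and the values only illustrate changing the documented settings; they are not a recommended combination.

```shell
# Hypothetical file name; any path can be given to -c.
config_file="my_grandeur.config"

# Write the overrides using the parameter names listed above.
cat > "$config_file" <<'EOF'
// turn on multiple sequence alignment (default is false)
params.msa = true
// download additional reference genomes from NCBI (default is false)
params.current_datasets = true
// skip the information subworkflow and anything outside of the core workflow
params.skip_extras = true
EOF

# Show what was written.
cat "$config_file"

# Then run with the edited config (commented out so the sketch runs without nextflow):
# nextflow run UPHL-BioNGS/Grandeur -profile singularity -c "$config_file"
```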
At UPHL, we use this workflow to determine the serotype of Salmonella and E. coli under CLIA. Therefore, all containers with their versions are explicitly selected if available, and any updates to this repo will come with a version change. In future endeavors, we hope to use this workflow for organism identification and AMR gene identification.
The CLIA officer of the End User may request that additional locks be put in place, like having all of the containers specified. If additional help is needed, please submit an issue or email me.
They perform well, their containers were easy to create, and @erinyoung had heard about them.
As "Grandeur" is intended to be a species-agnostic workflow for a local public health laboratory, and the utility of sequencing continues to expand, new tools are constantly needed to analyze isolates and further public health goals.
Many of these additional tools are added by need locally or from the End User, so if the End User knows of other serotyping/analysis tools, please submit an issue or tell @erinyoung about it, and we'll work in some options.
@erinyoung also appreciates pull requests from forks.
Warning: If there's not a reliable container for the suggested tool, @erinyoung will request that the End User create a container for that tool and contribute it to StaPH-B's docker repositories.
Organisms with large genomes can still contribute to disease, but this is not the workflow for those. "Grandeur" uses spades for de novo assembly, and large genomes may be too much for spades.
As of the time of writing this README, reference-based alignment of SARS-CoV-2 is still the norm. "Grandeur" is for de novo assembly of things with small genomes. Cecret would be a better workflow for SARS-CoV-2 sequencing.
We wholeheartedly recommend TB-profiler for Mycobacterium tuberculosis isolates. Grandeur does include some tools, such as DrPRG and Mykrobe, but that is because they have proven useful to us for Nontuberculous mycobacteria (NTM).
genome_sizes.json has a list of commonly sequenced organisms and the approximate expected genome size for each organism. This is only used for estimating expected coverage. A file from the End User can be used instead and specified with params.genome_sizes.
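Since the genome size is only used to estimate expected coverage, the arithmetic is presumably the usual one: total sequenced bases divided by the expected genome size. The sketch below illustrates that calculation. The function name estimate_coverage and the example numbers are hypothetical, and this is an assumption about the general formula, not a copy of what "Grandeur" runs internally.

```shell
# estimate_coverage TOTAL_BASES GENOME_SIZE
# Prints the estimated depth of coverage with one decimal place.
estimate_coverage() {
  total_bases="$1"
  genome_size="$2"
  awk -v b="$total_bases" -v g="$genome_size" 'BEGIN {
    # Refuse a zero or negative genome size instead of dividing by it.
    if (g <= 0) { print "genome size must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", b / g
  }'
}

# Example: 500 Mbp of reads against a 5 Mbp genome.
estimate_coverage 500000000 5000000
```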
This workflow stands on the shoulders of giants. As such, please cite the individual tools that were useful for your manuscript so that those developers can continue to get funding. They are listed above. Mentioning this workflow in the text as "The Grandeur workflow v.VERSION (www.github.com/UPHL-BioNGS/Grandeur)" is good enough for @erinyoung's ego.
Yes. The main use-case at UPHL is to run "Grandeur" per sequencing run, which is a variety of different organisms. Samples involved in outbreaks are generally spread over multiple runs.
The process at UPHL goes as follows:
- Run "Grandeur" on all the paired-end sequencing reads from a MiSeq run with the uphl profile to get fasta files (located at /grandeur/contigs)
- Gather the fasta files from their respective sequencing runs and put them in a new directory
- Add a representative genome from NCBI to this new directory
- Run "Grandeur" on the collected fasta files with the profile just_msa and specify the representative genome from NCBI as an outgroup
A real use case from UPHL with Pseudomonas aeruginosa:
nextflow run UPHL-BioNGS/Grandeur \
-with-tower \
-profile singularity,just_msa \
--iqtree2_outgroup GCF_000006765.1_ASM676v1_genomic \
--fastas fastas
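The gathering step in the process above can be sketched roughly as follows. The function name gather_fastas and the run directory paths are hypothetical, and the sketch assumes each run's output directory contains grandeur/contigs as described. Files with the same name in different runs would overwrite each other, so sample names need to be unique.

```shell
# gather_fastas DEST RUN_DIR...
# Copies every file under RUN_DIR/grandeur/contigs into DEST.
gather_fastas() {
  dest="$1"
  shift
  mkdir -p "$dest"
  for run_dir in "$@"; do
    contigs_dir="$run_dir/grandeur/contigs"
    # Skip runs without a contigs directory rather than failing.
    if [ ! -d "$contigs_dir" ]; then
      echo "no contigs directory in $run_dir, skipping" >&2
      continue
    fi
    for f in "$contigs_dir"/*; do
      if [ -f "$f" ]; then
        cp "$f" "$dest"/
      fi
    done
  done
}

# Hypothetical usage, followed by adding the outgroup genome by hand:
# gather_fastas fastas /path/to/run_A /path/to/run_B
```

The representative genome from NCBI still has to be copied into the same directory before running with the just_msa profile.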
No. Prior versions allowed this, but it proved too difficult to support.
E. coli and Shigella both belong to the family Enterobacteriaceae and are closely related. Thus, it should be expected that mash and fastani will match E. coli samples with Shigella references or Shigella samples with E. coli references. We encourage users to place a high priority on the results from Shigatyper, a tool used specifically to identify Shigella species.
This happens when using nextflow pull, which is the recommended use of this workflow.
Sometimes, however, there is an error:
$ nextflow pull UPHL-BioNGS/Grandeur
Checking UPHL-BioNGS/Grandeur ...
UPHL-BioNGS/Grandeur contains uncommitted changes -- cannot pull from repository
That's a nextflow error. The easiest way I've found to resolve this is to delete your local copy, which is likely at ~/.nextflow/assets/UPHL-BioNGS/Grandeur.
The command is something like
rm -rf ~/.nextflow/assets/UPHL-BioNGS/Grandeur
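A slightly more defensive version of the same fix might look like the sketch below. The function name remove_local_copy is hypothetical; it only wraps the rm -rf above with a check that the path is non-empty and actually exists, and the re-pull is left commented out.

```shell
# remove_local_copy ASSET_DIR
# Deletes the local nextflow copy of a workflow if it is present.
remove_local_copy() {
  asset_dir="$1"
  # Never run rm -rf on an empty path.
  if [ -z "$asset_dir" ]; then
    echo "no directory given" >&2
    return 1
  fi
  if [ -d "$asset_dir" ]; then
    rm -rf "$asset_dir"
    echo "removed $asset_dir"
  else
    echo "nothing to remove at $asset_dir"
  fi
}

# Typical usage, then pull a fresh copy:
# remove_local_copy "$HOME/.nextflow/assets/UPHL-BioNGS/Grandeur"
# nextflow pull UPHL-BioNGS/Grandeur
```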
- amrfinderplus
- bbduk
- blastn
- blobtools_*
- core_genome_evaluation
- circulocov
- datasets_*
- drprg
- elgato
- emmtyper
- fastani
- fastp
- fastqc
- heatcluster
- iqtree2
- kaptive
- kleborate
- kraken2
- mash_*
- mashtree
- mlst
- multiqc
- mykrobe
- panaroo
- pbptyper
- phytreeviz
- plasmidfinder
- prokka
- quast
- seqsero2
- serotypefinder
- shigatyper
- snp_dists
- spades