Young edited this page Feb 23, 2024 · 6 revisions

Frequently Asked Questions (aka FAQ)

What do I do if I encounter an error?

TELL US ABOUT IT!!!

Be sure to include the command used, what config file was used, and what the nextflow error was.
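
One way to collect those three pieces before opening an issue is sketched below. It assumes the run was launched from the current directory, where Nextflow writes its default log file .nextflow.log. The function name collect_report and the output directory name grandeur_report are hypothetical.

```shell
# Gather the command, the config file, and the Nextflow log into one directory.
# Assumption: .nextflow.log sits in the directory the workflow was launched from.
collect_report() {
  cmd="$1"                      # the exact command that was run
  config="$2"                   # path to the config file that was used (may be empty)
  out="${3:-grandeur_report}"   # hypothetical output directory name
  mkdir -p "$out"
  printf '%s\n' "$cmd" > "$out/command.txt"
  if [ -n "$config" ] && [ -f "$config" ]; then
    cp "$config" "$out/"
  fi
  if [ -f .nextflow.log ]; then
    cp .nextflow.log "$out/nextflow.log"
  fi
  echo "$out"
}
```

For example, collect_report "nextflow run UPHL-BioNGS/Grandeur -profile singularity -c my.config" my.config writes grandeur_report/, whose contents can be pasted into or attached to the issue.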

Where is an example config file?

A template file containing all of the parameters is in this repo at configs/grandeur_template.config. The End User can copy and edit it.

There's also a config file that we use here at UPHL, UPHL.config.

To get a copy of this config file (this will not run the workflow):

nextflow run UPHL-BioNGS/Grandeur --config_file true

To use the config file created by the End User, simply specify the path with -c:

nextflow run UPHL-BioNGS/Grandeur -profile singularity -c <path to user edited config file>

Do you have test data?

There are three test profiles for "Grandeur". They download reads from the ENA.

  • test0 downloads six samples from the SRA to run through the workflow with default settings
  • test1 uses those same samples, but does not download genomes from NCBI
  • test2 downloads some CRPA and creates a multiple sequence alignment

nextflow run UPHL-BioNGS/Grandeur -profile test,singularity

How do I turn processes off?

Prior versions allowed more flexibility about which analyses were run. This was difficult to maintain.

There are some processes that can be turned off:

  • params.msa = false is the default and skips multiple sequence alignment
  • params.current_datasets = false is the default and skips downloading additional reference genomes from NCBI
  • params.skip_extras = true will skip the information subworkflow and anything outside of the core workflow
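
As a rough sketch, the three parameters above could be set in a small config file written from the shell and then passed with -c. The file name grandeur_user.config is hypothetical, and the values shown simply set each switch described in the list. Check configs/grandeur_template.config for the authoritative parameter names and defaults.

```shell
# Write a minimal user config that sets the three switches listed above.
# The file name is a placeholder; the parameter names come from the list above.
cat > grandeur_user.config <<'EOF'
// run the multiple sequence alignment (default: false)
params.msa = true
// download additional reference genomes from NCBI (default: false)
params.current_datasets = true
// skip the information subworkflow and anything outside the core workflow
params.skip_extras = true
EOF

# Show what was written before handing the file to nextflow with -c
cat grandeur_user.config
```

The resulting file can then be supplied with -c in the same way as any other user-edited config file.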

What about CLIA validation?

At UPHL, we use this workflow to determine the serotype of Salmonella and E. coli under CLIA. Therefore, all containers with their versions are explicitly selected if available, and any updates to this repo will come with a version change. In future endeavors, we hope to use this workflow for organism identification and AMR gene identification.

The CLIA officer of the End User may request that additional locks be put in place, such as having all of the containers specified. If additional help is needed, please submit an issue or email me.

How were serotyping tools chosen for this workflow?

They perform well, their containers were easy to create, and @erinyoung had heard about them.

Are any other tools getting added to "Grandeur"?

"Grandeur" is intended to be a species-agnostic workflow for a local public health laboratory. As the utility of sequencing continues to expand, new tools are constantly needed to analyze isolates and further public health goals.

Many of these additional tools are added by need locally or from the End User, so if the End User knows of other serotyping/analysis tools, please submit an issue or tell @erinyoung about it, and we'll work in some options.

@erinyoung also appreciates pull requests from forks.

Warning: If there's not a reliable container for the suggested tool, @erinyoung will request that the End User create a container for that tool and contribute it to StaPH-B's docker repositories.

What about organisms with large genomes?

Organisms with large genomes can still contribute to disease, but this is not the workflow for those. "Grandeur" uses spades for de novo assembly, and large genomes may be too much for spades.

What about SARS-CoV-2?

As of the time of writing this README, reference-based alignment of SARS-CoV-2 is still the norm. "Grandeur" is for de novo assembly of things with small genomes. Cecret would be a better workflow for SARS-CoV-2 sequencing.

What about TB?

We wholeheartedly recommend TB-profiler for Mycobacterium tuberculosis isolates. Grandeur does include some tools, such as DrPRG and Mykrobe, but that is because they have proven useful to us for nontuberculous mycobacteria (NTM).

What is genome_sizes.json used for?

genome_sizes.json has a list of commonly sequenced organisms and the approximate expected genome size for each organism. This is only used for estimating expected coverage. A file from the End User can be used instead and specified with params.genome_sizes.
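
For intuition, expected coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below shows that arithmetic only. It is an assumption about the general approach, not a description of Grandeur's exact calculation, and the function name estimate_coverage is hypothetical.

```shell
# Estimate coverage as total sequenced bases / expected genome size.
# Both arguments are in bases; this is the generic formula, not Grandeur's code.
estimate_coverage() {
  total_bases="$1"
  genome_size="$2"
  awk -v b="$total_bases" -v g="$genome_size" 'BEGIN { printf "%.1f\n", b / g }'
}

# Example: 500 Mb of reads against a 5 Mb genome prints 100.0
estimate_coverage 500000000 5000000
```

An inaccurate genome size in the JSON file would therefore shift only this estimate.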

How do I cite this workflow?

This workflow stands on the shoulders of giants. As such, please cite the individual tools that were useful for your manuscript so that those developers can continue to get funding. They are listed above. Mentioning this workflow in the text as "The Grandeur workflow v.VERSION (www.github.com/UPHL-BioNGS/Grandeur)" is good enough for @erinyoung's ego.

Can I re-use files?

Yes. The main use-case at UPHL is to run "Grandeur" per sequencing run, which usually contains a variety of different organisms. Samples involved in outbreaks are generally spread over multiple runs.

The process at UPHL goes as follows:

  1. Run "Grandeur" with the uphl profile on all the paired-end sequencing reads from a MiSeq run to get fasta files (located at /grandeur/contigs)
  2. Gather the fasta files from their respective sequencing runs and put them in a new directory
  3. Add a representative genome from NCBI to this new directory
  4. Run "Grandeur" on the collected fasta files with the profile just_msa and specify the representative genome from NCBI as an outgroup

A real use case from UPHL with Pseudomonas aeruginosa:

nextflow run UPHL-BioNGS/Grandeur \
  -with-tower \
  -profile singularity,just_msa \
  --iqtree2_outgroup GCF_000006765.1_ASM676v1_genomic \
  --fastas fastas
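
Steps 2 and 3 of the process above amount to copying files into one directory. A minimal sketch follows, assuming each run directory holds its results under grandeur/contigs. The function name gather_fastas and the directory names in the usage note are hypothetical.

```shell
# Copy every contig file from several run directories into one directory.
# Assumes the layout <run_dir>/grandeur/contigs/; adjust if outputs live elsewhere.
# Files with identical names in different runs would overwrite each other.
gather_fastas() {
  dest="$1"
  shift
  mkdir -p "$dest"
  for run_dir in "$@"; do
    for f in "$run_dir"/grandeur/contigs/*; do
      # Skip the unexpanded glob when a run has no contigs
      if [ -f "$f" ]; then
        cp "$f" "$dest/"
      fi
    done
  done
}
```

For example, gather_fastas fastas run_A run_B fills fastas/. The representative genome from NCBI can then be copied into the same directory before running the just_msa profile with --fastas fastas.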

Can I start with prokka-annotated gff files?

No. Prior versions allowed this, but it proved too difficult to support.

Why do my E. coli samples have Shigella top hits and vice versa?

E. coli and Shigella both belong to the family Enterobacteriaceae and are closely related. Thus, mash and fastani should be expected to match E. coli samples with Shigella references or Shigella samples with E. coli references. We encourage users to place a high priority on the results from Shigatyper, a tool used specifically to identify Shigella species.

How do I deal with "cannot pull from repository"?

This happens when using nextflow pull, which is the recommended use of this workflow.

Sometimes, however, there is an error:

$ nextflow pull UPHL-BioNGS/Grandeur
Checking UPHL-BioNGS/Grandeur ...
UPHL-BioNGS/Grandeur contains uncommitted changes -- cannot pull from repository

That's a nextflow error. The easiest way I've found to resolve this is to delete your local copy, which is likely at ~/.nextflow/assets/UPHL-BioNGS/Grandeur.

The command is something like:

rm -rf ~/.nextflow/assets/UPHL-BioNGS/Grandeur
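
A slightly more cautious variant is sketched below: it removes the local copy only if it exists and accepts the path as an argument. The function name reset_local_copy is hypothetical, and the default path is the likely location mentioned above.

```shell
# Remove the locally cached copy of the workflow so that it can be pulled again.
# The default path is the likely location given above; pass another path if yours differs.
reset_local_copy() {
  assets_dir="${1:-$HOME/.nextflow/assets/UPHL-BioNGS/Grandeur}"
  if [ -d "$assets_dir" ]; then
    rm -rf "$assets_dir"
    echo "removed $assets_dir"
  else
    echo "nothing to remove at $assets_dir"
  fi
}
```

After running reset_local_copy, nextflow pull UPHL-BioNGS/Grandeur should download a fresh copy.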