Commit

update documentation of multiref input
scarlhoff committed May 17, 2024
1 parent 7916413 commit 09ab6aa
Showing 2 changed files with 26 additions and 16 deletions.
2 changes: 1 addition & 1 deletion assets/schema_fasta.json
@@ -114,7 +114,7 @@
"genotyping_reference_ploidy": {
"type": "integer",
"meta": ["genotyping_ploidy"],
-"errorMessage": "Organism ploidy for GATK must be provided as integers."
+"errorMessage": "Organism ploidy for GATK or FreeBayes must be provided as integers."
},
"genotyping_gatk_dbsnp": {
"type": "string",
40 changes: 25 additions & 15 deletions docs/usage.md
@@ -66,14 +66,14 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p

## Reference input

-nf-core/eager supports two methods of supplying a reference FASTA via `--fasta`.
-The first is a direct path to a single FASTA file with optional prebuilt indicies via `--fasta_fai`, `--fasta_dict`, etc., and the second is via a reference sheet.
+nf-core/eager supports two methods of supplying reference FASTA files.
+The first is a direct path to a single FASTA file with optional prebuilt indicies via `--fasta`, `--fasta_fai`, `--fasta_dict`, etc., and the second is via a reference sheet.

-Providing a reference sheet to `--fasta` allows users to align their input reads to _multiple_ reference genomes in the same run. The reference sheet must be in the format of a comma- or tab-separated table, and the file extension must be `csv` or `tsv` respectively.
+Providing a reference sheet to `--fasta_sheet` allows users to align their input reads to _multiple_ reference genomes in the same run. The reference sheet must be in the format of a comma- or tab-separated table, and the file extension must be `csv` or `tsv` respectively.

In addition to including the path to the FASTA, the reference sheet can also be used to specify paths to pre-built indices of each reference (namely, `.fai` from `samtools faidx`, `.dict` from `picard CreateSequenceDictionary`, and/or a directory pointing to a directory containing the indices for the given mapper - e.g. created with `bwa index`).

-Note that passing a reference sheet to the pipeline with `--fasta` will _override_ any corresponding directly-supplied parameters specifying user-build indices (`--fasta_fai`, `--fasta_dict`).
+Note that passing a reference sheet to the pipeline with `--fasta_sheet` will _override_ any corresponding directly-supplied parameters specifying user-build indices (`--fasta_fai`, `--fasta_dict`).

An example of a reference sheet in `csv` format is as follows:

@@ -83,17 +83,27 @@ Mammoth_MT_Krause,//<path>/<to>/data/Mammoth_MT_Krause.fasta,/<path>/<to>/data/M
Human_MT,/<path>/<to>/data/Human_MT.fasta.gz,,,,,
```

-Only the `reference_name`, and `fasta` columns are mandatory, whereas all other cells can be empty depending on your context (i.e., the column name is still required, however you can supply paths to all or any of the optional columns).
-
-| Header | Required | Description |
-| -------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------- |
-| reference_name | Yes | Name of the reference to be used in file names and nextflow console log |
-| fasta | Yes | Path to FASTA of reference. Can be optionally gzipped |
-| fai | No | Optional path to pre-build SAMtools `fai` index file corresponding to the FASTA |
-| dict | No | Optional path to pre-build picard `dict` index file corresponding to the FASTA |
-| mapper_index | No | Optional path to _directory_ containing pre-build mapper index files corresponding to the FASTA |
-| circular_target | No | Optional string, and only required for CircularMapper, with name of entry in FASTA to extend (up to first space in header) |
-| mitochondrion_header | No | Optional string, and only required for MTNucRatio, with name of entry in FASTA of mitochondrion entry's header. |
+Only the `reference_name`, and `fasta` columns are mandatory, whereas all other cells can be empty depending on your context.
+
+| Header | Required | Description |
+| --------------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------- |
+| reference_name | Yes | Name of the reference to be used in file names and nextflow console log |
+| fasta | Yes | Path to FASTA of reference. Can be optionally gzipped |
+| fai | No | Optional path to pre-build SAMtools `fai` index file corresponding to the FASTA |
+| dict | No | Optional path to pre-build picard `dict` index file corresponding to the FASTA |
+| mapper_index | No | Optional path to _directory_ containing pre-build mapper index files corresponding to the FASTA |
+| circular_target | No | Optional string, and only required for CircularMapper, with name of entry in FASTA to extend (up to first space in header) |
+| mitochondrion_header | No | Optional string, and only required for MTNucRatio, with name of entry in FASTA of mitochondrion entry's header |
+| snpcapture_bed | No | Optional path to BED file with SNP capture positions, only required for QualiMap |
+| pileupcaller_bedfile | No | Optional path to BED file with SNP capture positions for genotyping with pileupCaller |
+| pileupcaller_snpfile | No | Optional path to EIGENSTRAT SNP panel file for genotyping with pileupCaller |
+| hapmap_file | No | Optional path to HapMap files for contamination estimation with ANGSD |
+| pmdtools_masked_fasta | No | Optional path to masked FASTA files for PMDtools |
+| pmdtools_bed_for_masking | No | Optional path to SNP capture BED file to mask the reference for PMDtools |
+| sexdeterrmine_snp_bed | No | Optional path to SNP capture bed files for genetic sex estimation with SexDetERRmine |
+| bedtools_feature_file | No | Optional path to feature file for coverage calculation with bedtools |
+| genotyping_reference_ploidy | No | Optional integer to specify organism ploidy for genotyping with GATK or FreeBayes |
+| genotyping_gatk_dbsnp | No | Optional path to SNP annotation file for genotyping with GATK |

Files for `fai`, `dict`, `mapper_index` will be generated by the pipeline for you if not specified.
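
The reference sheet rules described in this diff (a `csv` or `tsv` extension, mandatory `reference_name` and `fasta` columns, and an integer `genotyping_reference_ploidy`) can be checked before launching a run. The following is a minimal sketch, not the pipeline's own validation, which is driven by `assets/schema_fasta.json`. The function name `check_reference_sheet`, the messages, and the example paths are hypothetical.

```python
import csv
import io
from pathlib import Path

# Columns the usage docs describe as mandatory; all others may be left empty.
REQUIRED_COLUMNS = ("reference_name", "fasta")
# The file extension decides the delimiter, as stated in the docs.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def check_reference_sheet(filename, text):
    """Return a list of problems found in a reference sheet (empty if none).

    Hypothetical helper for illustration only; it does not check that the
    listed paths exist.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in DELIMITERS:
        return [f"{filename}: extension must be csv or tsv"]

    reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[suffix])
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems

    for row_number, row in enumerate(reader, start=2):  # the header is line 1
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append(f"line {row_number}: empty {column}")
        # Ploidy is optional, but must be a whole number when present.
        ploidy = (row.get("genotyping_reference_ploidy") or "").strip()
        if ploidy and not ploidy.isdigit():
            problems.append(f"line {row_number}: ploidy must be an integer")
    return problems


if __name__ == "__main__":
    # Made-up sheet: the second data row lacks a reference_name.
    sheet = (
        "reference_name,fasta,fai\n"
        "Human_MT,/data/Human_MT.fasta.gz,\n"
        ",/data/other.fasta,\n"
    )
    print(check_reference_sheet("refs.csv", sheet))
```

Returning a list of problems rather than raising on the first one lets a user fix every issue in the sheet in a single pass.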
