Merge pull request #279 from replikation/fix-delete-small-fastq.gz
skip removing small fastq.gz files after the length filter with --list or --sample
DataSpott authored Jan 31, 2025
2 parents b3b8620 + 33e0dd6 commit cb6463f
Showing 3 changed files with 32 additions and 11 deletions.
28 changes: 25 additions & 3 deletions README.md
@@ -50,7 +50,9 @@ Table of Contents
- [Version control](#version-control)
- [Important input flags (choose one)](#important-input-flags-choose-one)
- [Custom primer bed files](#custom-primer-bed-files)
-- [Sample sheet](#sample-sheet)
+- [Sample input](#sample-input)
+- [Sample sheet](#sample-sheet)
+- [List input](#list-input)
- [Pangolin Lineage definitions](#pangolin-lineage-definitions)
- [3. Quality Metrics (default)](#3-quality-metrics-default)
- [4. Workflow](#4-workflow)
@@ -167,7 +169,12 @@ MN908947.3 3144 3166 nCoV-2019_4_LEFT nCoV-2019_2 +
MN908947.3 4240 4262 nCoV-2019_4_RIGHT nCoV-2019_2 -
```

-### Sample sheet
+### Sample input
+
+> [!NOTE]
+> If using --fastq without either --sample or --list, samples whose concatenated and size-selected FastQ files are smaller than 1500 kB will be excluded from further analysis.
+#### Sample sheet
* barcodes can be automatically renamed via `--samples sample_names.csv`
* required columns:
* `_id` = sample name
@@ -181,7 +188,22 @@ _id,Status,Description
Sample_2021,barcode01,good
2ndSample,BC02,bad
```


+#### List input
+* Using `--list` You can provide a csv as input to `--fastq` to select for specific fastq-files
+* e.g.: `--fastq input.csv --list`
+* the csv needs to contain two columns:
+* column 1 = sample name
+* column 2 = path to fastq-location
+* no header should be used
+* files get automatically renamed to the sample names provided in column 1
+
+Example:
+```csv
+sample1,path/to/first/sample.fastq.gz
+2ndSample,path/to/second/sample.fastq.gz
+```

### Pangolin Lineage definitions
* lineage determinations are quickly changing in response to the pandemic
* to avoid using out of date lineage schemes, a `--update` flag can be added to each poreCov run to get the most recent version-controlled pangolin container
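
The `--list` format added to the README in this commit (headerless CSV, sample name in column 1, fastq path in column 2) can be sanity-checked before a run. The following is a minimal shell sketch under that assumption. `check_list_csv` is a hypothetical helper and not part of poreCov, and it does not reproduce how poreCov itself parses the file or renames the reads.

```shell
#!/bin/sh
# check_list_csv: hypothetical helper, not part of poreCov.
# Validates a headerless CSV with two columns: sample name, fastq path.
check_list_csv() {
    csv="$1"
    status=0
    # The "|| [ -n ... ]" part keeps a final row that lacks a trailing newline.
    while IFS=, read -r name path extra || [ -n "$name$path" ]; do
        # Skip empty lines.
        if [ -z "$name$path" ]; then
            continue
        fi
        if [ -z "$name" ] || [ -z "$path" ] || [ -n "$extra" ]; then
            echo "malformed row: expected exactly two columns" >&2
            status=1
        elif [ ! -f "$path" ]; then
            # Relative paths are resolved against the current working directory.
            echo "missing file for $name: $path" >&2
            status=1
        else
            echo "$name -> $path"
        fi
    done < "$csv"
    return "$status"
}

# Usage example with throwaway files: one existing path, one missing path.
tmp=$(mktemp -d)
: > "$tmp/a.fastq.gz"
printf 'sample1,%s\n2ndSample,%s\n' "$tmp/a.fastq.gz" "$tmp/missing.fastq.gz" > "$tmp/input.csv"
check_list_csv "$tmp/input.csv" || echo "input.csv needs fixing"
```

The helper returns a non-zero status as soon as any row is malformed or points to a missing file, so it could gate a pipeline launch in a wrapper script.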
2 changes: 1 addition & 1 deletion modules/filter_fastq_by_length.nf
@@ -31,7 +31,7 @@ process filter_fastq_by_length {
;;
esac
-if [ ${params.samples} == false ]; then
+if [ ${params.samples} == false ] && [ ${params.list} == false ]; then
find . -name "${name}_filtered.fastq.gz" -type 'f' -size -1500k -delete
fi
"""
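
The effect of the changed condition can be tried outside the pipeline. The sketch below wraps the same `find` call in a hypothetical function, `prune_small_fastq`, and passes the two parameters as plain strings. This is an assumption for illustration: in the real process, Nextflow interpolates `params.samples` and `params.list` into the script text before bash runs it.

```shell
#!/bin/sh
# prune_small_fastq: hypothetical wrapper around the find call from
# filter_fastq_by_length.nf, written for a standalone demo.
prune_small_fastq() {
    samples="$1"   # "false" when --samples is not set
    list="$2"      # "false" when --list is not set
    name="$3"      # sample name
    dir="$4"       # directory that holds the filtered reads

    # Delete undersized files only when neither flag is in use.
    if [ "$samples" = false ] && [ "$list" = false ]; then
        find "$dir" -name "${name}_filtered.fastq.gz" -type f -size -1500k -delete
    fi
}

# Demo on a throwaway directory with a 10 KiB placeholder file.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/s1_filtered.fastq.gz" bs=1024 count=10 2>/dev/null

prune_small_fastq false true s1 "$demo_dir"    # --list in use: file is kept
[ -e "$demo_dir/s1_filtered.fastq.gz" ] && echo "kept with --list"

prune_small_fastq false false s1 "$demo_dir"   # plain --fastq: file is removed
[ -e "$demo_dir/s1_filtered.fastq.gz" ] || echo "removed without --samples/--list"
```

With the old single-condition check, the first call would already have deleted the file, which is the behaviour this commit removes for `--list` input.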
13 changes: 6 additions & 7 deletions poreCov.nf
@@ -103,6 +103,7 @@ if (!workflow.profile.contains('test_fastq') && !workflow.profile.contains('test
if ( params.fasta && ( params.fastq || params.fast5 || params.fastq_pass)) { exit 1, "Please use [--fasta] without inputs like: [--fastq], [--fastq_pass], [--fast5]" }
if (( params.fastq || params.fastq_pass ) && params.fast5 && !params.nanopolish ) {
exit 1, "Simultaneous fastq and fast5 input is only supported with [--nanopolish]"}
+if (params.list && params.fasta) { exit 1, "[--fasta] and [--list] is not supported" }

}
if ( (params.cores.toInteger() > params.max_cores.toInteger()) && workflow.profile.contains('local')) {
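
The three input checks visible in the hunk above can be read as a small decision table. The following shell sketch restates them for experimentation. `check_inputs` and its argument order are hypothetical, each argument is just the string `true` or `false`, and the sketch ignores the surrounding test-profile condition that wraps these checks in `poreCov.nf`.

```shell
#!/bin/sh
# check_inputs: hypothetical stand-in for the Groovy checks in poreCov.nf.
# Arguments (each "true" or "false"): fasta fastq fastq_pass fast5 nanopolish list
check_inputs() {
    fasta="$1"; fastq="$2"; fastq_pass="$3"; fast5="$4"; nanopolish="$5"; list="$6"

    # --fasta cannot be combined with any read input.
    if [ "$fasta" = true ] && { [ "$fastq" = true ] || [ "$fast5" = true ] || [ "$fastq_pass" = true ]; }; then
        echo "Please use [--fasta] without inputs like: [--fastq], [--fastq_pass], [--fast5]" >&2
        return 1
    fi
    # fastq plus fast5 is only allowed together with --nanopolish.
    if { [ "$fastq" = true ] || [ "$fastq_pass" = true ]; } && [ "$fast5" = true ] && [ "$nanopolish" = false ]; then
        echo "Simultaneous fastq and fast5 input is only supported with [--nanopolish]" >&2
        return 1
    fi
    # Check added in this commit: --list no longer works with --fasta.
    if [ "$list" = true ] && [ "$fasta" = true ]; then
        echo "[--fasta] and [--list] is not supported" >&2
        return 1
    fi
    return 0
}

# Usage: --fasta together with --list is rejected, --fastq with --list passes.
check_inputs true false false false false true || echo "rejected: --fasta with --list"
check_inputs false true false false false true && echo "accepted: --fastq with --list"
```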
@@ -156,16 +157,10 @@ if (params.samples) {
**************************/

// fasta input
-if (!params.list && params.fasta && !workflow.profile.contains('test_fasta')) {
+if (params.fasta && !workflow.profile.contains('test_fasta')) {
fasta_input_raw_ch = Channel
.fromPath( params.fasta, checkIfExists: true)
}
-else if (params.list && params.fasta && !workflow.profile.contains('test_fasta')) {
-fasta_input_raw_ch = Channel
-.fromPath( params.fasta, checkIfExists: true )
-.splitCsv()
-.map { row -> file("${row[1]}", checkIfExists: true) }
-}

// consensus qc reference input - auto using git default if not specified
if (params.reference_for_qc) {
@@ -508,6 +503,10 @@ ${c_yellow}Workflow control (optional)${c_reset}
Status,_id
barcode01,sample2011XY
BC02,thirdsample_run
+--list --fastq takes a csv file containing (new) sample names and paths to the fastq-files (no header).
+Paths need to start with '/' or poreCov searches the files in the current working dir, e.g.:
+sample1,/path_to_first_sample.fastq.gz
+sample2,/path_to_second_sample.fastq.gz
--extended poreCov utilizes from --samples these additional headers:
Submitting_Lab,Isolation_Date,Seq_Reason,Sample_Type
--nanopolish use nanopolish instead of medaka for ARTIC (needs --fast5)
