Allow cat_cat to retain double file extensions for .gz files (nf-core…#4230)

* Retain double file extensions for .gz files

- Adds logic to capture file extensions that
precede `.gz`, e.g. `.fasta.gz`, and use this
double extension as the suffix for the output
file.
- Updates tests accordingly.

* Re-add cat/cat entry to pytest_modules.yml

* Simplify code to extract double .gz file extensions

Instead of a conditional check using a regex capture group
and a nested lastIndexOf() extraction for double extensions `.xxxxx.gz`,
this update searches for a regex match and directly adds it as a suffix
if there is a match, and uses the existing lastIndexOf() method
to capture a single file extension otherwise.

Co-authored-by: Matthias Hörtenhuber <[email protected]>

* [automated] Fix linting with Prettier

* Moved the function outside of the process block

* bugfix: missing closing bracket

* Reordered the lines to make the diff shorter

* Changed the tests to regular snapshots so that we can see and check the output file name

---------

Co-authored-by: Matthias Hörtenhuber <[email protected]>
Co-authored-by: nf-core-bot <[email protected]>
Co-authored-by: Matthieu Muffato <[email protected]>
4 people authored and lrauschning committed Jan 17, 2024
1 parent 9f3e922 commit 6818119
Showing 3 changed files with 70 additions and 39 deletions.
11 changes: 10 additions & 1 deletion modules/nf-core/cat/cat/main.nf
@@ -22,6 +22,8 @@ process CAT_CAT {
def args2 = task.ext.args2 ?: ''
def file_list = files_in.collect { it.toString() }

// choose appropriate concatenation tool depending on input and output format

// | input | output | command1 | command2 |
// |-----------|------------|----------|----------|
// | gzipped | gzipped | cat | |
@@ -30,7 +32,7 @@ process CAT_CAT {
// | ungzipped | gzipped | cat | pigz |

// Use input file ending as default
- prefix = task.ext.prefix ?: "${meta.id}${file_list[0].substring(file_list[0].lastIndexOf('.'))}"
+ prefix = task.ext.prefix ?: "${meta.id}${getFileSuffix(file_list[0])}"
out_zip = prefix.endsWith('.gz')
in_zip = file_list[0].endsWith('.gz')
command1 = (in_zip && !out_zip) ? 'zcat' : 'cat'
@@ -68,3 +70,10 @@ process CAT_CAT {
END_VERSIONS
"""
}

// for .gz files also include the second to last extension if it is present. E.g., .fasta.gz
def getFileSuffix(filename) {
def match = filename =~ /^.*?((\.\w{1,5})?(\.\w{1,5}\.gz$))/
return match ? match[0][1] : filename.substring(filename.lastIndexOf('.'))
}
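
To make the behaviour of the new helper easier to check by hand, here is a minimal sketch that ports it to Python. It is an illustration only, not the module's code: the module itself is Groovy, and the names `DOUBLE_GZ` and `get_file_suffix` are hypothetical. The pattern is copied from the diff above, and the sketch assumes every file name contains at least one dot.

```python
import re

# Pattern copied from getFileSuffix() in the diff: an optional extra
# extension, then an extension of 1-5 word characters, then ".gz" at the end.
DOUBLE_GZ = re.compile(r"^.*?((\.\w{1,5})?(\.\w{1,5}\.gz$))")


def get_file_suffix(filename: str) -> str:
    """Return the suffix that the default output prefix would inherit."""
    match = DOUBLE_GZ.search(filename)
    if match:
        # Group 1 holds the whole captured suffix, e.g. ".fasta.gz"
        return match.group(1)
    # Fallback mirrors the pre-existing behaviour: text from the last dot on.
    # Assumption: the name contains a dot (dot-less names are not handled here).
    return filename[filename.rfind("."):]


print(get_file_suffix("sample.fasta.gz"))               # .fasta.gz
print(get_file_suffix("sample.gz"))                     # .gz
print(get_file_suffix("sample.fasta"))                  # .fasta
print(get_file_suffix("/data/run1/sample_1.fastq.gz"))  # .fastq.gz
print(get_file_suffix("genome.v2.fasta.gz"))            # .v2.fasta.gz
```

The last call shows that the optional leading group can pull in a third short extension, so the captured suffix is not always limited to two components. Extensions longer than five characters are skipped: `sample.genomic.fasta.gz` yields `.fasta.gz`.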

6 changes: 2 additions & 4 deletions modules/nf-core/cat/cat/tests/main.nf.test
@@ -83,8 +83,7 @@ nextflow_process {
def lines = path(process.out.file_out.get(0).get(1)).linesGzip
assertAll(
{ assert process.success },
- { assert snapshot(lines[0..5]).match("test_cat_zipped_zipped_lines") },
- { assert snapshot(lines.size()).match("test_cat_zipped_zipped_size")}
+ { assert snapshot(process.out).match() }
)
}
}
@@ -142,8 +141,7 @@ nextflow_process {
def lines = path(process.out.file_out.get(0).get(1)).linesGzip
assertAll(
{ assert process.success },
- { assert snapshot(lines[0..5]).match("test_cat_unzipped_zipped_lines") },
- { assert snapshot(lines.size()).match("test_cat_unzipped_zipped_size")}
+ { assert snapshot(process.out).match() }
)
}
}
92 changes: 58 additions & 34 deletions modules/nf-core/cat/cat/tests/main.nf.test.snap
