Some fixes before the upcoming release #257

Merged on Apr 30, 2024 (21 commits; changes shown from 19 commits)

Commits
76971c3 added maxquant profile to nextflow.config, clarified some docu, made … (WackerO, Mar 27, 2024)
d1f8d3b prettier (WackerO, Mar 27, 2024)
6520455 Bugfix in proteus_measurecol_prefix parameter with final whitespace (WackerO, Mar 27, 2024)
10e365e prettier (WackerO, Mar 27, 2024)
a72c18b Readded the projectDir to hopefully get rid of the download error (WackerO, Mar 27, 2024)
fbb214a linting (WackerO, Mar 27, 2024)
c6a8353 more linting; fails locally despite same paths as in airrflow, but ma… (WackerO, Mar 27, 2024)
158cc1c Added file params with defaults to linting ignore list as they are not (WackerO, Mar 27, 2024)
242cf96 checked out dev logos as the nf-core lint --fix files_unchanged comma… (WackerO, Mar 27, 2024)
088b0f9 prettier (WackerO, Mar 27, 2024)
b25f1fc Update docs/usage.md (WackerO, Apr 8, 2024)
3d0f1c2 Update docs/usage.md (WackerO, Apr 8, 2024)
04ae35d Update docs/usage.md (WackerO, Apr 8, 2024)
74d7acb updated proteus module (WackerO, Apr 12, 2024)
c024383 Merge branch 'prerelease_fixes' of https://github.com/WackerO/differe… (WackerO, Apr 12, 2024)
cab1c01 Fixed filter_difftable module not properly filtering (had to switch t… (WackerO, Apr 19, 2024)
04eac80 Merge branch 'dev' of https://github.com/nf-core/differentialabundanc… (WackerO, Apr 19, 2024)
f1afb24 Updated changelog (WackerO, Apr 19, 2024)
dac77c9 linting (WackerO, Apr 22, 2024)
8750d89 moved log2 from workflow into filter module (WackerO, Apr 22, 2024)
d9fd29a Merge branch 'dev' of https://github.com/nf-core/differentialabundanc… (WackerO, Apr 24, 2024)
7 changes: 7 additions & 0 deletions .nf-core.yml
@@ -1 +1,8 @@
repository_type: pipeline
lint:
nextflow_config:
- config_defaults:
- params.logo_file
- params.css_file
- params.citations_file
- params.report_file
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### `Added`

- [[#259](https://github.com/nf-core/differentialabundance/pull/259)] - Bump gtf2featureannotation to fix GTF handling error ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO))
- [[#257](https://github.com/nf-core/differentialabundance/pull/257)] - Added maxquant profile to nextflow.config to make it available ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#254](https://github.com/nf-core/differentialabundance/pull/254)] - Some parameter changes, added qbic credits ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#250](https://github.com/nf-core/differentialabundance/pull/250)] - Template update for nf-core/tools v2.13.1 ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#244](https://github.com/nf-core/differentialabundance/pull/244)] - Add pipeline params for matrixfilter NA options ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
@@ -19,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Fixed`

- [[#257](https://github.com/nf-core/differentialabundance/pull/257)] - Fixed FILTER_DIFFTABLE module, updated PROTEUS module to better handle whitespace in prefix param, made docs clearer ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#254](https://github.com/nf-core/differentialabundance/pull/254)] - Made differential_file_suffix optional ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#240](https://github.com/nf-core/differentialabundance/pull/240)] - Publish GSEA reports ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO))
- [[#231](https://github.com/nf-core/differentialabundance/pull/231)] - Update GSEA module to fix butterfly plot bug ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
11 changes: 6 additions & 5 deletions assets/differentialabundance_report.Rmd
@@ -267,8 +267,9 @@ for (r in seq_along(contributors)) {

```{r, echo=FALSE}
observations <- read_metadata(file.path(params$input_dir, params$observations), id_col = params$observations_id_col)
if (! params$observations_name_col %in% colnames(observations)){
stop(paste('Invalid observation name column specified: ', params$observations_name_col, paste0('(Valid values are: ', paste(colnames(observations), collapse=', '),')')))
observations_name_col <- ifelse(!is.null(params$observations_name_col), params$observations_name_col, params$observations_id_col)
if (! observations_name_col %in% colnames(observations)){
stop(paste('Invalid observation name column specified: ', observations_name_col, paste0('(Valid values are: ', paste(colnames(observations), collapse=', '),')')))
}

if (! is.null(params$features)){
@@ -305,7 +306,7 @@ assay_data <- lapply(assay_files, function(x) {
row.names = 1
)
)
colnames(mat) <- observations[[params$observations_name_col]][match(colnames(mat), rownames(observations))]
colnames(mat) <- observations[[observations_name_col]][match(colnames(mat), rownames(observations))]
mat
})

@@ -316,7 +317,7 @@ if (!is.null(params$features_log2_assays)) {
assay_data <- cond_log2_transform_assays(assay_data, params$features_log2_assays)

# Now we can rename the observations rows using the title field
rownames(observations) <- observations[[params$observations_name_col]]
rownames(observations) <- observations[[observations_name_col]]

# Run PCA early so we can understand how important each variable is
pca_datas <- lapply(names(assay_data), function(assay_type){
@@ -547,7 +548,7 @@ Whiskers in the above boxplots show `r params$exploratory_whisker_distance` time
plotly_densityplot(
assay_data,
experiment = observations,
colorby = params$observations_name_col,
colorby = observations_name_col,
expressiontype = paste("count per", params$features_type),
makeColorScale(length(unique(observations[[params$observations_id_col]])), palette = "Set1")
)
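
The fallback introduced in this file can be sketched in Python as follows: use the name column when one is configured, otherwise reuse the ID column, and fail early if the chosen column is absent. This is only an illustration of the intended behaviour; the report itself implements it in R, and the function name `resolve_name_col` is hypothetical. The column names are borrowed from the sample sheet example in docs/usage.md.

```python
def resolve_name_col(name_col, id_col, available_columns):
    """Pick the observation name column, defaulting to the ID column.

    Hypothetical Python rendering of the R logic in the report template.
    """
    chosen = name_col if name_col is not None else id_col
    if chosen not in available_columns:
        valid = ", ".join(available_columns)
        raise ValueError(
            f"Invalid observation name column specified: {chosen} "
            f"(Valid values are: {valid})"
        )
    return chosen


columns = ["sample", "condition", "batch"]
print(resolve_name_col(None, "sample", columns))         # falls back to the ID column
print(resolve_name_col("condition", "sample", columns))  # explicit name column wins
```

With this shape, downstream code only ever sees one resolved column name, which is what the later hunks rely on when they swap `params$observations_name_col` for `observations_name_col`.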
2 changes: 1 addition & 1 deletion conf/maxquant.config
@@ -38,7 +38,7 @@ params {
differential_feature_name_column = "Majority protein IDs"

// Proteus options
proteus_measurecol_prefix = 'LFQ intensity '
proteus_measurecol_prefix = 'LFQ intensity'

// Shiny does not work for this datatype
shinyngs_build_app = false
2 changes: 1 addition & 1 deletion conf/modules.config
@@ -126,7 +126,7 @@ process {
"--contrast_variable \"${meta.id}\"",
"--sample_id_col \"${params.observations_id_col}\"",
"--protein_id_col \"${params.features_id_col}\"",
"--measure_col_prefix \"${params.proteus_measurecol_prefix}\"",
"--measure_col_prefix \"${params.proteus_measurecol_prefix}\"".replaceAll(~/_s\b/, ' '),
"--norm_function $params.proteus_norm_function",
"--plotsd_method $params.proteus_plotsd_method",
"--plotmv_loess $params.proteus_plotmv_loess",
8 changes: 6 additions & 2 deletions docs/usage.md
@@ -23,7 +23,11 @@ With the above in mind, running this workflow requires:
--input '[path to samplesheet file]'
```

This may well be the same sample sheet used to generate the input matrix. For example, in RNA-seq this might be the same sample sheet, perhaps derived from [fetchngs](https://github.com/nf-core/fetchngs), that was input to the [RNA-seq workflow](https://github.com/nf-core/rnaseq). It may be necessary to add columns that describe the groups you want to compare.
This may well be the same sample sheet used to generate the input matrix. For example, in RNA-seq this might be the same sample sheet, perhaps derived from [fetchngs](https://github.com/nf-core/fetchngs), that was input to the [RNA-seq workflow](https://github.com/nf-core/rnaseq). It may be necessary to add columns that describe the groups you want to compare. The columns that the pipeline requires are:

- a column listing the sample IDs (must be the same IDs as in the abundance matrix), in the example below it is called 'sample'. For some study_types, this column might need to be filled in with file names, e.g. when doing an affymetrix analysis.
- one or more columns describing conditions for the differential analysis. In the example below it is called 'condition'
- optionally one or more columns describing sample batches or similar which you want to be considered in the analysis. In the example below it is called 'batch'

For example:

@@ -96,7 +100,7 @@ So we **do not recommend** raw counts files such as `salmon.merged.gene_counts.t
--matrix '[path to matrix file]'
```

This is the proteinGroups.txt file produced by MaxQuant. It is a tab-separated matrix file with a column for every observation (plus additional columns for other types of measurements and information); each row contains these data for a set of proteins. The parameters `--observations_id_col` and `--features_id_col` define which of the associated fields should be matched in those inputs. The parameter `--proteus_measurecol_prefix` defines which prefix is used to extract those matrix columns which contain the measurements to be used. For example, the default `LFQ intensity ` will indicate that columns like LFQ intensity S1, LFQ intensity S2, LFQ intensity S3 etc. are used (do not forget trailing whitespace in this parameter, if required!).
This is the proteinGroups.txt file produced by MaxQuant. It is a tab-separated matrix file with a column for every observation (plus additional columns for other types of measurements and information); each row contains these data for a set of proteins. The parameters `--observations_id_col` and `--features_id_col` define which of the associated fields should be matched in those inputs. The parameter `--proteus_measurecol_prefix` defines which prefix is used to extract those matrix columns which contain the measurements to be used. For example, the default `LFQ intensity ` will indicate that columns like LFQ intensity S1, LFQ intensity S2, LFQ intensity S3 etc. are used (one whitespace is automatically added if necessary).
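
As a rough illustration of the prefix matching described above, the sketch below selects measurement columns by prefix and tolerates a missing trailing space. It is a standalone Python illustration, not the Proteus module's code: the helper name `select_measure_columns` is hypothetical, and the `Intensity S1` column is made up to show a non-matching case.

```python
def select_measure_columns(columns, prefix):
    """Return (column, sample_name) pairs for columns that start with prefix.

    Hypothetical helper: a single trailing space is appended to the prefix
    when it is missing, so 'LFQ intensity' and 'LFQ intensity ' behave alike.
    """
    if not prefix.endswith(" "):
        prefix = prefix + " "
    return [(col, col[len(prefix):]) for col in columns if col.startswith(prefix)]


# Made-up header of a proteinGroups.txt-like table.
columns = [
    "Majority protein IDs",
    "Intensity S1",
    "LFQ intensity S1",
    "LFQ intensity S2",
    "LFQ intensity S3",
]
print(select_measure_columns(columns, "LFQ intensity"))
```

Normalising the prefix in one place means a user-supplied value with or without the trailing space picks the same columns and yields the same sample names.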

### Affymetrix microarrays

2 changes: 1 addition & 1 deletion modules.json
@@ -62,7 +62,7 @@
},
"proteus/readproteingroups": {
"branch": "master",
"git_sha": "516189e968feb4ebdd9921806988b4c12b4ac2dc",
"git_sha": "a069b29783583c219c1f23ed3dcf64a5aee1340b",
"installed_by": ["modules"]
},
"rmarkdownnotebook": {
50 changes: 20 additions & 30 deletions modules/local/filter_difftable.nf
@@ -2,10 +2,10 @@ process FILTER_DIFFTABLE {

label 'process_single'

conda "conda-forge::gawk=5.1.0"
Review comment (Member):

Rewriting this process in Python is quite a major change to smuggle into a PR of small fixes ;-).

Can you say why the process was re-written, and demonstrate that the output is the same relative to previous?

Reply from @WackerO (Collaborator, Author), Apr 22, 2024:

If you prefer, I can also do the module change in a separate PR, that's not a problem!

Nope, can't demonstrate that, the output changed; it was incorrect before, that's why I had to change it :') AWK suddenly filtered out all the genes and just produced an empty table (was not the case when I originally added the module).

I tried hard to fix it in AWK, but it simply did not work and, to be honest, I found the AWK code quite painful to deal with, as I'm simply not experienced enough with it.

Reply from @WackerO (Collaborator, Author), Apr 22, 2024:

Oh, but as a follow-up, I did check the output. With the new module plus this code, the filtered results match the DE genes shown in the HTML report; see here:

filtered_results.zip

Reply (Member):

Actually, could you raise this as a separate PR? I think that would make more sense and you can post the reasoning and the evidence there.

Reply (Collaborator, Author):

Yes of course, see #264

conda "pandas=1.5.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/gawk:5.1.0' :
'biocontainers/gawk:5.1.0' }"
'https://depot.galaxyproject.org/singularity/pandas:1.5.2' :
'biocontainers/pandas:1.5.2' }"

input:
tuple val(meta), path(input_file)
@@ -20,42 +20,32 @@ process FILTER_DIFFTABLE {
task.ext.when == null || task.ext.when

script:
def VERSION = '9.1' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
"""
output_file=\$(echo $input_file | sed 's/\\(.*\\)\\..*/\\1/')_filtered.tsv
#!/usr/bin/env python

# Function to find column number
find_column_number() {
awk -v column="\$2" '{for(i=1;i<=NF;i++) if (\$i == column) {print i; exit}}' <<< "\$(head -n 1 "\$1")"
}
from os import path
import pandas as pd
import platform
from sys import exit

# Extract column numbers
logFC_col=\$(find_column_number "$input_file" "log2FoldChange")
padj_col=\$(find_column_number "$input_file" "padj")

# Prepare the output file
head -n 1 "$input_file" > "\${output_file}.tmp"

# The following snippet performs the following checks on each row (add +0.0 to the numbers so that they are definitely treated as numerics):
#
# 1. Check that the current logFC/padj is not NA
# 2. Check that the current logFC is >= threshold (abs does not work, so use a workaround)
# 3. Check that the current padj is <= threshold
#
# If this is true, the row is written to the new file, otherwise not
if not any("$input_file".endswith(ext) for ext in [".csv", ".tsv", ".txt"]):
exit("Please provide a .csv, .tsv or .txt file!")

awk -F'\\t' -v logFC_col="\$logFC_col" -v padj_col="\$padj_col" -v logFC_thresh="$logFC_threshold" -v padj_thresh="$padj_threshold" '
NR > 1 && \$logFC_col != "NA" && \$padj_col != "NA" &&
((\$logFC_col+0.0 >= logFC_thresh+0.0) || (-\$logFC_col+0.0 >= logFC_thresh+0.0)) &&
\$padj_col+0.0 <= padj_thresh+0.0 { print }
' "$input_file" >> "\${output_file}.tmp"
table = pd.read_csv("$input_file", sep=("," if "$input_file".endswith(".csv") else "\t"), header=0)
table = table[~table["$logFC_column"].isna() &
~table["$padj_column"].isna() &
(pd.to_numeric(table["$logFC_column"], errors='coerce').abs() >= float("$logFC_threshold")) &
(pd.to_numeric(table["$padj_column"], errors='coerce') <= float("$padj_threshold"))]

# Rename temporary file to final output file
mv "\${output_file}.tmp" "\$output_file"
table.to_csv(path.splitext(path.basename("$input_file"))[0]+"_filtered.tsv", sep="\t", index=False)

cat <<-END_VERSIONS > versions.yml
"${task.process}":
bash: \$(echo \$(bash --version | grep -Eo 'version [[:alnum:].]+' | sed 's/version //'))
END_VERSIONS
with open('versions.yml', 'a') as version_file:
version_file.write('"${task.process}":' + "\\n")
version_file.write(" python: " + (platform.python_version()) + "\\n")
version_file.write(" pandas: " + str(pd.__version__) + "\\n")
"""
}
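
Since the review thread asks for evidence that the filter behaves as intended, a small reference implementation can help with cross-checking. The sketch below applies the same selection rule (absolute log fold change at or above a threshold, adjusted p-value at or below a threshold, NA rows dropped) using only the Python standard library instead of pandas. The table values, the thresholds, and the helper names `to_float` and `filter_rows` are made up for illustration; the column names `log2FoldChange` and `padj` are the ones the old AWK script looked up.

```python
import csv
import io
import math


def to_float(value):
    """Parse a cell as float; return NaN for NA, empty or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def filter_rows(rows, logfc_col, padj_col, logfc_threshold, padj_threshold):
    """Keep rows with |logFC| >= logfc_threshold and padj <= padj_threshold."""
    kept = []
    for row in rows:
        logfc = to_float(row[logfc_col])
        padj = to_float(row[padj_col])
        # Any comparison involving NaN is False, so NA rows are dropped.
        if abs(logfc) >= logfc_threshold and padj <= padj_threshold:
            kept.append(row)
    return kept


# Toy differential table (made-up values), tab-separated.
toy = (
    "gene_id\tlog2FoldChange\tpadj\n"
    "g1\t2.5\t0.001\n"
    "g2\t-3.0\t0.01\n"
    "g3\t0.2\t0.0001\n"
    "g4\t4.0\t0.5\n"
    "g5\tNA\tNA\n"
)
rows = list(csv.DictReader(io.StringIO(toy), delimiter="\t"))
kept = filter_rows(rows, "log2FoldChange", "padj", 1.0, 0.05)
print([row["gene_id"] for row in kept])  # ['g1', 'g2']
```

Here g3 fails the fold-change cut, g4 fails the p-value cut, and g5 is dropped for missing values, so only the strongly up- and down-regulated rows remain. Running the same thresholds through the module and through a check like this on a real table would be one way to compare outputs.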
2 changes: 2 additions & 0 deletions modules/nf-core/proteus/readproteingroups/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions modules/nf-core/proteus/readproteingroups/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion modules/nf-core/proteus/readproteingroups/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 6 additions & 5 deletions nextflow.config
@@ -22,10 +22,10 @@ params {
sizefactors_from_controls = false

// Reporting
logo_file = "docs/images/nf-core-differentialabundance_logo_light.png"
css_file = "assets/nf-core_style.css"
citations_file = "CITATIONS.md"
report_file = "assets/differentialabundance_report.Rmd"
logo_file = "$projectDir/docs/images/nf-core-differentialabundance_logo_light.png"
css_file = "$projectDir/assets/nf-core_style.css"
citations_file = "$projectDir/CITATIONS.md"
report_file = "$projectDir/assets/differentialabundance_report.Rmd"
report_title = null
report_author = null
report_contributors = null
@@ -63,7 +63,7 @@ params {
affy_build_annotation = true

// Proteus-specific options
proteus_measurecol_prefix = 'LFQ intensity '
proteus_measurecol_prefix = 'LFQ intensity'
proteus_norm_function = 'normalizeMedian'
proteus_plotsd_method = 'violin'
proteus_plotmv_loess = true
@@ -342,6 +342,7 @@ profiles {
test_nogtf { includeConfig 'conf/test_nogtf.config' }
test_full { includeConfig 'conf/test_full.config' }
affy { includeConfig 'conf/affy.config' }
maxquant { includeConfig 'conf/maxquant.config' }
rnaseq { includeConfig 'conf/rnaseq.config' }
soft {includeConfig 'conf/soft.config'}
test_affy { includeConfig 'conf/test_affy.config' }