
[Do not merge!] Pseudo PR for first release #8

Open · wants to merge 79 commits into base branch TEMPLATE
Conversation

@mashehu commented Aug 5, 2024

Do not merge! This PR compares dev against the first release for whole-pipeline reviewing purposes. Changes should be made to dev, and this PR should not be merged into first-commit-for-pseudo-pr!

github-actions bot commented Aug 5, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit bfabfee

✅ 203 tests passed
❔   2 tests were ignored
❗   5 tests had warnings

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 1.0.0
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in ro-crate-metadata.json: "description": "…" (the description embeds the entire escaped pipeline README: logo markup, badges, introduction, pipeline steps, usage, credits, acknowledgements, and citations; omitted here for readability)
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config

✅ Tests passed: (details collapsed)

Run details:

  • nf-core/tools version 3.1.1
  • Run at 2025-01-03 12:33:24

@mashehu (Author) left a comment

Very nice work! Almost there 🤏🏻

  • I think an option to resolve symlinks and copy the actual files would be good for reproducibility.
  • I have a feeling the modules could rely more on the strengths of Nextflow, e.g. many have for loops over files; these should be separate Nextflow jobs imo (see the sketch below).
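A hedged sketch of that pattern; the process, tool, and parameter names below are illustrative, not the pipeline's actual ones:

```groovy
// Instead of looping over files inside one task's script block, emit one
// channel item per file so Nextflow schedules the work in parallel.
process PROCESS_SCENE {
    input:
    path scene

    output:
    path "processed_${scene.baseName}.tif"

    script:
    """
    some_tool ${scene} processed_${scene.baseName}.tif   # hypothetical CLI
    """
}

workflow {
    scenes = Channel.fromPath(params.scenes)  // hypothetical param: one item per file
    PROCESS_SCENE(scenes)
}
```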

README.md (outdated):

```bash
nextflow run nf-core/rangeland/main.nf \
```

Author suggested change:

```bash
nextflow run nf-core/rangeland \
```

bin/merge_boa.r (outdated), comment on lines 2 to 14:

```r
args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)

require(raster)
```

Author suggested change:

```r
require(raster)

args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)
```

At least in genomics, it is standard to load the libraries at the beginning of an R script.

bin/merge_boa.r (outdated), comment on lines 25 to 34:

```r
for (i in 1:nf){

  data <- brick(finp[i])[]

  num <- num + !is.na(data)

  data[is.na(data)] <- 0
  sum <- sum + data

}
```

Author: How large is nf here usually? For larger nf, try to use apply instead of a for-loop to improve performance.

Collaborator: This highly depends on the type, size and overlap of the pipeline's input data. It may exceed 100 in some extreme cases, but for our currently used data it's usually between 5 and 20. The merge scripts are mostly untouched from the previous (non-nf-core) installation of this pipeline. I will rework them and also include the other changes you suggested.
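For reference, a minimal apply-style sketch, assuming `finp`, `num`, and `sum` are initialized as in the current `merge_boa.r`; note it keeps every brick in memory at once, so the loop may still be preferable for very large `nf`:

```r
library(raster)

# one matrix of cell values per input file
mats <- lapply(finp, function(f) brick(f)[])

# per-cell count of non-NA observations across all files
num <- num + Reduce(`+`, lapply(mats, function(m) !is.na(m)))

# per-cell sum of values, treating NA as 0
sum <- sum + Reduce(`+`, lapply(mats, function(m) { m[is.na(m)] <- 0; m }))
```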

bin/merge_qai.r (outdated), comment on lines 2 to 14:

The quoted code and the Author's suggested change are identical to the bin/merge_boa.r thread above: move `require(raster)` to the top of the script.

docs/usage.md (outdated):

```
--resolution '[integer]'
```

> The default value is 30, as most Landsat satellites natively provide this resolution.

Author suggested change:

> The default value is `30`, as most Landsat satellites natively provide this resolution.

docs/usage.md (outdated):

```
--end_date '[YYYY-MM-DD]'
```

> Default values are `'1984-01-01'` for the start date and `'2006-12-31'` for the end date.

Author suggested removing this line: we show the default values on the parameters page, so it is easier to keep the docs in sync by only stating them in one place (if no further explanation of the choice of default values is given).

docs/usage.md (outdated):

### Group size

> The `group_size` parameter can be ignored in most cases. It defines how many satellite scenes are processed together and is used to balance the tradeoff between I/O and computational capacity on individual compute nodes. By default, `group_size` is set to 100.

Author suggested change: format the default in code style, i.e. "... `group_size` is set to `100`."

docs/usage.md (outdated):

### Visualization

> The workflow provides two types of results visualization and aggregation. The fine-grained mosaic visualization contains all time series analysis results for all tiles at the original resolution. Pyramid visualizations present a broad overview of the same data at a lower resolution. Both can be enabled or disabled using the parameters `mosaic_visualization` and `pyramid_visualization`; by default, both are enabled. Note that the mosaic visualization must stay enabled when using the `test` and `test_full` profiles so the pipeline can check the correctness of its results (this is the default behavior; make sure not to disable mosaic when using test profiles).

Author suggested change: drop the trailing parenthetical, ending the paragraph at "... to allow the pipeline to check the correctness of its results."

docs/usage.md (outdated):

```
pyramid_visualization = '[boolean]'
```

### FORCE configuration

Author: This section can be removed if `task.cpus` is used instead (see the sketch below).
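A rough sketch of the `task.cpus` approach; the FORCE parameter-file key (`NTHREAD_COMPUTE`) is an assumption based on the FORCE documentation:

```groovy
process FORCE_HIGHER_LEVEL {
    label 'process_high'

    input:
    path param_file

    script:
    // rewrite the thread setting to match whatever the executor granted this
    // task, so users never have to tune it via a dedicated pipeline parameter
    """
    sed 's/^NTHREAD_COMPUTE.*/NTHREAD_COMPUTE = ${task.cpus}/' ${param_file} > local.prm
    force-higher-level local.prm
    """
}
```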

@mashehu mentioned this pull request Aug 5, 2024
@nictru left a comment

In addition to what @mashehu already said:

  • Adding FORCE to Bioconda would not only allow for more versatile environment definitions in the pipeline, but would also let users install your tool without having to compile it. If you need assistance with that, feel free to reach out to me or the #bioconda channel on Slack.
  • The pipeline encodes the information that we usually handle via the meta map as directory and file names. This works, but it is less extensible and harder to debug than the meta map, which allows storing an arbitrary number of named meta fields.

But it already looks pretty good!


```groovy
    label 'process_single'

    container "docker.io/davidfrantz/force:3.7.10"
```

Would it be possible for you to add FORCE to Bioconda? This is easier than one might think; I recently added a module with a similar installation process via this PR. This way, we could have all installation modalities (conda, singularity, docker) easily available, as Bioconda packages are automatically added to BioContainers.

Member: FORCE is not bioinformatics, so it is out of scope for Bioconda. We are relaxing this requirement for now for non-biology pipelines.

nextflow.config (outdated):

```groovy
apptainer.enabled = false
docker.runOptions = '-u $(id -u):$(id -g)'
docker.enabled = true
docker.userEmulation = true
```

`docker.userEmulation` is no longer supported in the latest versions of Nextflow; I think it is already unsupported in 23.04.0, which is the oldest Nextflow version this pipeline is supposed to run on. A sketch of the trimmed profile follows.
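For reference, the current nf-core template's Docker profile gets by with just the run options, roughly:

```groovy
profiles {
    docker {
        docker.enabled    = true
        // maps the container user to the host user, covering what
        // docker.userEmulation used to do
        docker.runOptions = '-u $(id -u):$(id -g)'
    }
}
```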

@@ -0,0 +1,41 @@

```groovy
nextflow.enable.dsl = 2

process CHECK_RESULTS {
```

This process is missing a `tag` (see the sketch below).
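A minimal sketch of the fix; the input name and script are placeholders:

```groovy
process CHECK_RESULTS {
    // a tag makes each task identifiable in the console, logs, and reports
    tag "${trend_files}"
    label 'process_low'

    input:
    path trend_files

    output:
    path "check.log"

    script:
    """
    check_results.r ${trend_files} > check.log   # placeholder script name
    """
}
```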


```groovy
    label 'process_low'

    container 'docker.io/rocker/geospatial:4.3.1'
```

As far as I can see, the only package used from the geospatial image is terra. The corresponding R package is already on conda-forge, so adding it to Bioconda would be redundant. But we can create images using Seqera Containers, which gave the following (sketched into a module header below):

  • Docker: community.wave.seqera.io/library/r-terra:1.7-71--57cecb7a052577e0
  • Singularity: oras://community.wave.seqera.io/library/r-terra:1.7-71--bbada5308a9d09c7
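With those images, the module header could look roughly like this; the conda package spec is an assumption, while the URIs are the ones generated above:

```groovy
    label 'process_low'

    conda "conda-forge::r-terra=1.7_71"   // assumed conda-forge spec
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'oras://community.wave.seqera.io/library/r-terra:1.7-71--bbada5308a9d09c7' :
        'community.wave.seqera.io/library/r-terra:1.7-71--57cecb7a052577e0' }"
```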


```groovy
    script:
    """
    force-tile-extent $aoi tmp/ tile_allow.txt
```

It will be created by `path 'tmp/datacube-definition.prj'` (line 11).

```groovy
ch_versions = ch_versions.mix(FORCE_PREPROCESS.out.versions.first())

//Group by tile, date and sensor
boa_tiles = FORCE_PREPROCESS.out.boa_tiles.flatten().map{ [ "${extractDirectory(it)}_${it.simpleName}", it ] }.groupTuple()
```

In nf-core we usually don't encode information as directory/file names, but instead use a meta map.
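A hedged sketch of the meta-map equivalent of the quoted line, reusing the pipeline's `extractDirectory` helper; the field names are illustrative:

```groovy
boa_tiles = FORCE_PREPROCESS.out.boa_tiles
    .flatten()
    .map { f ->
        // named fields instead of a concatenated string key
        def meta = [ tile: extractDirectory(f), scene: f.simpleName ]
        [ meta, f ]
    }
    .groupTuple()   // maps with equal entries compare equal, so grouping is unchanged
```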

Collaborator: I'm aware of that. The reason we decided to keep the name-encoded information is that it is the common approach in remote sensing and is somewhat expected by FORCE. I will look into switching to meta maps.

Member: I think it's fine to encode it in file names if it's common/the standard in the field; I would say it's not a blocker for this release.

But if that's the case, I think it would be important to add validation checks to ensure that the file name structure is exactly what the pipeline expects (see the sketch below).

And of course it wouldn't hurt to copy information like that into a meta map to accompany the files through the pipeline.
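A sketch of such a check; the regex is a stand-in, since the exact naming convention the pipeline expects isn't shown in this thread:

```groovy
// fail fast on unexpected names instead of letting downstream tools choke
def validateSceneName(f) {
    if (!(f.getName() ==~ /^L[A-Z]\d{2}_.+/)) {   // placeholder Landsat-style pattern
        error "Unexpected scene name '${f.getName()}': does not match the expected convention"
    }
    return f
}

data = data.map { validateSceneName(it) }
```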

@jfy133 (Member) left a comment

Overall really good! You've done a great job of sticking with nf-core structure/guidelines despite the different field!

A few things I also noticed:

  • Try to stick to nf-core guidelines for things such as module structure, even when the modules are local.
  • I would highly recommend adding more validation checks to your input Nextflow schema (see the sketch after this list):
    • `pattern` keys in the nextflow_schema to get better user validation (e.g., file suffix checks, or strings with delimiters; regex is your friend)
    • `exists` for all required files
  • Missing a CHANGELOG update, even if it just says 'first release'.
  • For the modules with loops inside, I strongly recommend, as @mashehu pointed out, parallelising these where you can using Nextflow (or at least with bash); otherwise the pipeline is not maximising the benefits of the language.

P.S. I vaguely remember commenting somewhere about removing MultiQC; please ignore that, I just remembered we need it for software version reporting :)
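For example, a required-file parameter in `nextflow_schema.json` could gain `exists` and `pattern` keys along these lines; the `.txt` suffix is an assumption about the expected endmember format:

```json
"endmember": {
    "type": "string",
    "format": "file-path",
    "exists": true,
    "pattern": "^\\S+\\.txt$",
    "description": "Endmember definition file.",
    "help_text": "Validation fails early if the file is missing or has an unexpected suffix."
}
```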

README.md (outdated), comment on lines 24 to 29:

1. Read satellite imagery, digital elevation model, endmember definition, water vapor database and area of interest definition
2. Generate allow list and analysis mask to determine which pixels from the satellite data can be used
3. Preprocess data to obtain atmospherically corrected images alongside quality assurance information
4. Classify pixels by applying linear spectral unmixing
5. Time series analyses to obtain trends in vegetation dynamics
6. Create mosaic and pyramid visualizations of the results
Member: Not a requirement, but a diagram would be nice here :) (it also helps non-expert reviewers follow what they're meant to be assessing :)

Comment on lines 38 to 39:

```groovy
errorStrategy = 'retry'
maxRetries = 5
```

Member: Generally, stuff like process execution information goes in base.config. @mashehu (as the other reviewer), what do you think here?

The reason I say this is that modules.config can be more easily overwritten due to config loading order (and this is OK, because file naming/locations are more often customisable by a user), whereas defaults like retrying or maxRetries you probably want secured as the fall-back behaviour.
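For comparison, the nf-core template's `conf/base.config` declares the fall-back roughly like this:

```groovy
process {
    // retry on resource-related exit codes, otherwise finish gracefully
    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
    maxRetries    = 1
    maxErrors     = '-1'
}
```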

conf/test.config (outdated), comment on lines 39 to 40:

```groovy
sensors_level1 = 'LT04,LT05'
sensors_level2 = 'LND04 LND05'
```

Member: Is it correct that these have different delimiters?

Collaborator: Thanks for mentioning that. We actually don't need the first parameter any more; it's a remnant of a prior version of the workflow where sensors_level1 was used in a CLI command to download some input data, hence the different delimiter. I will remove the first parameter.

```groovy
*/

// check wether provided input is within provided time range
```

Member suggested change:

```groovy
// check whether provided input is within provided time range
```

Comment on lines 78 to 80:

```groovy
cube_file = file( "$params.data_cube" )
aoi_file = file( "$params.aoi" )
endmember_file = file( "$params.endmember" )
```

Member suggested change (the GString quoting is unnecessary):

```groovy
cube_file = file( params.data_cube )
aoi_file = file( params.aoi )
endmember_file = file( params.endmember )
```

Comment on lines 90 to 91:

```groovy
data = base_path.map(it -> file("$it/*/*", type: 'dir')).flatten()
data = data.flatten().filter{ inRegion(it) }
```

Member: Is flatten necessary on both lines?

Collaborator: No, I'll remove the redundancy.

Comment on lines 135 to 154:

```groovy
if (params.config_profile_name == 'Test profile') {
    woody_change_ref = file("$params.woody_change_ref")
    woody_yoc_ref = file("$params.woody_yoc_ref")
    herbaceous_change_ref = file("$params.herbaceous_change_ref")
    herbaceous_yoc_ref = file("$params.herbaceous_yoc_ref")
    peak_change_ref = file("$params.peak_change_ref")
    peak_yoc_ref = file("$params.peak_yoc_ref")

    CHECK_RESULTS(grouped_trend_data, woody_change_ref, woody_yoc_ref, herbaceous_change_ref, herbaceous_yoc_ref, peak_change_ref, peak_yoc_ref)
    ch_versions = ch_versions.mix(CHECK_RESULTS.out.versions)
}

if (params.config_profile_name == 'Full test profile') {
    UNTAR_REF([[:], params.reference])
    ref_path = UNTAR_REF.out.untar.map(it -> it[1])
    tar_versions.mix(UNTAR_REF.out.versions)

    CHECK_RESULTS_FULL(grouped_trend_data, ref_path)
    ch_versions = ch_versions.mix(CHECK_RESULTS_FULL.out.versions)
}
```

Member: You should not embed test-specific code within the pipeline itself (it's not particularly realistic); for this you should add nf-test to the pipeline and use it for a more structured/standardised approach.

A few pipelines already have this (ampliseq, rnaseq, etc.), but if you need pointers let me know.
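A minimal sketch of a pipeline-level nf-test; the assertions are placeholders, and the profile is assumed to come from `nf-test.config`:

```groovy
nextflow_pipeline {

    name "Test nf-core/rangeland"
    script "../main.nf"

    test("-profile test runs to completion") {

        when {
            params {
                outdir = "$outputDir"
            }
        }

        then {
            assert workflow.success
            // placeholder: snapshot or inspect the published trend files here
            assert workflow.trace.succeeded().size() > 0
        }
    }
}
```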

@jfy133 (Member) left a comment

General thing: have you formatted all the new local modules/subworkflows/workflow files with the new Nextflow language-server formatter? Might be worth doing this now:

https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow

It will also report errors for things that go against Nextflow 'best practice' (not a blocker here if the pipeline is working, but it will reduce errors in the future as Nextflow updates over time).

README.md (outdated):
3. Preprocess data to obtain atmospherically corrected images alongside quality assurance information
4. Classify pixels by applying linear spectral unmixing
5. Time series analyses to obtain trends in vegetation dynamics
6. Create mosaic and pyramid visualizations of the results
Member: Diagram would still be nice ;)

README.md (outdated):

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
7. Present QC results ([`MultiQC`](http://multiqc.info/))
Member: Do you use custom MultiQC (or even official modules)? If not, maybe just say 'version reporting' or similar.

README.md (outdated), comment on lines 40 to 41:

> To run the pipeline on real data, input data needs to be acquired.
> Concretely, satellite imagery, water vapor data, a digital elevation model, endmember definitions, a datacube specification, and an area-of-interest specification are required.

Member suggested change:

> To run, satellite imagery, water vapor data, a digital elevation model, endmember definitions, a datacube specification, and an area-of-interest specification are required as input data.

```groovy
    label 'process_medium'
    label 'error_retry'

    container "docker.io/davidfrantz/force:3.7.10"
```

Member: I wonder if we should make an nf-core copy of this container for the security of the pipeline itself... let me check with core.

Follow-up: @ewels says let's double-check that it's fully OSS so we can copy it, and ideally yes, we can make a copy. Maybe you could ask permission from the original author too?

The main thing is that for every version update of the container we will also need to make a new copy, but I guess that can be incorporated into the release procedure of your pipeline.


```groovy
    script:
    """
    file="*.tif"
```

Member: Any reason why the `*.tif` can't be given directly to the command?

"indexes": {
"type": "string",
"default": "NDVI BLUE GREEN RED NIR SWIR1 SWIR2",
"help_text": "Space-separated list of indexes and bands that should be considered in time series analyses. They are indicated by using their established abbreviations. The full list of available indexes is available at https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html under the 'INDEX' parameter. Spectral unmixing is a special index and always activated.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"help_text": "Space-separated list of indexes and bands that should be considered in time series analyses. They are indicated by using their established abbreviations. The full list of available indexes is available at https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html under the 'INDEX' parameter. Spectral unmixing is a special index and always activated.",
"help_text": "Space-separated list of indexes and bands that should be considered in time series analyses. They are indicated by using their established abbreviations. The full list of available indexes is available at [https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html](https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html ) under the 'INDEX' parameter. Spectral unmixing is a special index and always activated.",

"type": "boolean",
"default": true,
"description": "Publish pipeline outputs.",
"help_text": "Set to `false` to prevent *all* modules from publishing their results.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did wonder this above, but what purpose would this serve actually?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A colleague asked me to add this. We do workflow research from the computer science perspective and we sometimes don't care about any outputs and only consider runtime characteristics like resource usage.

```groovy
// main processing
FORCE_HIGHER_LEVEL( HIGHER_LEVEL_CONFIG.out.higher_level_configs_and_data )
ch_versions = ch_versions.mix(FORCE_HIGHER_LEVEL.out.versions.first())
```

Member: Can maybe remove the empty lines around the various trnd_file_* channels.

@@ -0,0 +1,183 @@

```r
#!/usr/bin/env Rscript

## Originally written by David Frantz and Felix Kummer and released under the MIT license.
```

Member: Is this script still necessary even though you have nf-test?

Collaborator: Yes, it's executed by our testing modules (for now).



```groovy
// check whether provided input is within provided time range
def inRegion = input -> {
```

Member: I think these custom functions need to go inside the `workflow RANGELAND {` block, according to the new Nextflow formatting/linting guidance from the VSCode plugin language server.

@jfy133 self-requested a review December 9, 2024 11:05
@jfy133 (Member) left a comment:

Oops, wrong category in previous review; most of my comments were minor. I don't see any major blockers anymore. Really good work @Felix-Kummer!

@nf-core-bot:

Warning: a newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2. Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and the Synchronisation documentation.
