
[Do not merge!] Pseudo PR for first release #8

Open · wants to merge 79 commits into base branch TEMPLATE
Conversation

@mashehu commented Aug 5, 2024

Do not merge! This PR compares dev against the first release for whole-pipeline reviewing purposes. Changes should be made to dev, and this PR should not be merged into first-commit-for-pseudo-pr!

github-actions bot commented Aug 5, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit bfabfee

✅ 203 tests passed
❔   2 tests were ignored
❗   5 tests had warnings

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 1.0.0
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in ro-crate-metadata.json: "description": "…" (the description embeds the entire escaped pipeline README: logo markup, badges, introduction, pipeline steps, usage, credits, acknowledgements, and citations; omitted here for readability)
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config

✅ Tests passed: (details collapsed)

Run details:

  • nf-core/tools version 3.1.1
  • Run at 2025-01-03 12:33:24

@mashehu (Author) left a comment

Very nice work! Almost there 🤏🏻

  • I think an option to resolve symlinks and copy the actual files would be good for reproducibility.
  • I have a feeling the modules could rely more on the strengths of Nextflow, e.g. many have for loops over files; these should be separate Nextflow jobs imo (see the sketch below).
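A hedged sketch of that pattern; the process, tool, and parameter names below are illustrative, not the pipeline's actual ones:

```groovy
// Instead of looping over files inside one task's script block, emit one
// channel item per file so Nextflow schedules the work in parallel.
process PROCESS_SCENE {
    input:
    path scene

    output:
    path "processed_${scene.baseName}.tif"

    script:
    """
    some_tool ${scene} processed_${scene.baseName}.tif   # hypothetical CLI
    """
}

workflow {
    scenes = Channel.fromPath(params.scenes)  // hypothetical param: one item per file
    PROCESS_SCENE(scenes)
}
```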

README.md (outdated):

```bash
nextflow run nf-core/rangeland/main.nf \
```

Author suggested change:

```bash
nextflow run nf-core/rangeland \
```

bin/merge_boa.r (outdated), comment on lines 2 to 14:

```r
args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)

require(raster)
```

Author suggested change:

```r
require(raster)

args = commandArgs(trailingOnly=TRUE)

if (length(args) < 3) {
  stop("\nthis program needs at least 3 inputs\n1: output filename\n2-*: input files", call.=FALSE)
}

fout <- args[1]
finp <- args[2:length(args)]
nf <- length(finp)
```

At least in genomics, it is standard to load the libraries at the beginning of an R script.

bin/merge_boa.r (outdated), comment on lines 25 to 34:

```r
for (i in 1:nf){

  data <- brick(finp[i])[]

  num <- num + !is.na(data)

  data[is.na(data)] <- 0
  sum <- sum + data

}
```

Author: How large is nf here usually? For larger nf, try to use apply instead of a for-loop to improve performance.

Collaborator: This highly depends on the type, size and overlap of the pipeline's input data. It may exceed 100 in some extreme cases, but for our currently used data it's usually between 5 and 20. The merge scripts are mostly untouched from the previous (non-nf-core) installation of this pipeline. I will rework them and also include the other changes you suggested.
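For reference, a minimal apply-style sketch, assuming `finp`, `num`, and `sum` are initialized as in the current `merge_boa.r`; note it keeps every brick in memory at once, so the loop may still be preferable for very large `nf`:

```r
library(raster)

# one matrix of cell values per input file
mats <- lapply(finp, function(f) brick(f)[])

# per-cell count of non-NA observations across all files
num <- num + Reduce(`+`, lapply(mats, function(m) !is.na(m)))

# per-cell sum of values, treating NA as 0
sum <- sum + Reduce(`+`, lapply(mats, function(m) { m[is.na(m)] <- 0; m }))
```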

bin/merge_qai.r (outdated), comment on lines 2 to 14:

The quoted code and the Author's suggested change are identical to the bin/merge_boa.r thread above: move `require(raster)` to the top of the script.

docs/usage.md (outdated):

```
--resolution '[integer]'
```

> The default value is 30, as most Landsat satellites natively provide this resolution.

Author suggested change:

> The default value is `30`, as most Landsat satellites natively provide this resolution.

docs/usage.md (outdated):

```
--end_date '[YYYY-MM-DD]'
```

> Default values are `'1984-01-01'` for the start date and `'2006-12-31'` for the end date.

Author suggested removing this line: we show the default values on the parameters page, so it is easier to keep the docs in sync by only stating them in one place (if no further explanation of the choice of default values is given).

docs/usage.md (outdated):

### Group size

> The `group_size` parameter can be ignored in most cases. It defines how many satellite scenes are processed together and is used to balance the tradeoff between I/O and computational capacity on individual compute nodes. By default, `group_size` is set to 100.

Author suggested change: format the default in code style, i.e. "... `group_size` is set to `100`."

docs/usage.md (outdated):

### Visualization

> The workflow provides two types of results visualization and aggregation. The fine-grained mosaic visualization contains all time series analysis results for all tiles at the original resolution. Pyramid visualizations present a broad overview of the same data at a lower resolution. Both can be enabled or disabled using the parameters `mosaic_visualization` and `pyramid_visualization`; by default, both are enabled. Note that the mosaic visualization must stay enabled when using the `test` and `test_full` profiles so the pipeline can check the correctness of its results (this is the default behavior; make sure not to disable mosaic when using test profiles).

Author suggested change: drop the trailing parenthetical, ending the paragraph at "... to allow the pipeline to check the correctness of its results."

docs/usage.md (outdated):

```
pyramid_visualization = '[boolean]'
```

### FORCE configuration

Author: This section can be removed if `task.cpus` is used instead (see the sketch below).
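A rough sketch of the `task.cpus` approach; the FORCE parameter-file key (`NTHREAD_COMPUTE`) is an assumption based on the FORCE documentation:

```groovy
process FORCE_HIGHER_LEVEL {
    label 'process_high'

    input:
    path param_file

    script:
    // rewrite the thread setting to match whatever the executor granted this
    // task, so users never have to tune it via a dedicated pipeline parameter
    """
    sed 's/^NTHREAD_COMPUTE.*/NTHREAD_COMPUTE = ${task.cpus}/' ${param_file} > local.prm
    force-higher-level local.prm
    """
}
```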

@mashehu mentioned this pull request Aug 5, 2024
@nictru left a comment

In addition to what @mashehu already said:

  • Adding FORCE to Bioconda would not only allow for more versatile environment definitions in the pipeline, but would also let users install your tool without having to compile it. If you need assistance with that, feel free to reach out to me or the #bioconda channel on Slack.
  • The pipeline encodes the information that we usually handle via the meta map as directory and file names. This works, but it is less extensible and harder to debug than the meta map, which allows storing an arbitrary number of named meta fields.

But it already looks pretty good!


```groovy
    label 'process_single'

    container "docker.io/davidfrantz/force:3.7.10"
```

Would it be possible for you to add FORCE to Bioconda? This is easier than one might think; I recently added a module with a similar installation process via this PR. This way, we could have all installation modalities (conda, singularity, docker) easily available, as Bioconda packages are automatically added to BioContainers.

Member: FORCE is not bioinformatics, so it is out of scope for Bioconda. We are relaxing this requirement for now for non-biology pipelines.

nextflow.config (outdated):

```groovy
apptainer.enabled = false
docker.runOptions = '-u $(id -u):$(id -g)'
docker.enabled = true
docker.userEmulation = true
```

`docker.userEmulation` is no longer supported in the latest versions of Nextflow; I think it is already unsupported in 23.04.0, which is the oldest Nextflow version this pipeline is supposed to run on. A sketch of the trimmed profile follows.
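For reference, the current nf-core template's Docker profile gets by with just the run options, roughly:

```groovy
profiles {
    docker {
        docker.enabled    = true
        // maps the container user to the host user, covering what
        // docker.userEmulation used to do
        docker.runOptions = '-u $(id -u):$(id -g)'
    }
}
```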

@@ -0,0 +1,41 @@

```groovy
nextflow.enable.dsl = 2

process CHECK_RESULTS {
```

This process is missing a `tag` (see the sketch below).
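A minimal sketch of the fix; the input name and script are placeholders:

```groovy
process CHECK_RESULTS {
    // a tag makes each task identifiable in the console, logs, and reports
    tag "${trend_files}"
    label 'process_low'

    input:
    path trend_files

    output:
    path "check.log"

    script:
    """
    check_results.r ${trend_files} > check.log   # placeholder script name
    """
}
```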


```groovy
    label 'process_low'

    container 'docker.io/rocker/geospatial:4.3.1'
```

As far as I can see, the only package used from the geospatial image is terra. The corresponding R package is already on conda-forge, so adding it to Bioconda would be redundant. But we can create images using Seqera Containers, which gave the following (sketched into a module header below):

  • Docker: community.wave.seqera.io/library/r-terra:1.7-71--57cecb7a052577e0
  • Singularity: oras://community.wave.seqera.io/library/r-terra:1.7-71--bbada5308a9d09c7
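With those images, the module header could look roughly like this; the conda package spec is an assumption, while the URIs are the ones generated above:

```groovy
    label 'process_low'

    conda "conda-forge::r-terra=1.7_71"   // assumed conda-forge spec
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'oras://community.wave.seqera.io/library/r-terra:1.7-71--bbada5308a9d09c7' :
        'community.wave.seqera.io/library/r-terra:1.7-71--57cecb7a052577e0' }"
```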


```groovy
    script:
    """
    force-tile-extent $aoi tmp/ tile_allow.txt
```

It will be created by `path 'tmp/datacube-definition.prj'` (line 11).

```groovy
ch_versions = ch_versions.mix(FORCE_PREPROCESS.out.versions.first())

//Group by tile, date and sensor
boa_tiles = FORCE_PREPROCESS.out.boa_tiles.flatten().map{ [ "${extractDirectory(it)}_${it.simpleName}", it ] }.groupTuple()
```

In nf-core we usually don't encode information as directory/file names, but instead use a meta map.
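A hedged sketch of the meta-map equivalent of the quoted line, reusing the pipeline's `extractDirectory` helper; the field names are illustrative:

```groovy
boa_tiles = FORCE_PREPROCESS.out.boa_tiles
    .flatten()
    .map { f ->
        // named fields instead of a concatenated string key
        def meta = [ tile: extractDirectory(f), scene: f.simpleName ]
        [ meta, f ]
    }
    .groupTuple()   // maps with equal entries compare equal, so grouping is unchanged
```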

Collaborator: I'm aware of that. The reason we decided to keep the name-encoded information is that it is the common approach in remote sensing and is somewhat expected by FORCE. I will look into switching to meta maps.

Member: I think it's fine to encode it in file names if it's common/the standard in the field; I would say it's not a blocker for this release.

But if that's the case, I think it would be important to add validation checks to ensure that the file name structure is exactly what the pipeline expects (see the sketch below).

And of course it wouldn't hurt to copy information like that into a meta map to accompany the files through the pipeline.
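A sketch of such a check; the regex is a stand-in, since the exact naming convention the pipeline expects isn't shown in this thread:

```groovy
// fail fast on unexpected names instead of letting downstream tools choke
def validateSceneName(f) {
    if (!(f.getName() ==~ /^L[A-Z]\d{2}_.+/)) {   // placeholder Landsat-style pattern
        error "Unexpected scene name '${f.getName()}': does not match the expected convention"
    }
    return f
}

data = data.map { validateSceneName(it) }
```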

@jfy133 (Member) left a comment

Overall really good! You've done a great job of sticking with nf-core structure/guidelines despite the different field!

A few things I also noticed:

  • Try to stick to nf-core guidelines for things such as module structure, even when the modules are local.
  • I would highly recommend adding more validation checks to your input Nextflow schema (see the sketch after this list):
    • `pattern` keys in the nextflow_schema to get better user validation (e.g., file suffix checks, or strings with delimiters; regex is your friend)
    • `exists` for all required files
  • Missing a CHANGELOG update, even if it just says 'first release'.
  • For the modules with loops inside, I strongly recommend, as @mashehu pointed out, parallelising these where you can using Nextflow (or at least with bash); otherwise the pipeline is not maximising the benefits of the language.

P.S. I vaguely remember commenting somewhere about removing MultiQC; please ignore that, I just remembered we need it for software version reporting :)
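For example, a required-file parameter in `nextflow_schema.json` could gain `exists` and `pattern` keys along these lines; the `.txt` suffix is an assumption about the expected endmember format:

```json
"endmember": {
    "type": "string",
    "format": "file-path",
    "exists": true,
    "pattern": "^\\S+\\.txt$",
    "description": "Endmember definition file.",
    "help_text": "Validation fails early if the file is missing or has an unexpected suffix."
}
```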

README.md (outdated), comment on lines 24 to 29:

1. Read satellite imagery, digital elevation model, endmember definition, water vapor database and area of interest definition
2. Generate allow list and analysis mask to determine which pixels from the satellite data can be used
3. Preprocess data to obtain atmospherically corrected images alongside quality assurance information
4. Classify pixels by applying linear spectral unmixing
5. Time series analyses to obtain trends in vegetation dynamics
6. Create mosaic and pyramid visualizations of the results
Member: Not a requirement, but a diagram would be nice here :) (it also helps non-expert reviewers follow what they're meant to be assessing :)

Comment on lines 38 to 39:

```groovy
errorStrategy = 'retry'
maxRetries = 5
```

Member: Generally, stuff like process execution information goes in base.config. @mashehu (as the other reviewer), what do you think here?

The reason I say this is that modules.config can be more easily overwritten due to config loading order (and this is OK, because file naming/locations are more often customisable by a user), whereas defaults like retrying or maxRetries you probably want secured as the fall-back behaviour.
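For comparison, the nf-core template's `conf/base.config` declares the fall-back roughly like this:

```groovy
process {
    // retry on resource-related exit codes, otherwise finish gracefully
    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
    maxRetries    = 1
    maxErrors     = '-1'
}
```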

conf/test.config (outdated), comment on lines 39 to 40:

```groovy
sensors_level1 = 'LT04,LT05'
sensors_level2 = 'LND04 LND05'
```

Member: Is it correct that these have different delimiters?

Collaborator: Thanks for mentioning that. We actually don't need the first parameter any more; it's a remnant of a prior version of the workflow where sensors_level1 was used in a CLI command to download some input data, hence the different delimiter. I will remove the first parameter.

```groovy
*/

// check wether provided input is within provided time range
```

Member suggested change:

```groovy
// check whether provided input is within provided time range
```

Comment on lines 78 to 80:

```groovy
cube_file = file( "$params.data_cube" )
aoi_file = file( "$params.aoi" )
endmember_file = file( "$params.endmember" )
```

Member suggested change (the GString quoting is unnecessary):

```groovy
cube_file = file( params.data_cube )
aoi_file = file( params.aoi )
endmember_file = file( params.endmember )
```

Comment on lines 90 to 91:

```groovy
data = base_path.map(it -> file("$it/*/*", type: 'dir')).flatten()
data = data.flatten().filter{ inRegion(it) }
```

Member: Is flatten necessary on both lines?

Collaborator: No, I'll remove the redundancy.

Comment on lines 135 to 154:

```groovy
if (params.config_profile_name == 'Test profile') {
    woody_change_ref = file("$params.woody_change_ref")
    woody_yoc_ref = file("$params.woody_yoc_ref")
    herbaceous_change_ref = file("$params.herbaceous_change_ref")
    herbaceous_yoc_ref = file("$params.herbaceous_yoc_ref")
    peak_change_ref = file("$params.peak_change_ref")
    peak_yoc_ref = file("$params.peak_yoc_ref")

    CHECK_RESULTS(grouped_trend_data, woody_change_ref, woody_yoc_ref, herbaceous_change_ref, herbaceous_yoc_ref, peak_change_ref, peak_yoc_ref)
    ch_versions = ch_versions.mix(CHECK_RESULTS.out.versions)
}

if (params.config_profile_name == 'Full test profile') {
    UNTAR_REF([[:], params.reference])
    ref_path = UNTAR_REF.out.untar.map(it -> it[1])
    tar_versions.mix(UNTAR_REF.out.versions)

    CHECK_RESULTS_FULL(grouped_trend_data, ref_path)
    ch_versions = ch_versions.mix(CHECK_RESULTS_FULL.out.versions)
}
```

Member: You should not embed test-specific code within the pipeline itself (it's not particularly realistic); for this you should add nf-test to the pipeline and use it for a more structured/standardised approach.

A few pipelines already have this (ampliseq, rnaseq, etc.), but if you need pointers let me know.
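A minimal sketch of a pipeline-level nf-test; the assertions are placeholders, and the profile is assumed to come from `nf-test.config`:

```groovy
nextflow_pipeline {

    name "Test nf-core/rangeland"
    script "../main.nf"

    test("-profile test runs to completion") {

        when {
            params {
                outdir = "$outputDir"
            }
        }

        then {
            assert workflow.success
            // placeholder: snapshot or inspect the published trend files here
            assert workflow.trace.succeeded().size() > 0
        }
    }
}
```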

@jfy133 (Member) left a comment

General thing: have you formatted all the new local modules/subworkflows/workflow files with the new Nextflow language-server formatter? Might be worth doing this now:

https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow

It will also report errors for things that go against Nextflow 'best practice' (not a blocker here if the pipeline is working, but it will reduce errors in the future as Nextflow updates over time).

README.md (outdated):
3. Preprocess data to obtain atmospherically corrected images alongside quality assurance information
4. Classify pixels by applying linear spectral unmixing
5. Time series analyses to obtain trends in vegetation dynamics
6. Create mosaic and pyramid visualizations of the results
Member: Diagram would still be nice ;)

README.md (outdated):

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
7. Present QC results ([`MultiQC`](http://multiqc.info/))
Member: Do you use custom MultiQC (or even official modules)? If not, maybe just say 'version reporting' or similar.

README.md (outdated), comment on lines 40 to 41:

> To run the pipeline on real data, input data needs to be acquired.
> Concretely, satellite imagery, water vapor data, a digital elevation model, endmember definitions, a datacube specification, and an area-of-interest specification are required.

Member suggested change:

> To run, satellite imagery, water vapor data, a digital elevation model, endmember definitions, a datacube specification, and an area-of-interest specification are required as input data.

```groovy
    label 'process_medium'
    label 'error_retry'

    container "docker.io/davidfrantz/force:3.7.10"
```

Member: I wonder if we should make an nf-core copy of this container for the security of the pipeline itself... let me check with core.

Follow-up: @ewels says let's double-check that it's fully OSS so we can copy it, and ideally yes, we can make a copy. Maybe you could ask permission from the original author too?

The main thing is that for every version update of the container we will also need to make a new copy, but I guess that can be incorporated into the release procedure of your pipeline.


```groovy
    script:
    """
    file="*.tif"
```

Member: Any reason why the `*.tif` can't be given directly to the command?

"indexes": {
"type": "string",
"default": "NDVI BLUE GREEN RED NIR SWIR1 SWIR2",
"help_text": "Space-separated list of indexes and bands that should be considered in time series analyses. They are indicated by using their established abbreviations. The full list of available indexes is available at https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html under the 'INDEX' parameter. Spectral unmixing is a special index and always activated.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"help_text": "Space-separated list of indexes and bands that should be considered in time series analyses. They are indicated by using their established abbreviations. The full list of available indexes is available at https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html under the 'INDEX' parameter. Spectral unmixing is a special index and always activated.",
"help_text": "Space-separated list of indexes and bands that should be considered in time series analyses. They are indicated by using their established abbreviations. The full list of available indexes is available at [https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html](https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/param.html ) under the 'INDEX' parameter. Spectral unmixing is a special index and always activated.",

"type": "boolean",
"default": true,
"description": "Publish pipeline outputs.",
"help_text": "Set to `false` to prevent *all* modules from publishing their results.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did wonder this above, but what purpose would this serve actually?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A colleague asked me to add this. We do workflow research from the computer science perspective and we sometimes don't care about any outputs and only consider runtime characteristics like resource usage.

```groovy
// main processing
FORCE_HIGHER_LEVEL( HIGHER_LEVEL_CONFIG.out.higher_level_configs_and_data )
ch_versions = ch_versions.mix(FORCE_HIGHER_LEVEL.out.versions.first())
```

Member: Can maybe remove the empty lines around the various trnd_file_* channels.

@@ -0,0 +1,183 @@

```r
#!/usr/bin/env Rscript

## Originally written by David Frantz and Felix Kummer and released under the MIT license.
```

Member: Is this script still necessary even though you have nf-test?

Collaborator: Yes, it's executed by our testing modules (for now).



```groovy
// check whether provided input is within provided time range
def inRegion = input -> {
```

Member: I think these custom functions need to go inside the `workflow RANGELAND {` block, according to the new Nextflow formatting/linting guidance from the VSCode plugin language server.

@jfy133 self-requested a review December 9, 2024 11:05
@jfy133 (Member) left a comment:

Oops, wrong category in previous review; most of my comments were minor. I don't see any major blockers anymore. Really good work @Felix-Kummer!

@nf-core-bot:

Warning: a newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2. Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and the Synchronisation documentation.
