Download pipelines for Tower #2247

Merged
merged 43 commits into from
Jun 1, 2023
f9d07ef
Add -t / --tower option to 'nf-core download'.
MatthiasZepper Oct 5, 2022
e94dce0
Intermediate commit
MatthiasZepper Dec 5, 2022
cba336b
Implement logic for the Tower download in DownloadWorkflow:download_w…
MatthiasZepper Feb 15, 2023
15c0588
Extend ModulesRepo:setup_local_repo() with a cache_only bool, so we c…
MatthiasZepper Feb 15, 2023
2be88c4
Create WorkflowRepo subclass of ModuleRepo and initialise local clone.
MatthiasZepper Feb 15, 2023
d72169d
TypeError: HEAD is a detached symbolic reference as it points to ...
MatthiasZepper Feb 15, 2023
a6b3492
Split history ./modules/modules_repo.py to synced_repo.py
MatthiasZepper Feb 21, 2023
8984a63
Split history ./modules/modules_repo.py to synced_repo.py
MatthiasZepper Feb 21, 2023
33d0381
Split history ./modules/modules_repo.py to synced_repo.py
MatthiasZepper Feb 21, 2023
caef187
Duplication of ModulesRepo to SyncedRepo done.
MatthiasZepper Feb 21, 2023
2367ae7
Strip ModulesRepo class of the methods moved to new superclass.
MatthiasZepper Feb 21, 2023
f5f0df2
Rebase to current dev.
MatthiasZepper Feb 21, 2023
f852159
Local caching of the repo works now.
MatthiasZepper Feb 21, 2023
af4754e
Started implementing the config download.
MatthiasZepper Feb 22, 2023
3bc97c5
Started to implement the multiple revision selection for the Tower do…
MatthiasZepper Feb 24, 2023
e17a8e9
Rewrite get_revision_hash() function to accomodate multiple revisions.
MatthiasZepper Feb 28, 2023
ecaabf8
The 2nd revivial of the config choice. Now available for archives wit…
MatthiasZepper Feb 28, 2023
6d04ec8
Inclusion of the revision in the output file name is problematic with…
MatthiasZepper Mar 4, 2023
7642e4f
Allow multiple instances of the -r argument. Needed for scripted down…
MatthiasZepper Mar 8, 2023
7f93edb
Finished updating the prompts for the dialogues.
MatthiasZepper Mar 28, 2023
12bf942
Converted the self.wf_download_url into a dict.
MatthiasZepper Apr 13, 2023
2ff62f3
Enable multi-revision classic download.
MatthiasZepper Apr 14, 2023
986f791
Small tweaks to ensure that tools doesn't bail out if there is no sym…
MatthiasZepper Apr 17, 2023
6f95829
Initialise the Git repo clone of the workflow.
MatthiasZepper Apr 18, 2023
760fcaa
WorkflowRepo attributes and functions.
MatthiasZepper Apr 19, 2023
c381776
Finished the Tower download branch.
MatthiasZepper Apr 21, 2023
526a26e
Minor tweaks to the container download functionality.
MatthiasZepper Apr 24, 2023
f4b9e67
Updating docs and changelog, fixing linting errors.
MatthiasZepper Apr 24, 2023
2bf14bd
Hopefully fixed the existing tests. New ones still need to be written.
MatthiasZepper Apr 24, 2023
8de588e
Refactor the CLI commands for the Singularity Cache Dir
MatthiasZepper Apr 25, 2023
d729bde
Readme updates for the new remote Singularity cache feature.
MatthiasZepper Apr 26, 2023
0f58c29
Add interactive check in retry for parsing the index.
MatthiasZepper Apr 27, 2023
6294d74
Incorporating some suggestions by @mashehu.
MatthiasZepper Apr 27, 2023
8d327a4
Apply suggestions from code review @mashehu
MatthiasZepper Apr 27, 2023
340c519
Writing additional tests for the --tower download functionality.
MatthiasZepper Apr 27, 2023
f599237
Move alterations from Version 2.8 (which this PR didn't make anymore)…
MatthiasZepper May 2, 2023
2518a4b
Adding the info about remote containers to the summary log rather tha…
MatthiasZepper May 2, 2023
4f390be
Moved the notification about remote containers to summary_log.
MatthiasZepper May 5, 2023
f8e5068
Apply suggestions from code review
MatthiasZepper May 9, 2023
e512878
Fixes suggested by @mirpedrol during review. Thanks!
MatthiasZepper May 9, 2023
315b9a3
@mashehu suggested that downloading the containers should not be opti…
MatthiasZepper May 9, 2023
6a806ee
Bugfix: WorkflowRepo.tidy_tags() did indeed only tidy tags. However, …
MatthiasZepper May 26, 2023
259afa6
Merge branch 'dev' into DownloadForTower
MatthiasZepper May 31, 2023
2 changes: 1 addition & 1 deletion .github/workflows/pytest-frozen-ubuntu-20.04.yml
@@ -15,7 +15,7 @@ concurrency:
cancel-in-progress: true

jobs:
pytest:
pytest-frozen:
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v3
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,11 @@
- Remove `aws_tower` profile ([#2287])(https://github.com/nf-core/tools/pull/2287)
- Fixed the Slack report to include the pipeline name ([#2291](https://github.com/nf-core/tools/pull/2291))

### Download

- Introduce a `--tower` flag for `nf-core download` to obtain pipelines in an offline format suited for [seqeralabs® Nextflow Tower](https://cloud.tower.nf/) ([#2247](https://github.com/nf-core/tools/pull/2247)).
- Refactored the CLI for `--singularity-cache` in `nf-core download` from a flag to an argument. The prior options were renamed to `amend` (container images are only saved in the `$NXF_SINGULARITY_CACHEDIR`) and `copy` (a copy of the image is saved with the download). The new `remote` option lets you provide an index of the images in a remote cache via the additional argument `--singularity-cache-index` ([#2247](https://github.com/nf-core/tools/pull/2247)).

### Linting

- Warn if container access is denied ([#2270](https://github.com/nf-core/tools/pull/2270))
25 changes: 15 additions & 10 deletions README.md
@@ -20,7 +20,7 @@ A python package with helper tools for the nf-core community.
- [`nf-core` tools update](#update-tools)
- [`nf-core list` - List available pipelines](#listing-pipelines)
- [`nf-core launch` - Run a pipeline with interactive parameter prompts](#launch-a-pipeline)
- [`nf-core download` - Download pipeline for offline use](#downloading-pipelines-for-offline-use)
- [`nf-core download` - Download a pipeline for offline use](#downloading-pipelines-for-offline-use)
- [`nf-core licences` - List software licences in a pipeline](#pipeline-software-licences)
- [`nf-core create` - Create a new pipeline with the nf-core template](#creating-a-new-pipeline)
- [`nf-core lint` - Check pipeline code against nf-core guidelines](#linting-a-workflow)
@@ -348,13 +348,13 @@ nextflow run /path/to/download/nf-core-rnaseq-dev/workflow/ --input mydata.csv -
### Downloaded nf-core configs

The pipeline files are automatically updated (`params.custom_config_base` is set to `../configs`), so that the local copy of institutional configs are available when running the pipeline.
So using `-profile <NAME>` should work if available within [nf-core/configs](https://github.com/nf-core/configs).
So using `-profile <NAME>` should work if available within [nf-core/configs](https://github.com/nf-core/configs). This option is not available when downloading a pipeline for use with [Nextflow Tower](#adapting-downloads-to-nextflow-tower) because the application manages all configurations separately.

### Downloading singularity containers

If you're using Singularity, the `nf-core download` command can also fetch the required Singularity container images for you.
To do this, select `singularity` in the prompt or specify `--container singularity` in the command.
Your archive / target output directory will then include three folders: `workflow`, `configs` and also `singularity-containers`.
Your archive / target output directory will then also include a separate folder `singularity-containers`.

The downloaded workflow files are again edited to add the following line to the end of the pipeline's `nextflow.config` file:

@@ -372,10 +372,9 @@ We highly recommend setting the `$NXF_SINGULARITY_CACHEDIR` environment variable
If found, the tool will fetch the Singularity images to this directory first before copying to the target output archive / directory.
Any images previously fetched will be found there and copied directly - this includes images that may be shared with other pipelines or previous pipeline version downloads or download attempts.

If you are running the download on the same system where you will be running the pipeline (eg. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to _only_ use the cache via a prompt or cli options `--singularity-cache-only` / `--singularity-cache-copy`.
If you are running the download on the same system where you will be running the pipeline (e.g. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to _only_ use the cache via a prompt or the CLI option `--singularity-cache amend`. This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory. The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.

This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory.
The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.
If you are downloading a workflow for a different system, you can provide information about its image cache to `nf-core download`. To avoid unnecessary container image downloads, choose `--singularity-cache remote` and provide a list of already available images as a plain text file to `--singularity-cache-index my_list_of_remotely_available_images.txt`. To generate this list on the remote system, run `find $NXF_SINGULARITY_CACHEDIR -name "*.img" > my_list_of_remotely_available_images.txt`. The tool will then download and copy into your output directory only those images that are missing on the remote system.
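As a rough illustration of the idea behind the remote cache index, the following Python sketch filters a list of required images against the output of the `find` command above. It is not the code from this PR: the helper names `read_remote_index` and `images_to_download`, the assumption that images are matched by file name only, and the example image names are all hypothetical.

```python
from pathlib import PurePosixPath


def read_remote_index(index_text):
    """Collect the file names of all '*.img' entries listed in the index text."""
    names = set()
    for line in index_text.splitlines():
        line = line.strip()
        if line.endswith(".img"):
            # Only the file name is compared; the remote directory is ignored.
            names.add(PurePosixPath(line).name)
    return names


def images_to_download(required, remote_names):
    """Keep only the required images that the remote cache does not list."""
    return [name for name in required if name not in remote_names]


# Hypothetical index content, as 'find $NXF_SINGULARITY_CACHEDIR -name "*.img"' might print it.
index_text = """\
/shared/singularity/depot.galaxyproject.org-singularity-fastqc-0.11.9--0.img
/shared/singularity/depot.galaxyproject.org-singularity-multiqc-1.14--pyhdfd78af_0.img
"""

# Hypothetical list of images the pipeline needs.
required = [
    "depot.galaxyproject.org-singularity-fastqc-0.11.9--0.img",
    "depot.galaxyproject.org-singularity-samtools-1.17--h00cdaf9_0.img",
]

missing = images_to_download(required, read_remote_index(index_text))
print(missing)
```

In this sketch only the `samtools` image would be fetched, because the `fastqc` image is already listed in the remote index.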

#### How the Singularity image downloads work

@@ -391,16 +390,22 @@ Where both are found, the download URL is preferred.

Once a full list of containers is found, they are processed in the following order:

1. If the target image already exists, nothing is done (eg. with `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache-only` specified)
2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache-only` is _not_ specified, they are copied to the output directory
1. If the target image already exists, nothing is done (eg. with `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache amend` specified)
2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache copy` is specified, they are copied to the output directory
3. If they start with `http` they are downloaded directly within Python (default 4 at a time, you can customise this with `--parallel-downloads`)
4. If they look like a Docker image name, they are fetched using a `singularity pull` command
- This requires Singularity to be installed on the system and is substantially slower
- This requires Singularity/Apptainer to be installed on the system and is substantially slower
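
The processing order listed above can be read as a small decision function. The sketch below is only a paraphrase of that list, not the implementation in `nf_core/download.py`: the function name `plan_image_action`, its parameters and the returned labels are hypothetical, and step 4 is simplified to a fallback rather than an actual check for a Docker image name.

```python
def plan_image_action(image, target_exists, in_cache, cache_mode):
    """Return the action for one container, following the documented order.

    image         -- download URL or Docker image name
    target_exists -- the target image file is already present
    in_cache      -- the image is present in $NXF_SINGULARITY_CACHEDIR
    cache_mode    -- value of --singularity-cache ("amend", "copy", "remote") or None
    """
    # 1. Nothing to do if the target image already exists.
    if target_exists:
        return "skip"
    # 2. Copy from the cache to the output directory when 'copy' was requested.
    if in_cache and cache_mode == "copy":
        return "copy-from-cache"
    # 3. Direct download URLs are fetched within Python.
    if image.startswith("http"):
        return "download"
    # 4. Assumed fallback: treat anything else as a Docker image name.
    return "singularity-pull"


print(plan_image_action("https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0", False, False, "copy"))
print(plan_image_action("quay.io/biocontainers/fastqc:0.11.9--0", False, False, "copy"))
```

With `--singularity-cache amend`, the cache directory itself is the target, so an image that is already cached falls under step 1.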

Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images.
Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images that are copied to the output directory.

If the download speeds are much slower than your internet connection is capable of, you can set `--parallel-downloads` to a large number to download loads of images at once.

### Adapting downloads to Nextflow Tower

[seqeralabs® Nextflow Tower](https://cloud.tower.nf/) provides a graphical user interface to oversee pipeline runs, gather statistics and configure compute resources. While pipelines added to _Tower_ are preferably hosted at a Git service, providing them as disconnected, self-reliant repositories is also possible for premises with restricted network access. Choosing the `--tower` flag will download the pipeline in an appropriate form.

Subsequently, the `*.git` folder can be moved to its final destination and linked with a pipeline in _Tower_ using the `file:/` prefix.

## Pipeline software licences

Sometimes it's useful to see the software licences of the tools used in a pipeline.
44 changes: 39 additions & 5 deletions nf_core/__main__.py
@@ -209,21 +209,46 @@ def launch(pipeline, id, revision, command_only, params_in, params_out, save_all
# nf-core download
@nf_core_cli.command()
@click.argument("pipeline", required=False, metavar="<pipeline name>")
@click.option("-r", "--revision", type=str, help="Pipeline release")
@click.option(
"-r",
"--revision",
multiple=True,
help="Pipeline release to download. Multiple invocations are possible, e.g. `-r 1.1 -r 1.2`",
)
@click.option("-o", "--outdir", type=str, help="Output directory")
@click.option(
"-x", "--compress", type=click.Choice(["tar.gz", "tar.bz2", "zip", "none"]), help="Archive compression type"
)
@click.option("-f", "--force", is_flag=True, default=False, help="Overwrite existing files")
@click.option("-t", "--tower", is_flag=True, default=False, help="Download for seqeralabs® Nextflow Tower")
@click.option(
"-c", "--container", type=click.Choice(["none", "singularity"]), help="Download software container images"
)
@click.option(
"--singularity-cache-only/--singularity-cache-copy",
help="Don't / do copy images to the output directory and set 'singularity.cacheDir' in workflow",
"-s",
"--singularity-cache",
type=click.Choice(["amend", "copy", "remote"]),
help="Utilize the 'singularity.cacheDir' in the download process, if applicable.",
)
@click.option(
"-i",
"--singularity-cache-index",
type=str,
help="List of images already available in a remote 'singularity.cacheDir', imposes --singularity-cache=remote",
)
@click.option("-p", "--parallel-downloads", type=int, default=4, help="Number of parallel image downloads")
def download(pipeline, revision, outdir, compress, force, container, singularity_cache_only, parallel_downloads):
def download(
pipeline,
revision,
outdir,
compress,
force,
tower,
container,
singularity_cache,
singularity_cache_index,
parallel_downloads,
):
"""
Download a pipeline, nf-core/configs and pipeline singularity images.

@@ -233,7 +258,16 @@ def download(pipeline, revision, outdir, compress, force, container, singularity
from nf_core.download import DownloadWorkflow

dl = DownloadWorkflow(
pipeline, revision, outdir, compress, force, container, singularity_cache_only, parallel_downloads
pipeline,
revision,
outdir,
compress,
force,
tower,
container,
singularity_cache,
singularity_cache_index,
parallel_downloads,
)
dl.download_workflow()
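
The help text for `--singularity-cache-index` states that passing an index imposes `--singularity-cache=remote`. How `DownloadWorkflow` enforces this is not visible in this part of the diff, so the following is only a minimal sketch of one way the two options could be reconciled. The helper `resolve_cache_mode` and the constant `VALID_CACHE_MODES` are hypothetical names, not part of nf-core/tools.

```python
# The three choices accepted by --singularity-cache in the CLI definition above.
VALID_CACHE_MODES = ("amend", "copy", "remote")


def resolve_cache_mode(singularity_cache, singularity_cache_index):
    """Reconcile --singularity-cache with --singularity-cache-index.

    Both arguments are None when the option was not given on the command line.
    """
    # Assumption: an index always wins, because it "imposes" the remote mode.
    if singularity_cache_index is not None:
        return "remote"
    if singularity_cache is not None and singularity_cache not in VALID_CACHE_MODES:
        raise ValueError(f"Unknown cache mode: {singularity_cache!r}")
    # May still be None, in which case an interactive prompt could ask the user.
    return singularity_cache


print(resolve_cache_mode(None, "my_list_of_remotely_available_images.txt"))
print(resolve_cache_mode("copy", None))
```

Keeping this logic in one place would mean that the CLI and the interactive prompts cannot disagree about which mode is active.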
