New command/option to download containers for offline usage #3340
It could be integrated with the …
Why is the …
Because …
You mean because most clusters don't support Docker? Simplifying downloading a Docker image, saving it as a tar archive, transferring it to the cluster, and loading the image again would be very welcome, I'm sure 😛
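For concreteness, a minimal sketch of that manual round-trip, assuming Docker on a machine with internet access and Singularity/Apptainer on the offline cluster (the image name, paths and host are placeholders borrowed from examples later in this thread):

```bash
# 1. Pull the image and save it as a tar archive on a machine with registry access
docker pull biocontainers/samtools:1.17--h00cdaf9_0
docker save -o samtools.tar biocontainers/samtools:1.17--h00cdaf9_0

# 2. Transfer the archive to the offline cluster
scp samtools.tar user@cluster:/path/to/images/

# 3. On the cluster, build a Singularity image from the Docker archive
singularity build samtools.sif docker-archive:samtools.tar
```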
Yup, could definitely be done - in fact it was originally called …
Interesting, surprising even.
NB: My memory is terrible 😆
This could also help with weird behaviors with Singularity, where multiple concurrent downloads can break the execution. One can download all the containers first and then run Nextflow.
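A minimal sketch of that workaround, assuming a shared cache directory (`NXF_SINGULARITY_CACHEDIR` is a standard Nextflow environment variable; the pipeline and path are placeholders):

```bash
# Point Nextflow at a shared Singularity cache so images are pulled once
# and reused, instead of triggering many concurrent downloads per run.
export NXF_SINGULARITY_CACHEDIR=/shared/singularity-cache

# A first run with network access populates the cache; subsequent runs
# reuse the cached images.
nextflow run nf-core/sarek -profile test,singularity --outdir results
```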
@pditommaso @bentsherman it would be awesome if we could have some sort of native ability to get a list of all the containers used in a given pipeline execution. Given recent changes we have made to accommodate …
I don't think this can be implemented in the current model.
Just to understand better... when using `nextflow run -preview`, it generates the graph without executing the processes. Could it also collect the information about the containers and store it somewhere? Please Paolo, be kind with me 😅
That's what I was going to suggest: maybe during a preview run or stub run we could log some information about the processes, like which containers they need.
This would be wonderful. Let's see if it is feasible or not.
But that is still not enough when the container is defined with a dynamic rule.
(notes from internal meeting) Even though the container is dynamic, it is almost always defined in terms of variables that would be available during preview mode:

```groovy
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
    'https://depot.galaxyproject.org/singularity/samtools:1.17--h00cdaf9_0' :
    'biocontainers/samtools:1.17--h00cdaf9_0' }"
```

Because the container closure is resolved at the task level, we would have to construct a "fake" task config and evaluate the closure in such a way that it throws an exception if the closure can't be resolved. Then we could just report any container rules that couldn't be resolved. But I think in practice, with this approach we would be able to determine every container in an nf-core pipeline. We also discussed using Wave to build the Docker or Singularity image using the …
Note that there are still plenty of edge cases where we have to use a Docker image because there is no Conda package. Example here, but there are more (with varying levels of complexity). And another nice little edge case that tripped us up this week: overwriting the container from a config file. But hopefully that wouldn't be an issue here with this approach, right? 🤞🏻
Indeed, that is why I think we should still try to implement a download command even if we manage to simplify things with Wave. As for the config file, that should be fine: the config file will already be applied to the process config.
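To make the config-override case concrete, a hedged sketch (the process selector and image below are hypothetical) showing that a custom config passed with `-c` becomes part of the resolved process configuration, so a preview-based listing would pick up the override:

```bash
# Override the container for one process from a config file
cat > custom.config <<'EOF'
process {
    withName: 'SAMTOOLS_FAIDX' {
        container = 'quay.io/biocontainers/samtools:1.17--h00cdaf9_0'
    }
}
EOF

# The override is applied to the process config, so it is reflected
# in the resolved container definitions as well.
nextflow run nf-core/sarek -profile test --outdir results -c custom.config -preview
```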
Would it not be reasonable to require that any package be delivered as a Conda package?
No. We try to encourage people to list packages on Conda, but the edge cases often come when software is commercial and provided by a company with a complex EULA / agreement. Then it is not possible to package it on Conda (for example, 10x Genomics / Oxford Nanopore). There have been other situations where Conda is not possible as well.
A download command in Nextflow would be great, but please also have a way to simply print out the container URLs without doing any downloading 🙏🏻 This is much requested by people for a range of reasons, and would also be nice for nf-core, where we will likely want to customise the downloads a bit - e.g. download locations etc. We could even go crazy and set up some automation to run the command automatically and bundle a file with the pipeline that lists all the container URLs as plain text… 🤔
But this is a requirement for commercial users, not for the nf-core community. Is there any nf-core pipeline relying on commercial packages that cannot be redistributed?
Indeed, this is the primary goal.
Yes, several - any using packages from these companies. Off the top of my head that includes at least scrnaseq (10x Genomics), nanoseq (Oxford Nanopore), demultiplex (Illumina) and probably quite a few more. Each has a slightly different setup. For example, with 10x Genomics (scrnaseq) we have had confirmation that we can build and share our own Docker image manually (they have a manual download process with a EULA that cannot be automated), as long as we publicly say that it's not supported or endorsed. With Oxford Nanopore we're explicitly not allowed to do anything, and it's up to each end user to configure the pipeline with a Docker image that they set up themselves. Then other pipelines have every other combination of specific setups that you can imagine 😅 It's been like this since the very start of nf-core: from the very first iteration of the guidelines we required a Docker image and recommended a Conda package, for this reason.
I did it! Tested with rnaseq and sarek so far. Just need to figure out the command-line interface and output format. Here's the abridged output from sarek:

```console
$ ../launch.sh run nf-core/sarek -profile test --outdir results -preview
NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM1_INDEX biocontainers/bwa:0.7.17--hed695b0_7
NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM2_INDEX biocontainers/bwa-mem2:2.2.1--he513fc3_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:DRAGMAP_HASHTABLE biocontainers/dragmap:1.2.1--h72d16da_1
NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM1_INDEX biocontainers/bwa:0.7.17--hed695b0_7
NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM2_INDEX biocontainers/bwa-mem2:2.2.1--he513fc3_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:DRAGMAP_HASHTABLE biocontainers/dragmap:1.2.1--h72d16da_1
NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:MSISENSORPRO_SCAN biocontainers/msisensor-pro:1.2.0--hfc31af2_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:SAMTOOLS_FAIDX biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_DBSNP biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_GERMLINE_RESOURCE biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_SNPS biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_INDELS biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_PON biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED biocontainers/gawk:5.1.0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:GATK4_INTERVALLISTTOBED biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_SPLIT biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_MAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_UNMAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_MAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_UNMAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM1_INDEX biocontainers/bwa:0.7.17--hed695b0_7
NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM2_INDEX biocontainers/bwa-mem2:2.2.1--he513fc3_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:DRAGMAP_HASHTABLE biocontainers/dragmap:1.2.1--h72d16da_1
NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:MSISENSORPRO_SCAN biocontainers/msisensor-pro:1.2.0--hfc31af2_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:SAMTOOLS_FAIDX biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_DBSNP biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_GERMLINE_RESOURCE biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_SNPS biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_INDELS biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_PON biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED biocontainers/gawk:5.1.0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:GATK4_INTERVALLISTTOBED biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_SPLIT biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED biocontainers/tabix:1.11--hdfd78af_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_MAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_UNMAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_MAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_UNMAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_MERGE_UNMAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:COLLATE_FASTQ_UNMAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:COLLATE_FASTQ_MAP biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:CAT_FASTQ nf-core/ubuntu:20.04
NFCORE_SAREK:SAREK:FASTQC biocontainers/fastqc:0.11.9--0
NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP:BWAMEM1_MEM biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:219b6c272b25e7e642ae3ff0bf0c5c81a5135ab4-0
NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP:BWAMEM2_MEM biocontainers/mulled-v2-e5d375990341c5aef3c9aff74f96f66f65375ef6:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0
NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP:DRAGMAP_ALIGN biocontainers/mulled-v2-580d344d9d4a496cd403932da8765f9e0187774d:5ebebbc128cd624282eaa37d2c7fe01505a91a69-0
NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES biocontainers/mulled-v2-d9e7bad0f7fbc8f4458d5c3ab7ffaaf0235b59fb:f857e2d6cc88d35580d01cf39e0959a68b83c1d9-0
NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:MOSDEPTH biocontainers/mosdepth:0.3.3--hdfd78af_1
NFCORE_SAREK:SAREK:CRAM_TO_BAM biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:BAM_BASERECALIBRATOR:GATK4_BASERECALIBRATOR biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_BASERECALIBRATOR:GATK4_GATHERBQSRREPORTS biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_APPLYBQSR:GATK4_APPLYBQSR biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_APPLYBQSR:CRAM_MERGE_INDEX_SAMTOOLS:MERGE_CRAM biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:BAM_APPLYBQSR:CRAM_MERGE_INDEX_SAMTOOLS:INDEX_CRAM biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CRAM_QC_RECAL:SAMTOOLS_STATS biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:CRAM_QC_RECAL:MOSDEPTH biocontainers/mosdepth:0.3.3--hdfd78af_1
NFCORE_SAREK:SAREK:CRAM_TO_BAM_RECAL biocontainers/samtools:1.17--h00cdaf9_0
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:STRELKA_SINGLE biocontainers/strelka:2.9.10--h9ee0642_1
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA_GENOME biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:STRELKA_SINGLE biocontainers/strelka:2.9.10--h9ee0642_1
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA_GENOME biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_STRELKA:STRELKA_SOMATIC biocontainers/strelka:2.9.10--h9ee0642_1
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_STRELKA:MERGE_STRELKA_INDELS biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_STRELKA:MERGE_STRELKA_SNVS biocontainers/gatk4:4.4.0.0--py36hdfd78af_0
NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:BCFTOOLS_STATS biocontainers/bcftools:1.17--haef29d1_0
NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT biocontainers/vcftools:0.1.16--he513fc3_4
NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_QUAL biocontainers/vcftools:0.1.16--he513fc3_4
NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_SUMMARY biocontainers/vcftools:0.1.16--he513fc3_4
NFCORE_SAREK:SAREK:CUSTOM_DUMPSOFTWAREVERSIONS biocontainers/multiqc:1.14--pyhdfd78af_0
NFCORE_SAREK:SAREK:MULTIQC biocontainers/multiqc:1.14--pyhdfd78af_0
```
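Given output in that shape, the plain-text list of container URLs requested above is a short post-processing step away. A sketch, assuming the listing has been saved to `containers.log` (note that nf-core pipelines typically prepend a registry such as `quay.io` via the `docker.registry`/`singularity.registry` settings, so the short names may need a registry prefix before pulling):

```bash
# Extract the unique container URIs as plain text
awk '{ print $2 }' containers.log | sort -u > containers.txt

# Pre-fetch them for an offline cluster: direct download for https://
# Singularity images, otherwise pull and convert from a Docker URI.
while read -r uri; do
    if [[ "$uri" == https://* ]]; then
        wget -nc "$uri"
    else
        singularity pull "docker://$uri"
    fi
done < containers.txt
```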
This is great @bentsherman! 🎉
What hurts me is that the … It's really confusing from a user point of view that the config does not include settings defined at the process level and there's instead yet another magic run option. Also, it should support Wave. We are soon going to have freeze builds, therefore with this option the user could trigger the build of a bunch of containers and then pull them locally.
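For reference, a hedged sketch of what that freeze workflow could look like from the user's side (option names follow the Wave integration as later documented; the build repository is a placeholder, and a Seqera access token plus registry credentials are also needed for freeze builds):

```bash
# Enable Wave with freeze so built images are pushed to a permanent
# repository, from which they can then be pulled locally.
cat > wave.config <<'EOF'
wave {
    enabled = true
    freeze  = true
    build.repository = 'docker.io/your-org/wave-builds'
}
EOF

nextflow run nf-core/rnaseq -profile test --outdir results -c wave.config
```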
Indeed, it would be nice for the …
And then we would have to keep it up to date with the … As for Wave, I thought we confirmed that this preview feature won't trigger the Wave container build. It uses the …
We don't currently have the possibility to get the `container` definitions that are used within any given Nextflow pipeline via `nextflow config` or any other existing Nextflow command. This is particularly useful when you need to pre-download Singularity containers locally so they can be transferred to offline environments before running the pipeline. We have implemented the `nf-core download` command to do exactly this, but it's a hack, because the only way we can currently get all of the container definitions is by writing Python code to physically parse the module main scripts, which is fragile (see here).
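For illustration, the text-scraping this amounts to is roughly the following (a rough sketch, not the actual nf-core/tools implementation), which is exactly why it is fragile: it misses containers defined through dynamic or conditional expressions.

```bash
# Scrape anything that looks like a container URI out of the module scripts
grep -rhoE "(https://depot\.galaxyproject\.org/singularity/[^'\" ]+|biocontainers/[^'\" ]+)" modules/ \
    | sort -u > containers.txt
```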
It would be great to have a dedicated `nextflow download` command that fully resolves all `container` definitions provided in the pipeline. A couple of obvious options:

- `nextflow download -list nf-core/rnaseq -r 3.9`: list all container definitions in the pipeline
- `nextflow download -fetch nf-core/rnaseq -r 3.9`: download Singularity images directly if `container` definitions are prefixed by `https`, otherwise convert from Docker to Singularity as done natively by Nextflow

For further inspiration, these are the options we currently use in `nf-core download`, and these cover pretty much all the bases we have encountered so far: