Running this pipeline on AWS Batch #1110

Closed
lwtan90 opened this issue Nov 10, 2023 · 8 comments
Labels: awaiting-response-community, question (Further information is requested)

@lwtan90 commented Nov 10, 2023

Description of feature

This is my first time using Nextflow for RNA-seq analysis, and I find that this pipeline works flawlessly. However, I am trying to use it on AWS Batch, and there isn't a profile made for awsbatch. Could you kindly suggest a way to run it? Should I modify nextflow.config? Thank you for this great pipeline!
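
For context, a minimal sketch of what an AWS Batch setup in nextflow.config could look like (the queue name, region, CLI path, and bucket below are placeholders, not values from this thread):

```groovy
// nextflow.config -- minimal AWS Batch executor setup (sketch)
process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'        // placeholder: your Batch job queue
}

aws {
    region = 'us-east-1'               // placeholder: your AWS region
    batch {
        // only needed if the AWS CLI is baked into a custom AMI
        // rather than available on the default PATH
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}

// with the awsbatch executor the work directory must live on S3
workDir = 's3://my-bucket/work'        // placeholder bucket
```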

@FrankMaiwald

Hi @lwtan90, I try to run the pipeline on AWS Batch using the profile 'docker'. This will start the pipeline.

I say 'try to' because in my case the pipeline does not run flawlessly but fails at various early steps with varying error messages, e.g.

```
ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (sample_name)'
Caused by:
  Task failed to start - CannotCreateContainerError: Error response from daemon: devmapper: Thin Pool has 1714 free data blocks which is less than minimum required 4449 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
```

or

```
sh: line 1: 24696 Segmentation fault    pigz -p 8 -c - > another_sample_name_2_trimmed.fq.gz
```

I feel that I am not in control of the memory usage of the (spot) instances in my AWS Batch compute environment, other than requesting a specific instance type, e.g. c6a.16xlarge, which should have more than enough memory for the Trim Galore step. (Instance type 'optimal' also gave the segmentation fault at least once.)

@adamrtalbot
Contributor

nf-core comes with a default AWS Batch profile called awsbatch: https://github.com/nf-core/configs/blob/master/docs/awsbatch.md
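
Roughly, that profile is used like this (queue and region are placeholders; check the linked doc for the exact parameter names):

```bash
nextflow run nf-core/rnaseq \
    -profile awsbatch \
    --awsqueue my-batch-queue \
    --awsregion us-east-1 \
    --input samplesheet.csv \
    --outdir s3://my-bucket/results
```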

@adamrtalbot
Contributor

@FrankMaiwald you can adjust any and all resources per process: https://nf-co.re/docs/usage/configuration#tuning-workflow-resources
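
For example, a custom config passed with `-c` could raise the limits for the Trim Galore process (the selector pattern and values here are illustrative, not pipeline defaults):

```groovy
// custom.config -- pass to the pipeline with `-c custom.config`
process {
    // matches the TRIMGALORE process at any workflow nesting level;
    // values are illustrative
    withName: '.*:TRIMGALORE' {
        cpus   = 16
        memory = 64.GB
        time   = 12.h
    }
}
```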

@siddharthab
Contributor

siddharthab commented May 7, 2024

I hit the same issue. I think the resource requirements in the pipeline are too low, especially for disk. My AWS compute environment is configured to use the c7a and m7a machine families. On my runs, during the Trim Galore subworkflow, I get c7a.48xlarge machines but with only 30 GB of disk, so my processes run out of disk space, which causes the issues above.

@siddharthab
Contributor

It looks like AWS Batch does not accept disk size requirements. The recommended way is to use a launch template in your compute environment. I will see if I can resolve this.
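
For illustration, a launch template with a larger root volume can be created with the AWS CLI along these lines (template name, device name, and size are assumptions; the root device name depends on your AMI):

```bash
# Sketch: launch template with a 200 GiB gp3 root volume.
# /dev/xvda is typical for Amazon Linux 2 ECS-optimized AMIs.
aws ec2 create-launch-template \
    --launch-template-name nf-batch-bigdisk \
    --launch-template-data '{
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 200, "VolumeType": "gp3"}
        }]
    }'
```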

@siddharthab
Contributor

I tried to increase the disk size in the launch template, but was perhaps not doing something right. Instead, I went for the better solution, which is scratch-less Fusion.

For AWS Batch, you would still need to create a launch template with a user data section. This script, wrapped in MIME format and pasted into the user data section of the launch template, worked well for us. Of course, we had to restrict our compute environment to use only the *d instance types, and to configure Nextflow to use Fusion without scratch.
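
The Nextflow side of that setup is roughly the following (a sketch assuming an S3 work directory; Fusion is provisioned through the Wave service):

```groovy
// nextflow.config -- scratch-less Fusion (sketch)
fusion {
    enabled = true
}

wave {
    enabled = true      // Fusion requires Wave-provisioned containers
}

process {
    scratch = false     // tasks stream data to/from S3 via Fusion;
                        // the NVMe drives of *d instances back the local cache
}

workDir = 's3://my-bucket/work'   // placeholder bucket
```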

drpatelh added this to the 3.15.0 milestone on May 13, 2024
drpatelh added the question (Further information is requested) label and removed the enhancement label on May 13, 2024
@robsyme
Contributor

robsyme commented May 29, 2024

If Fusion is not your cup of tea, you might also choose to:

  • increase the boot disk size of your EC2 instances, and/or
  • use EBS (and potentially increase the EBS block size); see the sketch below
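
If you go the extra-EBS-volume route, the volume your launch template attaches and mounts can be exposed to task containers via Nextflow's `aws.batch.volumes` option (the mount point here is a placeholder):

```groovy
// nextflow.config -- mount a host path (e.g. an extra EBS volume that the
// launch template formats and mounts at /scratch) into every task container
aws.batch.volumes = '/scratch'    // placeholder mount point
```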

Let me know if you run into any difficulties there. Happy to help out.

@drpatelh
Member

I will close this for now, as this looks like more of a generic infrastructure issue. Please feel free to join the #rnaseq channel in the nf-core Slack Workspace or the #infrastructure-aws channel in the Nextflow Slack Workspace for more real-time help.
