Running this pipeline on AWS Batch #1110

Closed
lwtan90 opened this issue Nov 10, 2023 · 8 comments
Labels: awaiting-response-community, question (Further information is requested)

@lwtan90 commented Nov 10, 2023

Description of feature

This is my first time using Nextflow for RNA-seq analysis, and I find that this pipeline works flawlessly. However, I am trying to use it on AWS Batch, and there isn't a profile made for awsbatch. Could you kindly suggest a way to run it? Should I modify nextflow.config? Thank you for this great pipeline!
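
For context, a minimal sketch of what an AWS Batch setup in nextflow.config could look like (the queue name, region, CLI path, and bucket below are placeholders, not values from this thread):

```groovy
// nextflow.config -- minimal AWS Batch executor setup (sketch)
process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'        // placeholder: your Batch job queue
}

aws {
    region = 'us-east-1'               // placeholder: your AWS region
    batch {
        // only needed if the AWS CLI is baked into a custom AMI
        // rather than available on the default PATH
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}

// with the awsbatch executor the work directory must live on S3
workDir = 's3://my-bucket/work'        // placeholder bucket
```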

@FrankMaiwald

Hi @lwtan90, I try to run the pipeline on AWS Batch using the profile 'docker'. This will start the pipeline.

I say 'try to' because in my case the pipeline does not run flawlessly but fails at various early steps with varying error messages, e.g.

```
ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (sample_name)'
Caused by:
  Task failed to start - CannotCreateContainerError: Error response from daemon: devmapper: Thin Pool has 1714 free data blocks which is less than minimum required 4449 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
```

or

```
sh: line 1: 24696 Segmentation fault    pigz -p 8 -c - > another_sample_name_2_trimmed.fq.gz
```

I feel that I am not in control of the memory usage of the (spot) instances in my AWS Batch compute environment, other than requesting a specific instance type, e.g. c6a.16xlarge, which should have more than enough memory for the Trim Galore step. (Instance type 'optimal' also gave the segmentation fault at least once.)

@adamrtalbot
Contributor

nf-core comes with a default AWS Batch profile called awsbatch: https://github.com/nf-core/configs/blob/master/docs/awsbatch.md
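
Roughly, that profile is used like this (queue and region are placeholders; check the linked doc for the exact parameter names):

```bash
nextflow run nf-core/rnaseq \
    -profile awsbatch \
    --awsqueue my-batch-queue \
    --awsregion us-east-1 \
    --input samplesheet.csv \
    --outdir s3://my-bucket/results
```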

@adamrtalbot
Contributor

@FrankMaiwald you can adjust any and all resources per process: https://nf-co.re/docs/usage/configuration#tuning-workflow-resources
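
For example, a custom config passed with `-c` could raise the limits for the Trim Galore process (the selector pattern and values here are illustrative, not pipeline defaults):

```groovy
// custom.config -- pass to the pipeline with `-c custom.config`
process {
    // matches the TRIMGALORE process at any workflow nesting level;
    // values are illustrative
    withName: '.*:TRIMGALORE' {
        cpus   = 16
        memory = 64.GB
        time   = 12.h
    }
}
```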

@siddharthab
Contributor

siddharthab commented May 7, 2024

I hit the same issue. I think the resource requirements in the pipeline are too low, especially for disk. My AWS compute environment is configured to use the c7a and m7a machine families. On my runs, during the Trim Galore subworkflow, I get c7a.48xlarge machines but with only 30 GB of disk, so my processes run out of disk space, which causes the issues above.

@siddharthab
Contributor

It looks like AWS Batch does not accept disk size requirements. The recommended way is to use a launch template in your compute environment. I will see if I can resolve this.
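
For illustration, a launch template with a larger root volume can be created with the AWS CLI along these lines (template name, device name, and size are assumptions; the root device name depends on your AMI):

```bash
# Sketch: launch template with a 200 GiB gp3 root volume.
# /dev/xvda is typical for Amazon Linux 2 ECS-optimized AMIs.
aws ec2 create-launch-template \
    --launch-template-name nf-batch-bigdisk \
    --launch-template-data '{
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 200, "VolumeType": "gp3"}
        }]
    }'
```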

@siddharthab
Contributor

I tried to increase the disk size in the launch template, but was perhaps not doing something right. Instead, I went for the better solution, which is scratch-less Fusion.

For AWS Batch, you would still need to create a launch template with a user data section. This script, wrapped in MIME format and pasted into the user data section of the launch template, worked well for us. Of course, we had to restrict our compute environment to use only the *d instance types, and to configure Nextflow to use Fusion without scratch.
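
The Nextflow side of that setup is roughly the following (a sketch assuming an S3 work directory; Fusion is provisioned through the Wave service):

```groovy
// nextflow.config -- scratch-less Fusion (sketch)
fusion {
    enabled = true
}

wave {
    enabled = true      // Fusion requires Wave-provisioned containers
}

process {
    scratch = false     // tasks stream data to/from S3 via Fusion;
                        // the NVMe drives of *d instances back the local cache
}

workDir = 's3://my-bucket/work'   // placeholder bucket
```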

drpatelh added this to the 3.15.0 milestone on May 13, 2024
drpatelh added the question (Further information is requested) label and removed the enhancement label on May 13, 2024
@robsyme
Contributor

robsyme commented May 29, 2024

If Fusion is not your cup of tea, you might also choose to:

  • increase the boot disk size of your EC2 instances, and/or
  • use EBS (and potentially increase the EBS block size); see the sketch below
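
If you go the extra-EBS-volume route, the volume your launch template attaches and mounts can be exposed to task containers via Nextflow's `aws.batch.volumes` option (the mount point here is a placeholder):

```groovy
// nextflow.config -- mount a host path (e.g. an extra EBS volume that the
// launch template formats and mounts at /scratch) into every task container
aws.batch.volumes = '/scratch'    // placeholder mount point
```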

Let me know if you run into any difficulties there. Happy to help out.

@drpatelh
Member

I will close this for now, as this looks like more of a generic infrastructure issue. Please feel free to join the #rnaseq channel in the nf-core Slack Workspace or the #infrastructure-aws channel in the Nextflow Slack Workspace for more real-time help.
