Running this pipeline on AWS Batch #1110
Comments
Hi @lwtan90, I am trying to run the pipeline on AWS Batch using the 'docker' profile. I say 'trying to' because in my case the pipeline does not run flawlessly but fails at various early steps with varying error messages, for example:

```
sh: line 1: 24696 Segmentation fault    pigz -p 8 -c - > another_sample_name_2_trimmed.fq.gz
```

I feel that I am not in control of the memory usage of the (spot) instances in my AWS Batch compute environment, other than asking for a specific instance type, e.g. c6a.16xlarge, which should have more than enough memory for the trim_galore step. (Instance type 'optimal' also gave the segmentation fault at least once.)
nf-core comes with a default batch profile called
@FrankMaiwald you can adjust any and all resources per process: https://nf-co.re/docs/usage/configuration#tuning-workflow-resources
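For example, a small custom config passed with `-c` can raise the limits for the trimming step. A minimal sketch, assuming the process simple name is `TRIMGALORE` (check the pipeline source for the exact name) and with sizes picked arbitrarily:

```groovy
// custom.config -- raise resources for the trimming step.
// Selector and sizes are assumptions; adjust to your pipeline version.
process {
    withName: 'TRIMGALORE' {
        cpus   = 8
        memory = 64.GB
        time   = 12.h
    }
}
```

Run with `nextflow run nf-core/rnaseq -profile docker -c custom.config ...`.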
I hit the same issue. I think the resource requirements in the pipeline are too low, especially for disk. My AWS compute environment is configured to use the c7a and m7a machine families. On my runs, during the Trim Galore subworkflow, I get c7a.48xlarge machines but with only 30 GB of disk, and my processes run out of disk space, which causes the above errors.
It looks like AWS Batch does not accept disk size requests from the job itself. The recommended way is to use a launch template in your compute environment. I will see if I can resolve this.
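For reference, one way to get bigger disks is a launch template that overrides the root block device mapping; the compute environment then launches its instances with the larger volume. A hedged sketch (template name and sizes are placeholders; `/dev/xvda` is the root device on the ECS-optimized Amazon Linux 2 AMI, verify for yours):

```bash
# Create a launch template with a 200 GB gp3 root volume for the
# Batch compute environment. All names and sizes are placeholders.
aws ec2 create-launch-template \
    --launch-template-name batch-bigdisk \
    --launch-template-data '{
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/xvda",
                "Ebs": { "VolumeSize": 200, "VolumeType": "gp3" }
            }
        ]
    }'
```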
I tried to increase the disk size in the launch template, but maybe I was not doing something right. Instead, I went for the better solution, which is scratch-less Fusion. For AWS Batch, you would still need to create a launch template with a user data section. This script, wrapped in the MIME format and pasted into the user data section of the launch template, worked well for us. Of course, we had to restrict our compute environment to use only the *d instance types, and configure Nextflow to use Fusion without scratch.
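For anyone following along, the Nextflow side of that setup looks roughly like this. A sketch, not our exact config; queue and bucket names are placeholders, and note that Fusion requires Wave and an S3 work directory:

```groovy
// Scratch-less Fusion on AWS Batch: tasks read/write S3 directly through
// the Fusion file system instead of staging into a local scratch dir.
fusion {
    enabled = true
}
wave {
    enabled = true
}
process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'   // placeholder
    scratch  = false              // no local scratch; Fusion handles I/O
}
workDir = 's3://my-bucket/work'   // placeholder; Fusion needs an S3 work dir
```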
If Fusion is not your cup of tea, you might also choose to stick with the launch-template route and increase the EBS volume size there.
Let me know if you run into any difficulties. Happy to help out.
I will close this for now, as this looks like more of a generic infrastructure issue. Please feel free to join the #rnaseq channel in the nf-core Slack workspace or the #infrastructure-aws channel in the Nextflow Slack workspace for more real-time help.
Description of feature
This is my first time using Nextflow for RNA-seq analysis, and I find that this pipeline works flawlessly. But I am trying to use it on AWS Batch, and there isn't a profile made for awsbatch. Can you kindly suggest a way to run it? Should I modify nextflow.config? Thank you for this great pipeline!
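For reference, pointing Nextflow at AWS Batch only needs a small custom config passed with `-c`. A minimal sketch; queue, region, and bucket names are placeholders:

```groovy
// aws_batch.config -- minimal AWS Batch setup; all names are placeholders.
process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'
}
aws {
    region = 'eu-west-1'
}
// The work directory must be an S3 bucket when using the awsbatch executor.
workDir = 's3://my-bucket/work'
```

Then something like: `nextflow run nf-core/rnaseq -profile docker -c aws_batch.config --input samplesheet.csv --outdir s3://my-bucket/results`.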