Unable to run on Seqera Platform on AWS Batch with GPU #17

Open
mohmhm1 opened this issue Dec 9, 2024 · 1 comment
Labels
question Further information is requested

Comments


mohmhm1 commented Dec 9, 2024

Description of the bug

Running the CHAI-1 pipeline (nf-chai) on Seqera Cloud with AWS Batch yields an error.

Before running the pipeline, I added accelerator = 1 to the CHAI_1 process so that the GPU would be recognized.
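
For reference, the override looked roughly like this (a sketch: the withName selector matches the pipeline's CHAI_1 process, but the surrounding config block is my approximation):

// custom config passed to the run with -c:
// request one GPU for the CHAI_1 process
process {
    withName: 'CHAI_1' {
        accelerator = 1
    }
}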

The error is below.

Command used and terminal output

Traceback (most recent call last):
  File "/usr/local/bin/run_chai_1.py", line 89, in
    main()
  File "/usr/local/bin/run_chai_1.py", line 77, in main
    run_inference(
  File "/opt/conda/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/chai_lab/chai1.py", line 348, in run_inference
    return run_folding_on_context(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/chai_lab/chai1.py", line 438, in run_folding_on_context
    feature_embedding = load_exported("feature_embedding.pt", device)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/chai_lab/chai1.py", line 111, in load_exported
    assert isinstance(device, torch.device)
downloading https://chaiassets.com/chai1-inference-depencencies/models_v2/feature_embedding.pt
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Fusion Info:
    clone_namespace: false
    kernel_version: 4.14
    disk_cache_size: 837Gb
    max_open_files: 1048576
    ami-id: ami-0817f4be8d3c41be4
    instance-id: i-0e3956d1b9462bab1
    instance-type: g4dn.8xlarge
    fusion_version: 2.4.6-5529968

Relevant files

No response

System information

No response

@mohmhm1 added the "bug: Something isn't working" label on Dec 9, 2024
@drpatelh added the "question: Further information is requested" label and removed the "bug: Something isn't working" label on Dec 10, 2024
drpatelh (Member) commented Dec 10, 2024

Hi @mohmhm1! Thank you for test-driving the pipeline!

We will need some more information, including the .nextflow.log file for the run, to help you troubleshoot further. Also, what did the configuration look like for your Compute Environment in Seqera Cloud?

You won't need to add the accelerator directive manually when running the pipeline because this is automatically set via the --use_gpus parameter:

nf-chai/nextflow.config

Lines 45 to 48 in b48ed56

if (params.use_gpus) {
    withName: 'CHAI_1' {
        accelerator = 1
    }
}
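
To illustrate, a typical launch only needs the flag itself (the command below is a sketch; the repository handle and the parameters other than --use_gpus are placeholders):

nextflow run seqeralabs/nf-chai \
    --input samplesheet.csv \
    --outdir results \
    --use_gpus \
    -profile docker

In Seqera Cloud, the equivalent is setting use_gpus to true in the pipeline parameters of the launch form.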

We have added the pipeline to our community/showcase workspace in Seqera Cloud, which is publicly available and should show you exactly how we configured the pipeline to run on AWS Batch with GPUs.
