Unable to run on Seqera Platform on AWS Batch with GPU #17

Open
mohmhm1 opened this issue Dec 9, 2024 · 1 comment
Labels
question Further information is requested

Comments


mohmhm1 commented Dec 9, 2024

Description of the bug

Running the CHAI-1 pipeline (nf-chai) on Seqera Cloud with AWS Batch yields an error.

Before running the pipeline, I added accelerator = 1 to the CHAI_1 process so that the GPU would be recognized.
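
For reference, the override looked roughly like this (a sketch: the withName selector matches the pipeline's CHAI_1 process, but the surrounding config block is my approximation):

// custom config passed to the run with -c:
// request one GPU for the CHAI_1 process
process {
    withName: 'CHAI_1' {
        accelerator = 1
    }
}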

The error is below.

Command used and terminal output

Traceback (most recent call last):
  File "/usr/local/bin/run_chai_1.py", line 89, in
    main()
  File "/usr/local/bin/run_chai_1.py", line 77, in main
    run_inference(
  File "/opt/conda/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/chai_lab/chai1.py", line 348, in run_inference
    return run_folding_on_context(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/chai_lab/chai1.py", line 438, in run_folding_on_context
    feature_embedding = load_exported("feature_embedding.pt", device)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/chai_lab/chai1.py", line 111, in load_exported
    assert isinstance(device, torch.device)
downloading https://chaiassets.com/chai1-inference-depencencies/models_v2/feature_embedding.pt
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Fusion Info:
    clone_namespace: false
    kernel_version: 4.14
    disk_cache_size: 837Gb
    max_open_files: 1048576
    ami-id: ami-0817f4be8d3c41be4
    instance-id: i-0e3956d1b9462bab1
    instance-type: g4dn.8xlarge
    fusion_version: 2.4.6-5529968

Relevant files

No response

System information

No response

@mohmhm1 added the "bug: Something isn't working" label on Dec 9, 2024
@drpatelh added the "question: Further information is requested" label and removed the "bug: Something isn't working" label on Dec 10, 2024
drpatelh (Member) commented Dec 10, 2024

Hi @mohmhm1! Thank you for test-driving the pipeline!

We will need some more information, including the .nextflow.log file for the run, to help you troubleshoot further. Also, what did the configuration look like for your Compute Environment in Seqera Cloud?

You won't need to add the accelerator directive manually when running the pipeline because this is automatically set via the --use_gpus parameter:

nf-chai/nextflow.config

Lines 45 to 48 in b48ed56

if (params.use_gpus) {
    withName: 'CHAI_1' {
        accelerator = 1
    }
}
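
To illustrate, a typical launch only needs the flag itself (the command below is a sketch; the repository handle and the parameters other than --use_gpus are placeholders):

nextflow run seqeralabs/nf-chai \
    --input samplesheet.csv \
    --outdir results \
    --use_gpus \
    -profile docker

In Seqera Cloud, the equivalent is setting use_gpus to true in the pipeline parameters of the launch form.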

We have added the pipeline to our community/showcase workspace in Seqera Cloud, which is publicly available and should show you exactly how we configured the pipeline to run on AWS Batch with GPUs.
