Dev -> Main for 0.2.0 release #16

Merged 49 commits on Dec 3, 2024
39e3d52
Exposed additional chai-1 parameters to Nextflow.
FloWuenne Nov 28, 2024
bd491fb
Updated changelog.
FloWuenne Nov 28, 2024
bab3a0c
Fix changelog formatting.
FloWuenne Nov 28, 2024
dceee4c
Fixed params definitions for CHAI_1 process.
FloWuenne Nov 28, 2024
6563356
Add log for GPU/CPU
adamrtalbot Nov 28, 2024
3f059cc
Apply suggestions from code review
drpatelh Nov 28, 2024
33f0a0e
fix: update params being passed to NF_CHAI workflow
drpatelh Nov 28, 2024
ec0a0f5
[chore]: update CHANGELOG
drpatelh Nov 28, 2024
de649d7
Merge pull request #11 from FloWuenne/add_chai1_params
drpatelh Nov 28, 2024
39d1a36
Merge pull request #12 from adamrtalbot/patch-1
drpatelh Nov 28, 2024
e443420
chore: fix entries in CHANGELOG
drpatelh Nov 28, 2024
a37ad6b
chore: bump pipeline version to 0.2.0
drpatelh Nov 28, 2024
ce0143b
chore: fix entries in CHANGELOG
drpatelh Nov 28, 2024
00d2bb5
docs: make POC language more explicit
drpatelh Nov 28, 2024
dd940d5
feat: bump chai_lab version to 0.4.2
drpatelh Nov 28, 2024
c58c4bb
feat: bump container for chai_lab 0.4.2
drpatelh Nov 28, 2024
668cd9e
chore: update CHANGELOG
drpatelh Nov 28, 2024
213164d
Merge pull request #13 from seqeralabs/bump_chai
drpatelh Nov 28, 2024
1739592
feat: Added msa parameter.
FloWuenne Nov 29, 2024
489727f
Fix: Remove local path from config.
FloWuenne Nov 29, 2024
4723a3d
Fix: Added CHAI_DOWNLOADS_DIR back into module.
FloWuenne Nov 29, 2024
6f6a137
fix: Removed resource label from CHAI_1 and added groundswell optimiz…
FloWuenne Nov 29, 2024
75e9ed9
fix: Added apptainer gpu config runOption.
FloWuenne Nov 29, 2024
0022e02
fix: set output-dir param to meta.id again
FloWuenne Nov 29, 2024
bd2aa5e
fix: changed torch device definition back, to work on cpu only machines.
FloWuenne Nov 29, 2024
0056f46
fix: Fixed indentation in config.
FloWuenne Nov 29, 2024
3744488
fix: Added Path to msa_dir in run_chai_1.py
FloWuenne Nov 29, 2024
f8208e4
fix: Fixed indentation and quotes for defs in CHAI_1
FloWuenne Nov 29, 2024
9f21620
fix: Add exception in run_chai_1.py for case that msa is not provided.
FloWuenne Nov 29, 2024
67b5b99
fix: Updated nextflow_schema.json.
FloWuenne Nov 30, 2024
14db704
chore: Updated changelog.
FloWuenne Nov 30, 2024
33c0d85
fix: Fixed left padding in nextflow.config
FloWuenne Nov 30, 2024
915fe50
fix: Fixed msa_dir param input for run_chai_1.py
FloWuenne Nov 30, 2024
80b206b
chore: update CHANGELOG
drpatelh Dec 2, 2024
48665f5
chore: rename underscores to dashes in Python script for consistency
drpatelh Dec 2, 2024
816b2b6
chore: move --msa_dir param up in schema and add appropriate fields
drpatelh Dec 2, 2024
7ebaddb
chore: move --msa_dir param up in parameter priority as input
drpatelh Dec 2, 2024
a0559bc
fix: bug in fasta file name
drpatelh Dec 2, 2024
9b40231
chore: change some variable names in main module
drpatelh Dec 2, 2024
9c5bc4a
fix: revert removal of process_high label
drpatelh Dec 2, 2024
3b26c20
docs: add sentence about --msa_dir to main README
drpatelh Dec 2, 2024
17a6f52
Merge pull request #14 from seqeralabs/add_msa_param
drpatelh Dec 2, 2024
5711870
chore: move test data for pipeline into own folder
drpatelh Dec 2, 2024
df104d2
feat: add new test data for MSAs
drpatelh Dec 2, 2024
b24624b
fix: update paths for test data in configs
drpatelh Dec 2, 2024
5c1c520
feat: add test_full_msa profile
drpatelh Dec 2, 2024
ac88955
chore: update CHANGELOG
drpatelh Dec 2, 2024
cbd9677
docs: add sources for test data
drpatelh Dec 3, 2024
191da24
Merge pull request #15 from seqeralabs/add_msa_test_profile
drpatelh Dec 3, 2024
20 changes: 19 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,25 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.0.0dev - [date]
## 0.2.0

### Credits

Special thanks to the following for their contributions to the release:

- [Adam Talbot](https://github.com/adamrtalbot)
- [Esha Joshi](https://github.com/ejseqera)
- [Florian Wuennemann](https://github.com/FloWuenne)

### Enhancements & fixes

- [PR #11](https://github.com/seqeralabs/nf-chai/pull/11) - Expose additional Chai-1 parameters in the pipeline
- [PR #12](https://github.com/seqeralabs/nf-chai/pull/12) - Add log for GPU/CPU
- [PR #13](https://github.com/seqeralabs/nf-chai/pull/13) - Bump `chai_lab` version to 0.4.2
- [PR #14](https://github.com/seqeralabs/nf-chai/pull/14) - Add parameter to provide multiple sequence alignment directory to Chai-1
- [PR #15](https://github.com/seqeralabs/nf-chai/pull/15) - Add `test_full_msa` profile to test provision of MSAs

## 0.1.0

Initial release of seqeralabs/nf-chai, created with the [nf-core](https://nf-co.re/) template.

6 changes: 5 additions & 1 deletion README.md
@@ -9,9 +9,11 @@
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/seqeralabs/nf-chai)

## POC implementation of Chai-1 in Nextflow

## Introduction

**nf-chai** is a bioinformatics pipeline for running the [Chai-1](https://github.com/chaidiscovery/chai-lab) protein prediction algorithm on an input set of protein sequences in FASTA format. The pipeline has been written in Nextflow to generate results for downstream analysis in a reproducible, scalable and portable way.
**nf-chai** is a simple, proof-of-concept bioinformatics pipeline for running the [Chai-1](https://github.com/chaidiscovery/chai-lab) protein prediction algorithm on an input set of protein sequences in FASTA format. The pipeline has been written in Nextflow to generate results for downstream analysis in a reproducible, scalable and portable way.

## Usage

@@ -54,6 +56,8 @@ nextflow run seqeralabs/nf-chai \

Set the `--weights_dir` parameter to a location with the pre-downloaded weights required by Chai-1 to avoid having to download them every time you run the pipeline.

To further improve prediction performance using pre-built multiple sequence alignments (MSA) with evolutionary information, set the `--msa_dir` parameter to a location with [`*.aligned.pqt`](https://github.com/chaidiscovery/chai-lab/tree/main/examples/msas#adding-msa-evolutionary-information) format as required by Chai-1.
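
As a quick pre-flight check before setting `--msa_dir`, one could confirm that the directory actually holds files with the `.aligned.pqt` suffix. The sketch below is only illustrative: the helper name `find_msa_files` is hypothetical and is not part of nf-chai or chai-lab, and it checks the file suffix only, not the file contents.

```python
from pathlib import Path


def find_msa_files(msa_dir):
    """Return the sorted *.aligned.pqt files directly inside msa_dir.

    Hypothetical helper for illustration; not part of nf-chai or chai-lab.
    """
    directory = Path(msa_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"MSA directory not found: {directory}")
    # Only the suffix is checked here; the file contents are not inspected.
    return sorted(directory.glob("*.aligned.pqt"))
```

An empty result would suggest the directory is not in the layout Chai-1 expects, which is cheaper to discover before a run than during one.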

## Credits

nf-chai was originally written by the Seqera Team.
File renamed without changes.
maxulysse marked this conversation as resolved.
Binary file not shown.
Binary file not shown.
48 changes: 43 additions & 5 deletions bin/run_chai_1.py
@@ -4,6 +4,7 @@
from pathlib import Path
from chai_lab.chai1 import run_inference
import torch
import logging

def main():
# Set up argument parser
@@ -22,6 +23,37 @@ def main():
type=Path,
help="Path to the input FASTA file."
)
# Add optional arguments with current defaults
parser.add_argument(
"--num-trunk-recycles",
type=int,
default=3,
help="Number of trunk recycles (default: 3)"
)
parser.add_argument(
"--num-diffn-timesteps",
type=int,
default=200,
help="Number of diffusion timesteps (default: 200)"
)
parser.add_argument(
"--seed",
type=int,
default=42,
help="Random seed for reproducibility (default: 42)"
)
parser.add_argument(
"--use-esm-embeddings",
action="store_true",
default=True,
help="Use ESM embeddings (enabled by default)"
)
parser.add_argument(
"--msa-dir",
type=str,
default=None,
help="Directory containing precomputed multiple sequence alignments (MSA)."
)

# Parse arguments
args = parser.parse_args()
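
A note on the `--use-esm-embeddings` argument as written: with `action="store_true"` and `default=True`, the parsed value is `True` whether or not the flag is passed, so it cannot be switched off from the command line. A minimal sketch of one alternative, assuming Python 3.9 or newer, uses `argparse.BooleanOptionalAction`, which also generates a `--no-use-esm-embeddings` form. This is a suggestion only, not what the PR merged; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Parser with a boolean flag that can be turned on or off explicitly."""
    parser = argparse.ArgumentParser(description="Sketch of a toggleable flag")
    parser.add_argument(
        "--use-esm-embeddings",
        action=argparse.BooleanOptionalAction,  # requires Python >= 3.9
        default=True,
        help="Use ESM embeddings (enabled by default)",
    )
    return parser
```

With this form, `--no-use-esm-embeddings` yields `False`, while omitting the flag keeps the default of `True`.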
@@ -34,17 +66,23 @@ def main():
args.output_dir.mkdir(parents=True, exist_ok=True)

# Set device for PyTorch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
logging.info("GPU found, using GPU")
device = torch.device("cuda")
else:
logging.info("No GPU found, using CPU")
device = "cpu"

# Run structure prediction
run_inference(
fasta_file=args.fasta_file,
output_dir=args.output_dir,
num_trunk_recycles=3,
num_diffn_timesteps=200,
seed=42,
num_trunk_recycles=args.num_trunk_recycles,
num_diffn_timesteps=args.num_diffn_timesteps,
seed=args.seed,
device=device,
use_esm_embeddings=True,
use_esm_embeddings=args.use_esm_embeddings,
msa_directory=Path(args.msa_dir) if args.msa_dir else None,
)

if __name__ == "__main__":
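
The `logging.info` calls in this hunk will not print anything under Python's default configuration, because the root logger's level defaults to `WARNING`. A minimal sketch, assuming the intent is for the GPU/CPU message to reach the task log, configures logging first. To keep the example runnable without PyTorch, the hypothetical helper `select_device` takes the availability flag as an argument; in the script that value would come from `torch.cuda.is_available()`.

```python
import logging

logger = logging.getLogger("run_chai_1")


def select_device(cuda_available):
    """Return the device name to use and log the choice.

    Hypothetical helper; in the real script the argument would come from
    torch.cuda.is_available().
    """
    if cuda_available:
        logger.info("GPU found, using GPU")
        return "cuda"
    logger.info("No GPU found, using CPU")
    return "cpu"


# Without this call, INFO records are dropped: the root logger defaults to WARNING.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
device = select_device(cuda_available=False)
```

Returning a plain string in both branches also avoids mixing a `torch.device` object with a `str`, though whether `run_inference` accepts a string depends on the installed `chai_lab` version.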
5 changes: 3 additions & 2 deletions conf/test.config
@@ -24,7 +24,8 @@ process {

params {

// Input fasta file
input = "${projectDir}/assets/short_protein_sequence.fa"
// Input sequence for FASTA file obtained from chai-lab examples:
// https://github.com/chaidiscovery/chai-lab/blob/2d2646bde676da6c9b3fa23b38b47fef8fc0d420/examples/msas/predict_with_msas.py#L14-L15
input = "${projectDir}/assets/fasta/short_protein_sequence.fa"

}
5 changes: 3 additions & 2 deletions conf/test_full.config
@@ -12,7 +12,8 @@

params {

// Input fasta file
input = "${projectDir}/assets/multiple_entities.fa"
// Input sequences for FASTA file obtained from chai-lab examples:
// https://github.com/chaidiscovery/chai-lab/blob/2d2646bde676da6c9b3fa23b38b47fef8fc0d420/examples/predict_structure.py#L16-L23
input = "${projectDir}/assets/fasta/multiple_entities.fa"

}
23 changes: 23 additions & 0 deletions conf/test_full_msa.config
@@ -0,0 +1,23 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running full-size tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a full size pipeline test.

Use as follows:
nextflow run seqeralabs/nf-chai -profile test_full_msa,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

params {

// Input sequences for FASTA file obtained from chai-lab examples:
// https://github.com/chaidiscovery/chai-lab/blob/2d2646bde676da6c9b3fa23b38b47fef8fc0d420/examples/predict_structure.py#L16-L23
input = "${projectDir}/assets/fasta/multiple_entities.fa"

// Input MSA files obtained from chai-lab examples:
// https://github.com/chaidiscovery/chai-lab/tree/2d2646bde676da6c9b3fa23b38b47fef8fc0d420/examples/msas
msa_dir = "${projectDir}/assets/msa/"

}
7 changes: 6 additions & 1 deletion main.nf
@@ -43,7 +43,12 @@ workflow {
//
NF_CHAI (
params.input,
params.weights_dir
params.weights_dir,
params.msa_dir,
params.num_trunk_recycles,
params.num_diffusion_timesteps,
params.seed,
params.use_esm_embeddings
)

//
2 changes: 1 addition & 1 deletion modules/local/chai_1/environment.yml
@@ -8,4 +8,4 @@ dependencies:
- cuda=12.1
- pip
- pip:
- chai_lab==0.3.0
- chai_lab==0.4.2
18 changes: 14 additions & 4 deletions modules/local/chai_1/main.nf
@@ -2,11 +2,16 @@ process CHAI_1 {
tag "$meta.id"
label 'process_high'
conda "${moduleDir}/environment.yml"
container 'drpatelh/chai_lab:0.3.0'
container 'community.wave.seqera.io/library/gcc_linux-64_python_cuda_pip_chai_lab:44cb323409492b49'

input:
tuple val(meta), path(fasta)
path weights_dir
path msa_dir
val num_trunk_recycles
val num_diffusion_timesteps
val seed
val use_esm_embeddings

output:
tuple val(meta), path("${meta.id}/*.cif"), emit: structures
@@ -15,17 +20,22 @@ process CHAI_1 {

script:
def downloads_dir = weights_dir ?: './downloads'
def msa_path = msa_dir ? "--msa-dir=$msa_dir" : ''
def use_esm = use_esm_embeddings ? '--use-esm-embeddings' : ''
"""
CHAI_DOWNLOADS_DIR=$downloads_dir \\
run_chai_1.py \\
--fasta-file ${fasta} \\
--output-dir ${meta.id} \\
--fasta-file ${fasta}
--num-trunk-recycles ${num_trunk_recycles} \\
--num-diffn-timesteps ${num_diffusion_timesteps} \\
--seed ${seed} \\
${use_esm} \\
${msa_path}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(python --version | sed 's/Python //g')
chai_lab: \$(python -c "import chai_lab; print(chai_lab.__version__)")
torch: \$(python -c "import torch; print(torch.__version__)")
END_VERSIONS
"""

Expand Down
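
The Groovy ternaries in the `script:` block of `CHAI_1` turn optional inputs into either a flag or an empty string. The same logic is sketched in Python below purely for illustration; the function `build_chai_command` is hypothetical and does not exist in the pipeline. Note that when `use_esm_embeddings` is false the flag is simply omitted, so whether embeddings are actually disabled depends on the default that `run_chai_1.py` assigns to that option.

```python
def build_chai_command(
    fasta,
    output_dir,
    num_trunk_recycles=3,
    num_diffn_timesteps=200,
    seed=42,
    use_esm_embeddings=True,
    msa_dir=None,
):
    """Assemble the run_chai_1.py argument list, skipping unset options."""
    cmd = [
        "run_chai_1.py",
        "--fasta-file", str(fasta),
        "--output-dir", str(output_dir),
        "--num-trunk-recycles", str(num_trunk_recycles),
        "--num-diffn-timesteps", str(num_diffn_timesteps),
        "--seed", str(seed),
    ]
    # Mirrors: def use_esm = use_esm_embeddings ? '--use-esm-embeddings' : ''
    if use_esm_embeddings:
        cmd.append("--use-esm-embeddings")
    # Mirrors: def msa_path = msa_dir ? "--msa-dir=$msa_dir" : ''
    if msa_dir:
        cmd.append(f"--msa-dir={msa_dir}")
    return cmd
```

Building the command as a list rather than a single string sidesteps the empty trailing tokens that the string-interpolated version leaves behind when an option is unset.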
15 changes: 11 additions & 4 deletions nextflow.config
@@ -12,7 +12,12 @@ params {
// Input options
input = null
weights_dir = null
msa_dir = null
use_gpus = false
num_trunk_recycles = 3
num_diffusion_timesteps = 200
seed = 42
use_esm_embeddings = true

// Boilerplate options
outdir = null
@@ -26,6 +31,7 @@ params {

// Schema validation default options
validate_params = true

}

// Default publishing settings for all processes
@@ -127,6 +133,7 @@ profiles {
apptainer {
apptainer.enabled = true
apptainer.autoMounts = true
apptainer.runOptions = params.use_gpus ? '--nv' : ''
conda.enabled = false
docker.enabled = false
singularity.enabled = false
@@ -141,8 +148,9 @@ profiles {
wave.freeze = true
wave.strategy = 'conda,container'
}
test { includeConfig 'conf/test.config' }
test_full { includeConfig 'conf/test_full.config' }
test { includeConfig 'conf/test.config' }
test_full { includeConfig 'conf/test_full.config' }
test_full_msa { includeConfig 'conf/test_full_msa.config' }
}

// Export these variables to prevent local Python/R libraries from conflicting with those in the container
@@ -183,8 +191,7 @@ manifest {
description = """Nextflow pipeline to run the Chai-1, SOTA model for biomolecular structure prediction"""
mainScript = 'main.nf'
nextflowVersion = '!>=24.04.2'
version = '1.0.0dev'
doi = ''
version = '0.2.0'
}

// Nextflow plugins
35 changes: 35 additions & 0 deletions nextflow_schema.json
@@ -35,10 +35,45 @@
"description": "Directory containing model weights and other artifacts required by Chai-1.",
"fa_icon": "fas fa-folder-open"
},
"msa_dir": {
"type": "string",
"format": "directory-path",
"exists": true,
"description": "Directory containing precomputed multiple-sequence alignments",
"fa_icon": "fas fa-align-justify"
},
"use_gpus": {
"type": "boolean",
"description": "Run compatible tasks on GPUs rather than CPUs (default).",
"fa_icon": "fas fa-microchip"
},
"num_trunk_recycles": {
"type": "integer",
"default": 3,
"fa_icon": "fas fa-recycle",
"description": "Number of trunk recycles",
"hidden": true
},
"num_diffusion_timesteps": {
"type": "integer",
"default": 200,
"fa_icon": "fas fa-shoe-prints",
"hidden": true,
"description": "Number of diffusion steps to use."
},
"seed": {
"type": "integer",
"default": 42,
"fa_icon": "fas fa-seedling",
"hidden": true,
"description": "Random seed to be used for Chai-1 calculations"
},
"use_esm_embeddings": {
"type": "boolean",
"default": true,
"fa_icon": "fas fa-stamp",
"hidden": true,
"description": "Use user-provided esm model embeddings"
}
}
},
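
To illustrate what the new schema entries express, here is a small Python sketch that applies the declared defaults and checks basic types for the four hidden parameters. It is a hand-rolled illustration, not how Nextflow's schema validation works internally; the `SCHEMA` dict copies only the `type` and `default` fields from the entries added in this file, and `resolve_params` is a hypothetical name.

```python
SCHEMA = {
    "num_trunk_recycles": {"type": "integer", "default": 3},
    "num_diffusion_timesteps": {"type": "integer", "default": 200},
    "seed": {"type": "integer", "default": 42},
    "use_esm_embeddings": {"type": "boolean", "default": True},
}

PYTHON_TYPES = {"integer": int, "boolean": bool}


def resolve_params(user_params):
    """Merge user values over schema defaults and check their types."""
    resolved = {}
    for name, spec in SCHEMA.items():
        value = user_params.get(name, spec["default"])
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integers.
        if expected is int and isinstance(value, bool):
            raise TypeError(f"{name} must be an integer, got a boolean")
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {spec['type']}")
        resolved[name] = value
    return resolved
```

Calling `resolve_params({"seed": 7})` keeps the other three defaults and overrides only the seed, which matches how the defaults in `nextflow.config` behave when a single parameter is set on the command line.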
16 changes: 13 additions & 3 deletions workflows/nf_chai/main.nf
@@ -16,8 +16,13 @@ include { CHAI_1 } from '../../modules/local/chai_1'
workflow NF_CHAI {

take:
fasta_file // string: path to fasta file read provided via --input parameter
weights_dir // string: path to model directory read provided via --weights_directory parameter
fasta_file // string: path to fasta file read provided via --input parameter
weights_dir // string: path to model directory read provided via --weights_dir parameter
msa_dir // string: path to the directory containing multiple sequence alignments (msa)
num_trunk_recycles // integer: Number of trunk recycles
num_diffusion_timesteps // integer: Number of diffusion steps to use
seed // integer: Random seed to be used for Chai-1 calculations
use_esm_embeddings // boolean: Use user-provided esm model embeddings

main:

@@ -34,7 +39,12 @@ workflow NF_CHAI {
// Run structure prediction with Chai-1
CHAI_1 (
ch_fasta,
weights_dir ? Channel.fromPath(weights_dir) : []
weights_dir ? Channel.fromPath(weights_dir) : [],
msa_dir ? Channel.fromPath(msa_dir) : [],
num_trunk_recycles,
num_diffusion_timesteps,
seed,
use_esm_embeddings
)
ch_versions = ch_versions.mix(CHAI_1.out.versions)
