Our codebase is developed on top of FrameFlow, MultiFlow, PepFlow, and ByProt.
# Create and activate a conda environment with dependencies
conda env create -f env.yml
conda activate dflow
# Install the local package
pip install -e .
# Additional dependencies
pip install easydict lmdb
Manually install torch-scatter
based on your PyTorch version. For PyTorch 2.4.1
and CUDA 12.1
(H100 GPUs), run:
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.4.1+cu121.html
For other versions, find your PyTorch version with:
# Find your installed version of torch
python
>>> import torch
>>> torch.__version__
# Example: torch 2.4.1+cu121
Warning
If you encounter the error below from DeepSpeed:
ModuleNotFoundError: No module named 'torch._six'
Replace from torch._six import inf
with from torch import inf
in these files:
/path/to/envs/site-packages/deepspeed/runtime/utils.py
/path/to/envs/site-packages/deepspeed/runtime/zero/stage_1_and_2.py
where/path/to/envs
is your path.
Pretrain datasets are host on Zenodo here. Download the following files,
real_train_set.tar.gz
(2.5 GB)synthetic_train_set.tar.gz
(220 MB)test_set.tar.gz
(347 MB)
Next, untar the files
# Uncompress training data
mkdir train_set
tar -xzvf real_train_set.tar.gz -C train_set/
tar -xzvf synthetic_train_set.tar.gz -C train_set/
# Uncompress test data
mkdir test_set
tar -xzvf test_set.tar.gz -C test_set/
The resulting directory structure should look like
<current_dir>
├── train_set
│ ├── processed_pdb
| | ├── <subdir>
| | | └── <protein_id>.pkl
│ ├── processed_synthetic
| | └── <protein_id>.pkl
├── test_set
| └── processed
| | ├── <subdir>
| | | └── <protein_id>.pkl
...
Our experiments read the data by using relative paths. Keep the directory structure like this to avoid bugs.
PepMerge dataset is available on Google Drive here. Downloading the following files:
PepMerge_release.zip
(1.2GB)
The PepMerge_release.zip
contains filtered data of peptide-receptor pairs, which is collected from
PepBDB and QBioLip.
For example, in the folder 1a0n_A
, the P
chain in the PDB file 1a0n
is the peptide.
In each sub-folder, FASTA and PDB files of the peptide and receptor are given.
The postfix _merge means the peptide and receptor are in the same PDB file.
The binding pocket of the receptor is also provided, where our model is trained to generate peptides based on the binding pocket.
When you run the code, it will automatically process the data and produce pep_pocket_train_structure_cache.lmdb
and
pep_pocket_test_structure_cache.lmdb
in the default cache folder (i.e., ../pep_cache/
).
The command to run co-design training is the following,
# pretrain
python -W ignore dflow/experiments/train_se3_flows.py -cn pdb_codesign
# peptide training (1 GPU)
python -W ignore dflow/experiments/train_pep_flows.py
# DDP peptide training (e.g., 4 GPUs)
torchrun --nproc_per_node=4 dflow/experiments/train_pep_flows.py
We use Hydra to maintain our configs.
The training config is found here multiflow/configs/pdb_codesign.yaml
.
Most important fields:
experiment.num_devices
: Number of GPUs to use for training. Default is 2.data.sampler.max_batch_size
: Maximum batch size. We use dynamic batch sizes depending ondata.sampler.max_num_res_squared
. Both these parameters need to be tuned for your GPU memory. Our default settings are set for a 40GB Nvidia RTX card.data.sampler.max_num_res_squared
: See above.
Unpack the provided model weights:
unzip dlow.zip
The following generation task can be performed.
# Unconditional Co-Design
python -W ignore dflow/experiments/inference_pep.py
If you want to generate d-peptides, please set x_mirror
to True
in dflow/configs/pep_codesign.yaml
.
If you find this paper and the corresponding code interesting and helpful, we would really appreciate it if you can cite the paper. Thank you! 😜 Moreover, we welcome any sort of relevant questions. If you have any questions, please contact [[email protected]]. Thank you! :)
@article{wu2024d,
title={D-Flow: Multi-modality Flow Matching for D-peptide Design},
author={Wu, Fang and Xu, Tinson and Jin, Shuting and Tang, Xiangru and Xu, Zerui and Zou, James and Hie, Brian},
journal={arXiv preprint arXiv:2411.10618},
year={2024}
}