This is the official repo for the ICAICE 2023 paper "A Protein Structure Enhanced Pre-training Model for Allosteric Site Detection".
For the latest official source code, please refer to https://github.com/Little-LL.
- Follow the instructions under "Data downloading" in data_processing.ipynb to download the pre-training PDB corpus from rcsb.org.
- Execute the "Build the dataset for pretrain ResidueRobertaMLM" section in data_processing.ipynb to process the PDB files and build the pre-training data.
- Run the following to pre-train Residue-RoBERTa:
python -u pretrain_ResidueRoberta.py
If pre-training is interrupted by an error, run resume_pretraining.py to continue it:
python -u resume_pretraining.py
Our pre-trained checkpoints can be obtained from
https://drive.google.com/drive/folders/1Q6cd4mTw7Imd9fdiz8qttbF_27_oGMPI?usp=drive_link
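For the PDB download step above, data_processing.ipynb remains the reference. As a rough illustration only, fetching a single entry could look like the sketch below. It assumes the public files.rcsb.org download URL pattern; the helper names `pdb_url` and `download_pdb` and the `pdb_corpus` output directory are hypothetical and are not taken from this repository.

```python
import urllib.request
from pathlib import Path

# Assumption: RCSB serves legacy PDB-format files at this URL pattern.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id: str) -> str:
    """Build the download URL for a 4-character PDB ID (hypothetical helper)."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id)


def download_pdb(pdb_id: str, out_dir: str = "pdb_corpus") -> Path:
    """Download one PDB file into out_dir, skipping files already on disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{pdb_id.strip().upper()}.pdb"
    if not target.exists():
        urllib.request.urlretrieve(pdb_url(pdb_id), str(target))
    return target
```

Calling `download_pdb("1abc")` would write `pdb_corpus/1ABC.pdb` if that entry exists on the server. Skipping existing files makes it cheap to re-run the loop after an interrupted download.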
Run the following to train the model to directly predict allosteric sites in 3D protein sequences:
python -u train_with_TokenClassification.py
Run the following to train the model with Logit Adjustment (Menon, Aditya Krishna, et al., "Long-tail learning via logit adjustment", ICLR 2021):
python -u train_with_TokenClassification_LA.py
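The idea behind the logit-adjusted loss is to add the scaled log class prior to each logit before the softmax cross-entropy, so that a rare class (here, allosteric residues) is not drowned out by the frequent one. The sketch below is a minimal, dependency-free illustration for a single token. The function names, the `tau` default and the example priors are assumptions for illustration, and the code in train_with_TokenClassification_LA.py may be implemented differently.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def logit_adjusted_loss(logits, label, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class.

    logits       -- raw model scores for one token, one per class
    label        -- index of the true class
    class_priors -- empirical class frequencies (should sum to 1)
    tau          -- scaling of the adjustment (illustrative default)
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    return -log_softmax(adjusted)[label]


# Illustrative priors: 90% non-allosteric residues, 10% allosteric.
priors = [0.9, 0.1]
loss_rare = logit_adjusted_loss([0.0, 0.0], label=1, class_priors=priors)
loss_common = logit_adjusted_loss([0.0, 0.0], label=0, class_priors=priors)
```

With equal raw logits, the loss on the rare class is larger than on the common class, which pushes the model to give rare-class tokens a larger margin during training.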
Run the following to train the model to predict allosteric pockets with 3D protein sequences:
python -u train_with_SequenceClassification.py
The main results for allosteric site classification and allosteric pocket classification are reproduced in the following table:
Metric | Site classification | Pocket classification |
---|---|---|
Residue | ||
residue acc | - | - |
residue precision | - | - |
residue recall | - | - |
residue f1 | - | - |
sequence acc | - | - |
pocket acc | - | - |
pocket precision | - | - |
pocket recall | - | - |
pocket f1 | - | - |
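The residue-level metrics in the table can be computed from per-residue binary labels. The sketch below uses the standard definitions of accuracy, precision, recall and F1, and assumes that 1 marks an allosteric residue and 0 a non-allosteric one. The function name `binary_metrics` is hypothetical, and the repository's evaluation code may differ, for example in how it averages over proteins.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    # Fall back to 0.0 when a denominator is zero (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "acc": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: six residues, three of them truly allosteric.
scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Because allosteric residues are a small minority in most proteins, accuracy alone can look high for a model that predicts almost no positives, which is why precision, recall and F1 are listed alongside it.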
The processed allosteric site data we use is uploaded to GitHub (data/allosteric_site/).
The original PDB data come from the Allosteric Database (ASD), as listed in data/ASD_Release_201909_AS.txt.
Liu, Xinyi, et al., ASD: a comprehensive database of allosteric proteins and modulators, Nucleic Acids Research
If you find this work useful, please cite our paper:
@inproceedings{}