A transformer-based method for detecting allosteric sites and pockets.

nicholas9698/Allosteric-site

Allosteric-site

This is the official repository for the ICAICE 2023 paper "A Protein Structure Enhanced Pre-training Model for Allosteric Site Detection".

For the latest official source code, please refer to https://github.com/Little-LL.

Pre-training

  1. Follow the "Data downloading" instructions in data_processing.ipynb to download the pre-training PDB corpus from rcsb.org.

  2. Run the "Build the dataset for pretrain ResidueRobertaMLM" section of data_processing.ipynb to process the PDB files and build the pre-training data.

  3. Run the following to pre-train Residue-RoBERTa:

python -u pretrain_ResidueRoberta.py

If pre-training is interrupted by an error, run resume_pretraining.py to continue it:

python -u resume_pretraining.py

Our pre-trained checkpoints can be obtained from https://drive.google.com/drive/folders/1Q6cd4mTw7Imd9fdiz8qttbF_27_oGMPI?usp=drive_link
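
Step 1 of the pre-training procedure fetches PDB files from rcsb.org. The sketch below is only an illustration of that idea and is not taken from data_processing.ipynb: the function names, the `opener` parameter, and the output layout are hypothetical, and it assumes classic 4-character PDB IDs served from the public RCSB file-download endpoint.

```python
from pathlib import Path
from urllib.request import urlopen

# Public RCSB file-download endpoint. Assumption: the repository's notebook
# may use a different endpoint or file format.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for a classic 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a valid PDB ID: {pdb_id!r}")
    return RCSB_URL.format(pdb_id=pdb_id)


def download_pdb(pdb_id: str, out_dir: str, opener=urlopen) -> Path:
    """Fetch one PDB entry into out_dir, skipping files that already exist.

    `opener` is a hypothetical hook so the network call can be replaced
    (for example in tests); it defaults to urllib's urlopen.
    """
    url = rcsb_pdb_url(pdb_id)  # validates the ID before touching the disk
    out_path = Path(out_dir) / f"{pdb_id.strip().upper()}.pdb"
    if out_path.exists():
        return out_path  # already present, e.g. from an earlier partial run
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with opener(url) as response:
        out_path.write_bytes(response.read())
    return out_path
```

Calling `download_pdb("1abc", "data/pdb")` would write `data/pdb/1ABC.pdb`; the directory name here is an example, not a path the repository defines.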

Train and Test

Allosteric site prediction

Run the following to train the model to directly predict allosteric sites in 3D protein sequences:

python -u train_with_TokenClassification.py

Run the following to train the model with logit adjustment (Menon, Aditya Krishna, et al., "Long-tail learning via logit adjustment", ICLR 2021):

python -u train_with_TokenClassification_LA.py
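
For readers unfamiliar with logit adjustment, the following is a minimal pure-Python sketch of the logit-adjusted cross-entropy from Menon et al.: the log of each class prior, scaled by a temperature `tau`, is added to the logits before the softmax cross-entropy. It is not the code in train_with_TokenClassification_LA.py; the function names and the default `tau=1.0` are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def log_priors(labels: Sequence[int], num_classes: int) -> List[float]:
    """Log of the empirical class frequencies in the training labels.

    Assumes every class occurs at least once; a class with zero count
    would make math.log raise ValueError.
    """
    counts = Counter(labels)
    total = len(labels)
    return [math.log(counts[c] / total) for c in range(num_classes)]


def logit_adjusted_loss(
    logits: Sequence[float],
    label: int,
    log_prior: Sequence[float],
    tau: float = 1.0,
) -> float:
    """Cross-entropy on logits shifted by tau * log(prior) for one example."""
    adjusted = [z + tau * lp for z, lp in zip(logits, log_prior)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]
```

With `tau=0` this reduces to ordinary cross-entropy. With `tau>0`, a rare class (such as allosteric residues among all residues) incurs a larger loss unless its raw logit exceeds the frequent class by a margin, which pushes the model away from always predicting the majority class.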

Allosteric pocket prediction

Run the following to train the model to predict allosteric pockets with 3D protein sequences:

python -u train_with_SequenceClassification.py

Test Results

We reproduce the main results of allosteric site classification and allosteric pocket classification in the following table:

Metric                 Site classification   Pocket classification
Residue
  residue acc          -                     -
  residue precision    -                     -
  residue recall       -                     -
  residue f1           -                     -
  sequence acc         -                     -
Pocket
  pocket acc           -                     -
  pocket precision     -                     -
  pocket recall        -                     -
  pocket f1            -                     -
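
The accuracy, precision, recall, and F1 entries in the table follow the standard binary-classification definitions. As a hedged sketch (the function name is hypothetical, and it assumes label 1 marks an allosteric residue or pocket and label 0 everything else, which the repository's evaluation code may encode differently), they could be computed like this:

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    # Undefined ratios (zero denominators) are reported as 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    acc = correct / len(pairs) if pairs else 0.0
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1}
```

Because allosteric residues are a small minority of all residues, accuracy alone can look high for a model that predicts no sites at all, so precision, recall, and F1 are the more informative rows.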

Dataset

The processed allosteric site data we use are available in this repository under data/allosteric_site/.

The original PDB data come from the Allosteric Database (ASD), as listed in data/ASD_Release_201909_AS.txt:

Liu, Xinyi, et al., ASD: a comprehensive database of allosteric proteins and modulators, Nucleic Acids Research

Citation

If you find this work useful, please cite our paper:

@inproceedings{}
