DeepDrug3D

DeepDrug3D is a tool that classifies a protein pocket as ATP-binding, Heme-binding, or other, given the protein structure and the binding residue numbers.

If you find this tool useful, please star this repo and cite our paper :)

Pu L, Govindaraj RG, Lemoine JM, Wu HC, Brylinski M (2019) DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLOS Computational Biology 15(2): e1006718. https://doi.org/10.1371/journal.pcbi.1006718

This README file is written by Limeng Pu.

An example of a generated binding grid (pdb ID: 1a2sA, atom type: C.ar). Red indicates low potentials and blue indicates high potentials.

Installation

Install in conda with:

conda env create -f environment.yaml

Prerequisites

Linux (DFIRE potential calculation supports only Linux)
Python 2.7 (if you are running Python 3, some syntax must be adapted accordingly, for example print() and map())
numpy 1.8.2 or higher
scipy 0.13.3 or higher
scikit-learn 0.19.0 or higher
Openbabel 2.3.1 or higher (if you are using Anaconda, it can be installed with conda install -c openbabel openbabel)
tensorflow (GPU version if you wish to train on provided/your own data)
CUDA 7.5 or higher (if you wish to train on provided/your own data)
keras 2.1.4 or higher

For installation instructions, please refer to the corresponding project sites.
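
For Python 3 users, the two syntax changes mentioned in the prerequisites typically look like the sketch below. This is a generic illustration and not code taken from this repository; the variable names fields and coords are hypothetical.

```python
# Python 2 style, shown as comments because it behaves differently in Python 3:
#   print "center:", coords
#   coords = map(float, fields)      # returns a list in Python 2

fields = ["1.5", "-2.0", "0.25"]     # hypothetical text fields read from a file

# In Python 3, map() returns a lazy iterator, so wrap it in list()
# wherever the code indexes the result or takes its length.
coords = list(map(float, fields))

# In Python 3, print is a function and needs parentheses.
print("center:", coords)
```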

Usage

The package provides both prediction and training modules.

The prediction module

It takes a pdb file and an auxiliary input file, which contains the binding residue numbers and the center of the ligand/pocket. The center is optional: if it is not provided, the model calculates the pocket center and uses it as the ligand center. An example of the auxiliary file is provided in example_aux.txt. The trained model is available at https://osf.io/enz69/. To use the prediction module, run:

python predict.py --protein your_protein.pdb --aux your_auxilary_file.txt --r 15 --N 31

--protein is the full path to the pdb file you wish to classify.
--aux is the auxiliary file with the binding residue numbers and the center of the ligand (optional).
--r and --N are the radius of the grid and the number of points along each dimension of the grid. The default settings are r = 15 and N = 31, which yield a 32 x 32 x 32 grid. Set --r and --N to change this.
Two files will be generated during the process, namely your_protein_trans.pdb and your_protein_trans.mol2, under the current working directory. These files contain the transformed protein (moved to the provided center and aligned with the principal axes of the pocket). They are used in the later steps. If you do not wish to keep them, you can delete them after getting the results.
The output is printed as three probabilities, each representing the likelihood of the pocket being an ATP, Heme, or other binding pocket.
The entire process may take up to 30 minutes to finish, since the grid point generation (mostly the potential calculation) is very time consuming.
The DFIRE potential calculation uses the module provided with "A Knowledge-Based Energy Function for Protein−Ligand, Protein−Protein, and Protein−DNA Complexes" by Zhang et al., since it is written in Fortran, which is faster than our own implementation in Python.
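
When no center is given, the pocket center has to be derived from the binding residues. The sketch below shows one plausible way to do that, assuming the center is the mean position of all atoms in the listed residues; the README does not state the exact definition predict.py uses, so treat this as an illustration only. The function name pocket_center and its arguments are hypothetical, and the parsing relies on the fixed-column layout of standard PDB ATOM records.

```python
def pocket_center(pdb_lines, residue_numbers, chain_id=None):
    """Return the mean (x, y, z) of all atoms in the given residues.

    Assumption: the pocket center is the unweighted centroid of the
    binding-residue atoms. The actual tool may define it differently.
    """
    wanted = set(residue_numbers)
    total = [0.0, 0.0, 0.0]
    count = 0
    for line in pdb_lines:
        # Only protein atoms are considered; HETATM records are skipped.
        if not line.startswith("ATOM"):
            continue
        # PDB fixed columns: chain ID in column 22, residue number in 23-26.
        if chain_id is not None and line[21] != chain_id:
            continue
        if int(line[22:26]) not in wanted:
            continue
        # Coordinates occupy columns 31-38, 39-46 and 47-54.
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        for i in range(3):
            total[i] += xyz[i]
        count += 1
    if count == 0:
        raise ValueError("no atoms found for the requested residues")
    return tuple(t / count for t in total)
```

To try it, read the lines of a pdb file into a list and pass the binding residue numbers as integers; the result can then be compared with a ligand center you already know.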

The training module

To train our model on your own dataset, you first have to convert the dataset (pdb files) to voxel representations of the protein-ligand binding sites. The training module can be run as:

python train.py --alist deepdrug3d_atp.lst --hlist deepdrug3d_heme.lst --vfolder deepdrug3d_voxel_data --bs batch_size --lr inital_learning_rate --epoch number_of_epoches --output deepdrug3d

--alist is the list of full paths to the ATP binding voxel data, while --hlist is the list of full paths to the Heme binding voxel data.
--vfolder is the folder containing all the voxel data, stored as one numpy array (.npy) per protein-ligand pair.
--bs, --lr, and --epoch are the hyperparameters of the model. Recommended values are 64, 0.00001, and 30, respectively.
If no output location is provided, the model will be saved to the current working directory as 'deepdrug3d.h5'.
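
The relationship between the two lists and the voxel folder can be sketched as follows. This is not the logic of train.py; it is a minimal illustration that assumes each list entry names one .npy file (by path or by bare identifier) and that any voxel file appearing in neither list belongs to the "other" class. The function names and the class indices 0, 1, 2 are hypothetical.

```python
import os


def _stem(entry):
    """Reduce a list entry (full path or bare ID) to a file stem."""
    return os.path.splitext(os.path.basename(entry.strip()))[0]


def label_voxel_files(vfolder, atp_entries, heme_entries):
    """Pair every .npy file in vfolder with a class index.

    Assumed convention (not confirmed by the README):
    0 = ATP, 1 = Heme, 2 = other.
    """
    atp = {_stem(e) for e in atp_entries if e.strip()}
    heme = {_stem(e) for e in heme_entries if e.strip()}
    labelled = []
    for name in sorted(os.listdir(vfolder)):
        stem, ext = os.path.splitext(name)
        if ext != ".npy":
            continue  # ignore anything that is not a voxel array
        if stem in atp:
            label = 0
        elif stem in heme:
            label = 1
        else:
            label = 2
        labelled.append((os.path.join(vfolder, name), label))
    return labelled
```

The list files themselves can be read with open(path).read().splitlines() before being passed in, and each returned path can then be loaded with numpy.load.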

Dataset

The dataset we used for training, consisting of the voxel representations and the ATP and Heme lists, is available at https://osf.io/enz69/.
