This repository provides a modular architecture for rapidly building pipelines for drug discovery and drug repurposing.
All the dependencies are detailed in the environment.yml file. To install them, create a new conda environment using that file:
$ conda env create --name dd --file environment.yml
$ conda activate dd
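Once the environment is active, a quick check like the one below can confirm that the main packages are importable. This is only a sketch: `missing_packages` is a hypothetical helper, not part of this repository. The package list is taken from the imports used in this README, and `mordred` is an assumption based on the `DescriptorMordred` step.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# plotly, sklearn and rdkit are imported in this README; "mordred" is an
# assumption based on the DescriptorMordred step.
required = ["plotly", "sklearn", "rdkit", "mordred"]
missing = missing_packages(required)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any package is reported as missing, recreating the environment from environment.yml is usually the simplest fix.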
import plotly.express as px
from sklearn.pipeline import Pipeline
from rdkit import Chem
from modules.data_loaders import DataLoaderManager
from modules.preprocessing.smiles import SMILESChecker
from modules.preprocessing.descriptors import (
    DescriptorPipeline,
    DescriptorMordred
)
# Load data
data_loader = DataLoaderManager()
data = data_loader.load(
    path='tests/data/test_data.sdf',
    removeHs=False
)
# Preprocess data (sanitize SMILES)
smiles_pipe = Pipeline(steps=[
    ('SMILESChecker', SMILESChecker())
])
data['SMILES'] = smiles_pipe.fit_transform(
    X=data['SMILES'].to_numpy()
)
# Recalculate mol from curated SMILES
data['Molecule'] = [
    Chem.MolFromSmiles(smiles) for smiles in data['SMILES']
]
# Calculate descriptors
desc_pipe = DescriptorPipeline(mol_column='Molecule', steps=[
    ('Mordred', DescriptorMordred())
])
data = desc_pipe.fit_transform(X=data)
# Visualize descriptors
variables = {
    'x': 'MW',
    'y': 'nHetero',
    'z': 'SLogP',
    'color': 'SR-p53'
}
fig = px.scatter_3d(
    data_frame=data,
    x=variables['x'],
    y=variables['y'],
    z=variables['z'],
    color=variables['color'],
    template='plotly_white',
    height=750,
    width=900,
    title='Initial EDA'
)
fig.show()
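The preprocessing steps above are passed to a scikit-learn `Pipeline`, so a custom step would presumably need to follow the same fit/transform convention. The sketch below illustrates that convention with a hypothetical `WhitespaceStripper` step. It is not part of this repository, and the actual interface expected by the modules here may differ.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class WhitespaceStripper(TransformerMixin, BaseEstimator):
    """Hypothetical step: strips surrounding whitespace from SMILES strings."""

    def fit(self, X, y=None):
        # Stateless step: nothing is learned from the data.
        return self

    def transform(self, X):
        # Return a new object array with each string stripped.
        return np.array([smiles.strip() for smiles in X], dtype=object)


pipe = Pipeline(steps=[
    ('WhitespaceStripper', WhitespaceStripper())
])
cleaned = pipe.fit_transform(
    X=np.array([' CCO', 'c1ccccc1 \n'], dtype=object)
)
print(cleaned.tolist())
```

Because the step only relies on `fit` and `transform`, it could in principle be chained before or after other steps in the same `Pipeline`.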
For detailed examples, please see the examples folder.
A detailed roadmap with future lines of work can be found here. Ideas and possible future implementations can also be found here.