Skip to content
This repository has been archived by the owner on Jun 2, 2024. It is now read-only.

guillecg/drug-discovery

Repository files navigation

Python package

Machine Learning for Drug Discovery

This repository aims to provide a modular architecture to rapidly build pipelines that allow the user to discover or repurpose drugs.

Table of contents

Setup

All the dependencies are detailed in the environment.yml file. To install them, create a new conda environment using that file:

$ conda env create --name dd --file environment.yml
$ conda activate dd

Getting started

import plotly.express as px

from sklearn.pipeline import Pipeline

from rdkit import Chem

from modules.data_loaders import DataLoaderManager
from modules.preprocessing.smiles import SMILESChecker
from modules.preprocessing.descriptors import (
    DescriptorPipeline,
    DescriptorMordred
)

# Load data
data_loader = DataLoaderManager()
data = data_loader.load(
    path='tests/data/test_data.sdf',
    removeHs=False
)

# Preprocess data (sanitize SMILES)
smiles_pipe = Pipeline(steps=[
    ('SMILESChecker', SMILESChecker())
])
data['SMILES'] = smiles_pipe.fit_transform(
    X=data['SMILES'].to_numpy()
)

# Recalculate mol from curated SMILES
data['Molecule'] = [Chem.MolFromSmiles(smiles)
                    for smiles in data['SMILES']]

# Calculate descriptors
desc_pipe = DescriptorPipeline(mol_column='Molecule', steps=[
    ('Mordred', DescriptorMordred())
])
data = desc_pipe.fit_transform(X=data)

# Visualize descriptors
variables = {
    'x': 'MW',
    'y': 'nHetero',
    'z': 'SLogP',
    'color': 'SR-p53'
}
fig = px.scatter_3d(
    data_frame=data,
    x=variables['x'],
    y=variables['y'],
    z=variables['z'],
    color=variables['color'],
    template='plotly_white',
    height=750,
    width=900,
    title='Initial EDA'
)
fig.show()

For detailed examples, please see the examples folder.

Roadmap

A detailed roadmap with future lines of work can be found here. Ideas and possible future implementations can also be found here.

About

Machine Learning for Drug Discovery

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages