This repository has been archived by the owner on Jun 2, 2024. It is now read-only.

Machine Learning for Drug Discovery

0. Architecture

  • Modular pipelines
    • Data:

    • Data Enrichment:

    • Data preprocessing

      • SMILES: canonicalization, cleaning, etc.
      • TODO: check other data sources/types
      • Imbalanced classes
    • Feature engineering (ideas from here)

      • ECFP
      • DFS
      • 3D features based on MOPAC
      • Quantum-mechanical descriptors
      • Tanimoto kernel
      • MinMax kernel
      • Various 2D, 3D and pharmacophore kernels
      • In-house toxicophore and scaffold features
      • Dimensionality reduction:
        • PCA
        • FastICA
        • Manifold learning
    • Feature selection:

      • Variance Inflation Factor (VIF)
      • F2 (Scikit-learn)
      • Random forest feature importance
    • ML setup:

    • ML algorithms:

      • Scikit-learn (RF, SVM, Elastic Nets, Gradient boosting, etc.)
      • Ensembles (XGBoost, LightGBM, stacking, etc.)
      • PyTorch (LSTM, 1D-CNN, GNN, GANs, etc.)
    • Evaluation:
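
The Tanimoto and MinMax kernels listed under feature engineering compare two fingerprints directly. Below is a minimal, standard-library-only sketch, assuming the fingerprints have already been computed elsewhere (ECFP on-bits as a set of bit indices, or count fingerprints as a mapping from bit index to count). The function names are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
from typing import Callable, List, Mapping, Sequence, Set, TypeVar

T = TypeVar("T")


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(a & b) / union


def minmax(a: Mapping[int, int], b: Mapping[int, int]) -> float:
    """MinMax similarity of count fingerprints: sum of element-wise minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def gram_matrix(fps: Sequence[T], kernel: Callable[[T, T], float]) -> List[List[float]]:
    """Pairwise kernel values for a list of fingerprints (rows and columns in input order)."""
    return [[kernel(x, y) for y in fps] for x in fps]
```

For binary fingerprints (every count equal to 1), MinMax reduces to Tanimoto. A Gram matrix built this way can be passed to scikit-learn's `SVC(kernel="precomputed")`, which is one way to combine these kernels with the SVMs listed under ML algorithms.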

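The variance inflation factor listed under feature selection is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing feature j on all other features. A minimal standard-library sketch follows; instead of fitting one regression per feature, it uses the equivalent identity that VIF_j is the j-th diagonal entry of the inverse of the features' correlation matrix. It assumes non-constant, not perfectly collinear columns, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation matrix of features given column-wise (one list per feature)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]  # zero for a constant column
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j]) for j in range(p)]
        for i in range(p)
    ]


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Matrix inverse via Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def vif(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF_j = [R^-1]_jj, equivalent to 1 / (1 - R_j^2) from regressing feature j on the others."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

A commonly quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of strong multicollinearity. `statsmodels` also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.
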
1. Ideas

  • Active learning to increase the number of labelled samples

  • Generative models:

    • Examples:
    • GANs
    • RL:
      • Actions = add fragment, state = SMILES, reward = activity/similarity
      • Actions = add character (token), state = SMILES, reward = activity/similarity
  • Generative models + labelling with Active Learning

  • Supervised Learning:

    • LSTM with SMILES
    • GNN with graph representations (maybe SDFs?)
  • Assessment/evaluation:

    • Similarity (assumption: structurally similar molecules have similar activity)
    • Direct mapping to activity (using Supervised Learning)
  • DeepTox:

    • Machine Learning methods:
      • SVMs with various kernels
      • Random Forests
      • Elastic Nets
    • Features and kernels:
      • ECFP
      • DFS
      • 3D features based on MOPAC
      • Quantum-mechanical descriptors
      • Tanimoto
      • MinMax
      • Various 2D, 3D and pharmacophore kernels
      • In-house toxicophore and scaffold features
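
The active-learning ideas above (on their own, or for labelling the output of generative models) depend on a query strategy that decides which unlabelled molecules to send for labelling next. A minimal pool-based sketch using least-confidence (uncertainty) sampling follows, assuming a binary active/inactive classifier. `oracle` and `fit_predict` are hypothetical stand-ins for an assay and for any model that outputs P(active), such as a scikit-learn classifier's `predict_proba`.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a sample, e.g. a SMILES string or a feature vector


def uncertainty_sampling(probabilities: Sequence[float], k: int) -> List[int]:
    """Indices of the k samples whose predicted P(active) is closest to 0.5."""
    order = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return order[:k]


def active_learning_loop(
    pool: Sequence[S],
    labelled: Sequence[Tuple[S, int]],
    oracle: Callable[[S], int],
    fit_predict: Callable[[List[Tuple[S, int]], List[S]], List[float]],
    rounds: int,
    k: int,
) -> Tuple[List[Tuple[S, int]], List[S]]:
    """Pool-based active learning: each round, query the oracle for the k most uncertain samples."""
    pool, labelled = list(pool), list(labelled)
    for _ in range(rounds):
        if not pool:
            break
        # Retrain on the current labelled set and score every unlabelled sample.
        probs = fit_predict(labelled, pool)
        chosen = set(uncertainty_sampling(probs, k))
        # Label the chosen samples (hypothetical oracle) and move them out of the pool.
        labelled.extend((pool[i], oracle(pool[i])) for i in sorted(chosen))
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    return labelled, pool
```

Uncertainty sampling is only one query strategy; alternatives such as query-by-committee plug into the same loop by replacing `uncertainty_sampling`.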