ecPATH: Predicting ecDNA status in Tumors from Histopathology Slide Images

Performing ecDNA Predictions

Please note:

  • This pipeline has been tested on Ubuntu 22.04.3 LTS with a GPU (NVIDIA GeForce RTX 3090).
  • Conda is required for environment management. The current pipeline was developed and tested with conda 24.5.0.
  • The pipeline uses a ResNet50 model as its default feature extractor. To use the UNI model instead, you need to obtain access approval (an access token) from Hugging Face and paste the token into ./Prediction/param.py.
  • Complementary data are downloaded automatically from Zenodo. If the download fails, manually download the ecPATH model weights from this Zenodo record, decompress the archive, and place the entire Data folder at the top level of this repository.
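
If the automatic download fails and you place the Data folder by hand, a quick pre-flight check can confirm it sits where the pipeline expects it. The sketch below is a hypothetical helper, not part of ecPATH: it assumes it is run from the repository root, only checks that a non-empty Data folder exists at the top level (not that its contents are complete), and uses the presence of the nvidia-smi tool as a rough proxy for an NVIDIA driver.

```python
import shutil
from pathlib import Path


def preflight(repo_root="."):
    """Rough environment checks before running ./Prediction/predict.py.

    Hypothetical helper, not part of ecPATH. It only tests that:
      - a non-empty Data folder sits at the top level of repo_root, and
      - the nvidia-smi tool is on PATH (a proxy for an NVIDIA driver; it
        does not prove the GPU is usable from the conda environment).
    """
    data_dir = Path(repo_root) / "Data"
    return {
        # any() is only evaluated when Data is a directory (short-circuit)
        "data_folder": data_dir.is_dir() and any(data_dir.iterdir()),
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

A missing or empty Data folder usually means the archive was decompressed somewhere other than the repository root.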

Usage & Demo:

0. Install conda (if needed): install conda

1. Clone this repository to your local machine

git clone https://github.com/Sinha-CompBio-Lab/ecPATH.git

2. Create the desired conda environment

conda env create -f environment.yml

3. Customize ./Prediction/param.py to fit your analysis (very important).

# Key parameters are critical. At minimum, set cancer_type, slide_extention, input_keyword, and pretrained_model_name.

4. Prepare input slides: place the slides in ./Prediction/input/ (currently, only .svs image files are tested and supported).

# Make sure all input slide filenames share a common keyword, e.g., test_1.svs, test_2.svs, ..., test_n.svs.

5. Run the prediction script

python3 ./Prediction/predict.py

6. Output:

  • for each input slide:
    • a collection of tile features at ./Prediction/input/SlideKeyword_1/_features/features_{model_name}.npy
    • a visualization of tile selection at ./Prediction/input/SlideKeyword_2/_masks/mask.pdf
    • a tile coordinates list at ./Prediction/input/SlideKeyword_3/_coordinates/tile_coordinates.csv
  • for each analysis:
    • intermediate gene expression predictions (for each input slide) at ./Prediction/output/{cancer_type}gene_expression_predictions{model_name}.csv
    • final ecDNA prediction (for each input slide) at ./Prediction/output/{cancer_type}ecDNA_predictions{model_name}.csv
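
Steps 4 and 6 rely on a shared filename keyword to group slides and locate their per-slide outputs. The sketch below illustrates that convention under stated assumptions; it is not taken from predict.py. The helpers find_slides and expected_outputs are hypothetical, they reuse the param.py names input_keyword and slide_extention only for readability, and expected_outputs assumes each slide's output folder is named after the slide's filename stem (e.g. test_1 for test_1.svs).

```python
from pathlib import Path


def find_slides(input_dir, input_keyword, slide_extention=".svs"):
    """Return sorted slide files whose name contains input_keyword.

    Hypothetical helper, not part of ecPATH. The extension match is
    case-insensitive, and a missing leading dot is added.
    """
    if not slide_extention.startswith("."):
        slide_extention = "." + slide_extention
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file()  # skip per-slide output folders
        and p.suffix.lower() == slide_extention.lower()
        and input_keyword in p.name
    )


def expected_outputs(slide_path, model_name):
    """Per-slide output paths, ASSUMING the folder is named after the slide stem."""
    slide_dir = slide_path.parent / slide_path.stem
    return {
        "features": slide_dir / "_features" / f"features_{model_name}.npy",
        "mask": slide_dir / "_masks" / "mask.pdf",
        "coordinates": slide_dir / "_coordinates" / "tile_coordinates.csv",
    }


if __name__ == "__main__":
    input_dir = Path("./Prediction/input")
    model_name = "resnet50"  # hypothetical value; use your pretrained_model_name
    if input_dir.is_dir():
        for slide in find_slides(input_dir, input_keyword="test"):
            print(slide.name, "->", expected_outputs(slide, model_name)["features"])
```

Filtering on is_file() keeps the per-slide output folders, which also contain the keyword, from being mistaken for slides.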

Reproducing Figures

The notebooks in ./Figure_Reproduce/ contain the code we used to generate the figures in our manuscript.

To reproduce the figures in our manuscript, you need to:

  • Download the tabular data set from this Zenodo record.
  • Set up an R 4.3.2 environment with the following packages:
    • data.table
    • dplyr
    • forcats
    • fmsb
    • ggalt
    • ggplot2
    • ggpubr
    • ggsignif
    • pROC
    • readr
    • reshape2
    • stringr
    • survminer
    • tidyr

Model Training Specifications

./Model_Training/ contains the essential building blocks for model training, including data preprocessing, model architecture, and training logic. We primarily used the Slurm Workload Manager to run jobs on the computing resources at Sanford Burnham Prebys Medical Discovery Institute. Please note that in this release, the model training scripts are provided as reference implementations rather than for direct execution; you may need to refactor and adapt them to your own computing environment and requirements.