Pseudo2GO is a graph-based deep learning method that predicts pseudogene functions by borrowing information from coding genes. It uses both network structure and node attributes to improve prediction performance. A sequence similarity network connects pseudogenes and coding genes into a graph, and node attributes are propagated along its edges so that pseudogenes can borrow information from well-studied coding genes.
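To make the idea of attribute propagation concrete, here is a minimal sketch in plain Python. It is not the model in this repository: it uses simple mean aggregation over a node and its neighbours as a stand-in for a learned graph neural network layer, and the function name `propagate`, the gene names, and the feature values are all hypothetical.

```python
def propagate(features, edges, rounds=1):
    """Average each node's attributes with those of its neighbours.

    features: dict mapping node name -> list of floats (equal length for all nodes)
    edges: iterable of (node_a, node_b) pairs, treated as undirected
    """
    # Build an undirected adjacency map that includes a self-loop per node.
    neighbours = {node: {node} for node in features}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(rounds):
        updated = {}
        for node, nbrs in neighbours.items():
            dim = len(current[node])
            # Mean over the node itself and its neighbours, per feature.
            updated[node] = [
                sum(current[n][i] for n in nbrs) / len(nbrs) for i in range(dim)
            ]
        current = updated
    return current


# Hypothetical toy graph: one pseudogene linked to two coding genes.
toy_features = {
    "pseudogene_A": [0.0, 0.0],
    "coding_gene_1": [1.0, 0.0],
    "coding_gene_2": [0.0, 1.0],
}
toy_edges = [
    ("pseudogene_A", "coding_gene_1"),
    ("pseudogene_A", "coding_gene_2"),
]
smoothed = propagate(toy_features, toy_edges)
print(smoothed["pseudogene_A"])
```

After one round, the pseudogene, which started with an all-zero vector, carries a blend of the attributes of the coding genes it is connected to. A trained graph neural network replaces the fixed average with learned weights, but the flow of information along edges is the same in spirit.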
The node attributes (initial feature representations) combine two types of expression profiles (from the TCGA and GTEx databases, respectively), interactions with microRNAs, protein-protein interactions (PPI), and genetic interactions.
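One way to picture how several attribute sources become a single initial feature vector per gene is shown below. This is only a sketch under assumptions: the helper `build_node_features`, the zero-filling of genes missing from a source, and the toy values and dimensions are illustrative and are not taken from the preprocessing code in this repository.

```python
def build_node_features(sources, dims, genes):
    """Concatenate per-source attribute vectors into one vector per gene.

    sources: list of dicts, each mapping gene -> list of floats
    dims: expected vector length for each source, in the same order
    genes: gene identifiers to build features for
    """
    features = {}
    for gene in genes:
        vector = []
        for source, dim in zip(sources, dims):
            # Assumption of this sketch: a gene missing from a source
            # contributes a block of zeros of the expected length.
            block = source.get(gene, [0.0] * dim)
            if len(block) != dim:
                raise ValueError(
                    f"{gene}: expected {dim} values, got {len(block)}"
                )
            vector.extend(block)
        features[gene] = vector
    return features


# Hypothetical toy inputs standing in for expression and interaction data.
tcga_expr = {"gene_1": [0.2, 0.5], "pseudogene_A": [0.1, 0.0]}
gtex_expr = {"gene_1": [0.9]}
mirna_links = {"pseudogene_A": [1.0, 0.0, 1.0]}

node_features = build_node_features(
    [tcga_expr, gtex_expr, mirna_links],
    [2, 1, 3],
    ["gene_1", "pseudogene_A"],
)
print(node_features["pseudogene_A"])
```

Keeping the sources in a fixed order means every gene ends up with a vector of the same length, which is what a graph model needs as its node feature matrix.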
In the evaluation reported in the paper cited below, our method achieved state-of-the-art performance, significantly outperforming existing methods. The graph neural network model is implemented with the PyTorch Geometric package in Python 3.6.
If you find our work useful for your research, please consider citing it:
@ARTICLE{10.3389/fgene.2020.00807,
AUTHOR={Fan, Kunjie and Zhang, Yan},
TITLE={Pseudo2GO: A Graph-Based Deep Learning Method for Pseudogene Function Prediction by Borrowing Information From Coding Genes},
JOURNAL={Frontiers in Genetics},
VOLUME={11},
PAGES={807},
YEAR={2020},
URL={https://www.frontiersin.org/article/10.3389/fgene.2020.00807},
DOI={10.3389/fgene.2020.00807},
ISSN={1664-8021}
}
- Python 3.6
- PyTorch
- PyTorch Geometric
- networkx
- scipy
- numpy
- pickle (part of the Python standard library)
- scikit-learn
- pandas
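A quick way to see which of the packages above are still missing from your environment is sketched below. It only checks whether each module can be located, not its version. The import names are the usual ones for these packages (for example, scikit-learn is imported as `sklearn` and PyTorch Geometric as `torch_geometric`); the helper `find_missing` is a hypothetical name and is not part of this repository.

```python
import importlib.util

# Import names corresponding to the requirements listed above.
REQUIRED_MODULES = [
    "torch",
    "torch_geometric",
    "networkx",
    "scipy",
    "numpy",
    "pickle",
    "sklearn",
    "pandas",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


missing = find_missing()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```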
You can download the raw data and the processed data (ready for use in the model) here: data. Please download the datasets and put them in the existing data folder.
unzip data.zip
unzip raw_data.zip
unzip final_input.zip
mv raw_data final_input data
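After the commands above, the data folder should contain the raw_data and final_input directories. The following sketch checks that layout before you run preprocessing or the model; the helper name `missing_data_dirs` is hypothetical, and the check assumes you run it from the repository root.

```python
from pathlib import Path


def missing_data_dirs(data_root="data", expected=("raw_data", "final_input")):
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


absent = missing_data_dirs()
if absent:
    print("Missing under data/:", ", ".join(absent))
else:
    print("Data folder layout looks complete.")
```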
cd preprocessing
python preprocess_final.py
cd model
python pseudo2go.py
Note that several parameters can be tuned. Please refer to the pseudo2go.py file for a detailed description of all parameters.