A multi-granularity information-enhanced pre-training method for predicting the coding potential of sORFs in plant lncRNAs
The code and data here are used to predict the coding potential of lncRNA-sORFs, giving researchers useful guidance for discovering peptides.
contains pretraining samples.
is a model file. Download "LSCPP_BERT.bin" from Google Drive (https://drive.google.com/file/d/1o7KZwG5fbGZd3K1LMYiD6qCOyOHEXU4m/view?usp=sharing) or Baidu Pan (https://pan.baidu.com/s/18P3w7MQUBI49IEjCyf6C8Q?pwd=18p1), then move it to the "model" folder.
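Before running the test, it can help to confirm that the download landed in the right place. The following is a minimal sketch, assuming the "model" folder sits directly under the directory you run it from; the helper name `find_model_file` is hypothetical and is not part of this repository.

```python
from pathlib import Path

MODEL_NAME = "LSCPP_BERT.bin"  # file name given in this README


def find_model_file(root="."):
    """Return the path to model/LSCPP_BERT.bin under `root`, or None if absent.

    Assumption: the "model" folder lives directly under `root`.
    """
    candidate = Path(root) / "model" / MODEL_NAME
    # An empty file usually means the download was interrupted.
    if candidate.is_file() and candidate.stat().st_size > 0:
        return candidate
    return None


if __name__ == "__main__":
    path = find_model_file()
    if path is None:
        print(f'"{MODEL_NAME}" was not found in the "model" folder.')
    else:
        print(f"Found model file: {path} ({path.stat().st_size} bytes)")
```

The check only looks at the file's location and size; it does not try to load the weights, so it works even before torch is installed.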
You can run this file to test.
In line 88, you can change the path of the test file to test your own data.
Line 92 sets the path of the model file.
Based on Python 3.7.12.
The following Python modules are used:
numpy (1.21.6)
torch (1.7.1)
multiprocessing
pandas (1.3.5)
os
random
math
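To compare an existing environment against the versions listed above, something like the following sketch may be useful. It assumes each third-party package exposes a `__version__` attribute (numpy, torch, and pandas do); the names `EXPECTED` and `check_modules` are illustrative and not part of this repository.

```python
import importlib
import sys

# Versions listed in this README; None marks standard-library modules,
# which carry no separate version number.
EXPECTED = {
    "numpy": "1.21.6",
    "torch": "1.7.1",
    "multiprocessing": None,
    "pandas": "1.3.5",
    "os": None,
    "random": None,
    "math": None,
}


def check_modules(expected):
    """Map each module name to (installed_version_or_None, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (None, False)  # module is not installed
            continue
        found = getattr(module, "__version__", None)
        # torch builds may append a local tag such as "+cu110"; compare the base part.
        base = found.split("+")[0] if isinstance(found, str) else None
        report[name] = (found, wanted is None or base == wanted)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(README states 3.7.12)")
    for name, (found, ok) in check_modules(EXPECTED).items():
        status = "ok" if ok else "differs or missing"
        print(f"{name}: {found or 'n/a'} -> {status}")
```

A mismatch does not necessarily mean the code will fail; it only flags where the environment differs from the one stated here.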