TooT-BERT-SC is a BERT-based classification model that predicts the substrate class of transmembrane transport proteins. It distinguishes the following eleven classes:
- Nonselective
- Water
- Inorganic cations
- Inorganic anions
- Organic anions
- Organo-oxygens
- Amino acids and derivatives
- Other organonitrogens
- Nucleotides
- Organic heterocyclics
- Miscellaneous
This model is based on the Prot-BERT-BFD model, fine-tuned on the Substrate Class (SC) dataset. The BERT model is followed by a linear layer that performs classification using the softmax function.
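The classification head described above can be illustrated with a minimal sketch in plain Python. It is not the code from this repository: the embedding size (`HIDDEN = 8`), the random weights, and the function names `softmax` and `classify` are assumptions chosen for illustration. In the real model the input vector would come from the fine-tuned BERT encoder and the weights would be learned.

```python
import math
import random

# The eleven substrate classes listed above.
CLASSES = [
    "Nonselective", "Water", "Inorganic cations", "Inorganic anions",
    "Organic anions", "Organo-oxygens", "Amino acids and derivatives",
    "Other organonitrogens", "Nucleotides", "Organic heterocyclics",
    "Miscellaneous",
]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embedding, weights, bias):
    """Apply a linear layer (one weight row per class), then softmax."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Toy stand-ins (assumptions): a random "protein embedding" and random weights.
HIDDEN = 8
rng = random.Random(0)
embedding = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
weights = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in CLASSES]
bias = [0.0] * len(CLASSES)

label, probs = classify(embedding, weights, bias)
print(label, round(max(probs), 3))
```

The predicted class is the one with the highest softmax probability. With random weights the output is meaningless; the sketch only shows the shape of the computation.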
# Usage
The program can be run with the following command:
python run.py [input_fasta_file] [output_file]
For example:
python run.py Datasets/test.fasta out.txt
Here, "test.fasta" is the input file containing protein sequences in FASTA format, and "out.txt" is the output file, which lists the ID of each test sequence followed by its predicted class.
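For readers unfamiliar with the FASTA format, the sketch below shows how such an input is structured: each record starts with a `>` header line, followed by one or more lines of sequence. This is a generic reader written for illustration, not the parsing code used by `run.py`. The function name `read_fasta` and the example records are hypothetical.

```python
import io

def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a FASTA-formatted text stream."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Take the first whitespace-delimited token after '>' as the ID.
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)  # sequences may span several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)

# Hypothetical two-record input, held in memory instead of a file.
example = io.StringIO(
    ">seq1 hypothetical transporter\n"
    "MKTLLV\n"
    "AAGG\n"
    ">seq2\n"
    "MSSP\n"
)
records = list(read_fasta(example))
for seq_id, sequence in records:
    print(seq_id, len(sequence))
```

To read a real file, pass an open file object instead, for example `open("Datasets/test.fasta")`.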