SaprotHub v1 (will be deprecated in future)
Different models are designed for different tasks, so it's essential to understand which type your task belongs to.
📍To view the full list of tasks supported by ColabSaprot, please refer to task_list.md.
The task types and their descriptions are listed below so that you can identify your task type from your task description and objectives.
For classification and regression prediction tasks:
- Protein-level Classification Task
- Protein-level Regression Task
- Residue-level Classification Task
- Protein-protein Classification Task
- Protein-protein Regression Task
For zero-shot prediction tasks:
- Mutational effect prediction
- Inverse folding prediction
Train a model based on SaProt and use it to make predictions.
Task Type | Task Description | Example |
---|---|---|
Protein-level Classification | Classify protein sequences. | - Fold Class Prediction - Localization Prediction - Function Prediction |
Protein-level Regression | Predict the value of some property of a protein sequence. | - Thermal Stability Prediction - Fluorescence Intensity Prediction - Binding Affinity Prediction |
Residue-level Classification | Classify the amino acids in a protein sequence. | - Secondary Structure Prediction - Binding Site Prediction - Active Site Prediction |
Protein-protein Classification | Predict whether two proteins interact. | - Protein-Protein Interaction (PPI) Prediction - Interaction Type Classification - Disease-Associated Interaction Prediction |
Protein-protein Regression | Predict the strength of the interaction between two proteins. | - Interaction Strength Prediction - Binding Free Energy Calculation - Interaction Affinity Prediction |
Directly use SaProt (650M) to make predictions.
Task Type | Task Description | Example |
---|---|---|
Mutational Effect Prediction | Predict the mutational effect based on the wild type sequence and mutation information. | - Enzyme Activity Prediction - Virus Fitness Prediction - Driver Mutation Prediction |
Inverse Folding Prediction | Predict the residue sequence given the structure backbone. | - Enzyme Function Optimization - Protein Stability Enhancement - Protein Folding Prediction |
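As an illustration of what "wild type sequence and mutation information" can look like, here is a minimal sketch that applies a single substitution to a wild-type sequence. It assumes the common `V3A` notation (wild-type residue, 1-based position, mutant residue); the function name `apply_mutation` is hypothetical, and the format ColabSaprot actually expects may differ.

```python
import re

# Assumed notation: <wild-type residue><1-based position><mutant residue>, e.g. "V3A".
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutation(wild_type: str, mutation: str) -> str:
    """Return the sequence obtained by applying one substitution to wild_type."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation string: {mutation!r}")
    expected, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert the 1-based position to a 0-based index
    if not 0 <= index < len(wild_type):
        raise ValueError(f"position {position} is outside the sequence")
    if wild_type[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {wild_type[index]}"
        )
    return wild_type[:index] + mutant + wild_type[index + 1:]


mutated = apply_mutation("MEVQL", "V3A")
```

Checking the wild-type residue before substituting catches off-by-one position errors early, which are a frequent source of silently wrong mutation data.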
You can use your private data to train and predict. Below are the various data formats corresponding to different data types.
We combine the residue and structure tokens at each residue site to create a Structure-aware sequence (SA sequence), merging both residue and structural information.
The structure tokens are generated by encoding the 3D structure of proteins using Foldseek.
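The combination step can be sketched as a simple interleaving. The example below assumes one structure token per residue, with the residue written in upper case and the structure token in lower case; the helper names `build_sa_sequence` and `split_sa_sequence` are hypothetical, and the structure tokens themselves would come from Foldseek rather than from this code.

```python
from typing import Tuple


def build_sa_sequence(aa_seq: str, struct_tokens: str) -> str:
    """Interleave residue letters with per-residue structure tokens."""
    if len(aa_seq) != len(struct_tokens):
        raise ValueError("expected exactly one structure token per residue")
    # Assumed convention: upper-case residue followed by lower-case structure token.
    return "".join(
        aa.upper() + st.lower() for aa, st in zip(aa_seq, struct_tokens)
    )


def split_sa_sequence(sa_seq: str) -> Tuple[str, str]:
    """Recover the residue string and the structure-token string."""
    if len(sa_seq) % 2 != 0:
        raise ValueError("an SA sequence should have two characters per residue")
    return sa_seq[0::2], sa_seq[1::2]


sa = build_sa_sequence("MEVQ", "dvpp")
residues, structure = split_sa_sequence(sa)
```

Because every residue site contributes exactly two characters, an SA sequence built this way is always twice as long as the original amino acid sequence.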
Here you can convert your data into SA Sequence format.
- Single AA Sequence
- Single SA Sequence
- Single UniProt ID
- Single PDB/CIF Structure
- Multiple AA Sequences
- Multiple SA Sequences
- Multiple UniProt IDs
- Multiple PDB/CIF Structures
- SaprotHub Dataset
For tasks that require two protein sequences as input (pair classification and pair regression):
- A pair of AA Sequences
- A pair of SA Sequences
- A pair of UniProt IDs
- A pair of PDB/CIF Structures
- Multiple pairs of AA Sequences
- Multiple pairs of SA Sequences
- Multiple pairs of UniProt IDs
- Multiple pairs of PDB/CIF Structures
- Go to Official SaProtHub Repository to find some datasets.
- Copy the Dataset ID for future use.
Function | Link |
---|---|
Get Structure-Aware Sequence | here |
Convert .fa file to .csv dataset (data type: Multiple AA Sequences) | here |
Randomly split your dataset | here |
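For readers who prefer to prepare data locally, the last two conversions can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the column names `id`, `sequence`, and `stage`, the split labels `train`/`valid`/`test`, and the 80/10/10 ratio are all illustrative choices, not a confirmed ColabSaprot schema, and every function name is hypothetical.

```python
import csv
import io
import random
from typing import Dict, Iterable, List, Sequence


def read_fasta(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse FASTA text into rows with an id and a single-line sequence."""
    records: List[Dict[str, str]] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append({"id": header, "sequence": "".join(chunks)})
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append({"id": header, "sequence": "".join(chunks)})
    return records


def assign_splits(
    records: Sequence[Dict[str, str]],
    fractions: Sequence[float] = (0.8, 0.1, 0.1),  # assumed train/valid/test ratio
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Shuffle the rows reproducibly and tag each with a split label."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    rows = []
    for i, rec in enumerate(shuffled):
        if i < n_train:
            stage = "train"
        elif i < n_train + n_valid:
            stage = "valid"
        else:
            stage = "test"
        rows.append({**rec, "stage": stage})
    return rows


def to_csv(rows: Sequence[Dict[str, str]], fieldnames: Sequence[str]) -> str:
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


fasta_text = ">p1 example\nMEVQ\nLVQY\n>p2\nMKTA\n"
rows = assign_splits(read_fasta(fasta_text.splitlines()))
csv_text = to_csv(rows, ["id", "sequence", "stage"])
```

Fixing the random seed keeps the split reproducible, so the same rows land in the same subsets each time the script is run.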