Zero-shot pathogenicity prediction (Baseline) #19

merdivane · 2023-05-29T11:20:54Z

Historically first: https://www.biorxiv.org/content/10.1101/2021.07.09.450648v2.full.pdf
More extensive study: https://www.biorxiv.org/content/10.1101/2022.09.30.510294v3.full.pdf (new dataset: COSMIC + TCGA, new task: survival prediction)
Another extensive study: https://arxiv.org/pdf/2211.10000.pdf (new task: rescue mutations impact)
Extension of the baseline to any protein length: https://www.biorxiv.org/content/10.1101/2022.08.25.505311v1.full

The idea is to take a protein language model (PLM) and pre-train it on a large corpus of available protein sequences in a BERT fashion (mask random tokens; the task is to predict them). The logits the model then predicts at a given position for the wild-type (wt) and mutated (mt) tokens are shown to be an effective predictor of pathogenicity.
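As a rough illustration of how wt and mt logits can be turned into a single score, here is a minimal sketch. It assumes the log-odds form, log p(mt) - log p(wt), at the mutated position, which is one common choice and is not spelled out in this issue. The 20-letter vocabulary order and the names `log_softmax` and `zero_shot_score` are hypothetical; a real PLM has its own tokenizer and vocabulary.

```python
import math

# Assumption: a plain 20-letter amino-acid vocabulary in this order.
# A real PLM tokenizer will have a different vocabulary and indexing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def zero_shot_score(position_logits, wt, mt, vocab=AMINO_ACIDS):
    """Log-odds of the mutated vs. wild-type residue at one position.

    position_logits: the model's logits over `vocab` at the mutated
    position (hypothetical input; obtained from the PLM beforehand).
    """
    log_probs = log_softmax(position_logits)
    return log_probs[vocab.index(mt)] - log_probs[vocab.index(wt)]


# Toy logits for a single position: the model favours the wild-type "A".
toy_logits = [0.0] * len(AMINO_ACIDS)
toy_logits[AMINO_ACIDS.index("A")] = 2.0
toy_logits[AMINO_ACIDS.index("D")] = -1.0
score = zero_shot_score(toy_logits, wt="A", mt="D")
```

Under this assumption, a strongly negative score means the model considers the mutant residue much less likely than the wild type, which would be read as evidence that the variant is damaging. Because the softmax normalizer cancels, the score equals the raw logit difference at that position.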
