Zero-shot pathogenicity prediction (Baseline) #19

Open
merdivane opened this issue May 29, 2023 · 0 comments


Historically first: https://www.biorxiv.org/content/10.1101/2021.07.09.450648v2.full.pdf
More extensive study: https://www.biorxiv.org/content/10.1101/2022.09.30.510294v3.full.pdf (new dataset: COSMIC + TCGA, new task: survival prediction)
Another extensive study: https://arxiv.org/pdf/2211.10000.pdf (new task: impact of rescue mutations)
Extension of the baseline to any protein length: https://www.biorxiv.org/content/10.1101/2022.08.25.505311v1.full

The idea is to take a protein language model (PLM) and pre-train it on a large corpus of available protein sequences with a BERT-style objective (mask random tokens and train the model to predict them). The logits the model then predicts at a given position for the wild-type (wt) and mutated (mt) tokens have been shown to be an effective predictor of pathogenicity.
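A minimal sketch of how such a score could be computed, assuming the log-odds form used in the first linked paper: the score of a substitution is log p(mt) - log p(wt) at the mutated position, summed over positions for multi-site variants. The vocabulary, the function names, and the toy logits below are hypothetical and only for illustration; in practice the per-position logits would come from a pre-trained PLM with the mutated position masked.

```python
import math

# Hypothetical toy vocabulary of the 20 standard amino acids;
# a real PLM ships its own tokenizer and token indices.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mutation_score(position_logits, wt, mt):
    """Log-odds of the mutated vs. wild-type residue at one position.

    Assumption: lower (more negative) values mean the model finds the
    mutation less plausible than the wild type.
    """
    log_probs = log_softmax(position_logits)
    return log_probs[TOKEN_INDEX[mt]] - log_probs[TOKEN_INDEX[wt]]


def variant_score(logits_by_position, mutations):
    """Sum the per-site scores; mutations are (wt, 1-based position, mt)."""
    return sum(
        mutation_score(logits_by_position[pos - 1], wt, mt)
        for wt, pos, mt in mutations
    )


# Toy logits for a single position: the model favors "A", disfavors "V".
toy_logits = [0.0] * len(AMINO_ACIDS)
toy_logits[TOKEN_INDEX["A"]] = 2.0
toy_logits[TOKEN_INDEX["V"]] = -1.0

score = variant_score([toy_logits], [("A", 1, "V")])
print(f"A1V score: {score:.3f}")
```

Because the softmax normalizer cancels in the difference, the score equals the raw logit difference between the mt and wt tokens, which is why the logits alone are enough to rank variants.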
