Baseline Implementation (VariPred) Model #5

merdivane · 2023-05-29T01:17:37Z

No description provided.

ofivite · 2023-05-29T10:01:14Z

VariPred is one specific fine-tuning approach which, for a given protein sequence:

takes embedding vectors from a pretrained PLM (e.g. ESM) for the wildtype and mutated positions in the sequence
concatenates them
trains a simple feedforward network to predict pathogenicity from the concatenated vector
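
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the VariPred code: the embeddings are random stand-ins for what a PLM such as ESM would return, and the labels, layer sizes, learning rate, and helper names (`init_mlp`, `forward`, `train_step`) are all assumptions made up for the example.

```python
import numpy as np

def init_mlp(in_dim, hidden_dim, rng):
    """Small random weights for a one-hidden-layer feedforward network."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }

def forward(params, x):
    """Return hidden activations (ReLU) and predicted pathogenicity probabilities."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    logits = (h @ params["W2"] + params["b2"])[:, 0]
    return h, 1.0 / (1.0 + np.exp(-logits))

def train_step(params, x, y, lr=0.1):
    """One full-batch gradient descent step on binary cross-entropy; returns the loss before the update."""
    h, p = forward(params, x)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    d_logits = ((p - y) / len(y))[:, None]            # gradient of mean BCE w.r.t. logits
    d_h = (d_logits @ params["W2"].T) * (h > 0)       # backpropagate through ReLU
    grads = {
        "W1": x.T @ d_h,
        "b1": d_h.sum(axis=0),
        "W2": h.T @ d_logits,
        "b2": d_logits.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
    return float(loss)

# Synthetic stand-ins (assumption): real inputs would be PLM embeddings.
rng = np.random.default_rng(0)
n, emb_dim = 200, 8
wt_emb = rng.normal(size=(n, emb_dim))    # "wildtype" embedding per variant
mut_emb = rng.normal(size=(n, emb_dim))   # "mutant" embedding per variant
# Toy label rule (assumption), only so the network has something to learn.
labels = (mut_emb[:, 0] - wt_emb[:, 0] > 0).astype(float)

# Step 2: concatenate wildtype and mutant vectors.
features = np.concatenate([wt_emb, mut_emb], axis=1)

# Step 3: train the feedforward network on the concatenated vector.
params = init_mlp(features.shape[1], 16, rng)
losses = [train_step(params, features, labels) for _ in range(300)]
_, probs = forward(params, features)
```

In a real setup one would more likely use a deep learning framework and a held-out validation split; plain NumPy is used here only to keep the sketch self-contained.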

I would say a simpler baseline would be to skip training altogether and instead use a distance metric between the wildtype and mutant embedding vectors, then check how well it correlates with the target (pathogenicity). This is similar to what was studied in the Nucleotide Transformer paper (Fig. 4).
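
A training-free baseline along these lines could look like the sketch below. It is an illustration under assumptions: the embeddings and labels are synthetic, the three metrics (cosine, Euclidean, Manhattan) are just common choices and not necessarily those used in the Nucleotide Transformer paper, and Pearson correlation with a binary label is only one possible way to score a metric.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity, computed row by row."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / den

def euclidean_distance(a, b):
    return np.linalg.norm(a - b, axis=1)

def manhattan_distance(a, b):
    return np.abs(a - b).sum(axis=1)

# Candidate metrics (assumption: any row-wise distance could be added here).
METRICS = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}

def score_metrics(wt, mut, labels):
    """Pearson correlation between each distance and the binary pathogenicity label."""
    return {
        name: float(np.corrcoef(fn(wt, mut), labels)[0, 1])
        for name, fn in METRICS.items()
    }

# Synthetic stand-ins (assumption): "pathogenic" variants are perturbed more strongly.
rng = np.random.default_rng(0)
n, dim = 300, 32
wt = rng.normal(size=(n, dim))
labels = rng.integers(0, 2, size=n)
scale = np.where(labels == 1, 1.0, 0.2)[:, None]
mut = wt + scale * rng.normal(size=(n, dim))

scores = score_metrics(wt, mut, labels)
best = max(scores, key=lambda name: abs(scores[name]))
print(scores, best)
```

With real data, a rank-based score such as AUROC may be a better fit for a binary target than Pearson correlation.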

The best-performing of those distance metrics would be our own baseline and the starting point for setting up the pipeline. Then we can try fine-tuning as in VariPred, or possibly other strategies, to improve upon it.

AllenChienXXX · 2023-05-29T22:08:52Z

Do you know the source of this model?

merdivane · 2023-05-30T03:01:17Z

Nucleotide Transformer
Model weights available here: https://huggingface.co/InstaDeepAI
This model is now available to use with the transformers library! To use, please install from main, i.e. pip install --upgrade git+https://github.com/huggingface/transformers.git
Check out their paper for inspiration: https://www.biorxiv.org/content/10.1101/2023.01.11.523679v1

ofivite · 2023-05-30T07:11:49Z

I am actually not sure their model would be useful for us. It's for DNA sequences, but we work with proteins, right?

merdivane mentioned this issue May 29, 2023

Baseline implementation (VariPred) #3

Open

4 tasks

ofivite self-assigned this May 29, 2023
