"Analyze sequences" #134

Open
anna-panchenko opened this issue Sep 21, 2015 · 5 comments

Comments

@anna-panchenko
Collaborator

Please remove the huge defline from the example sequence.
What does "Max 1 Sequence in FASTA format." mean?
I see "Sequence:" and "File:" on the left edge of the page. What do these mean?

@leonardomarino
Collaborator

Would it be possible to have a clear button so the sequence box can be cleared? Additionally, the results page could display the highest-ranking result. I am attaching the results for Homo sapiens CenH3: as you can see, the results state that the variant is unknown, but looking at the scores it is clear that it is an H3, and the highest score is with CenH3. Would it be possible to have this page give an educated guess?
histone_variants.txt
screen shot 2015-09-29 at 10 57 34 am

@molsim
Collaborator

molsim commented Sep 29, 2015

Sure, we will make a "clear" button, and I think we will rework the design of this page anyway.
As for the results page, I think we will add an explanation that the upper green box is the result of the HMM classifier, and ask the user to focus on the BLAST table below in case the HMM classifier fails. In this case the BLAST table makes it clear that the variant is CenH3.

The problem with the HMM classifier for CenH3 is that CenH3 sequences are rather diverse (not monophyletic).
Some canonical H3 sequences score higher against the cenH3 model than some true cenH3 sequences do, so we keep the classification threshold high to avoid wrong guesses.
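
The trade-off can be illustrated with a small sketch. The scores below are made-up values chosen only to show the effect; they are not taken from the actual models, and the function name is hypothetical. When some canonical H3 sequences outscore true cenH3 sequences, no threshold separates the two groups cleanly, so a threshold placed above the best-scoring canonical H3 gives no wrong calls at the cost of missing some true cenH3.

```python
# Hypothetical HMM bit scores against a cenH3 model (illustrative values only).
true_cenh3_scores = [210.0, 150.0, 95.0, 60.0]
canonical_h3_scores = [110.0, 80.0, 40.0]


def confusion(threshold, positives, negatives):
    """Return (true positives, false negatives, false positives) at a threshold."""
    tp = sum(score >= threshold for score in positives)
    fn = len(positives) - tp
    fp = sum(score >= threshold for score in negatives)
    return tp, fn, fp


# Conservative choice: just above the best-scoring canonical H3.
strict_threshold = max(canonical_h3_scores) + 1.0
# Permissive choice: low enough to accept every true cenH3.
lenient_threshold = min(true_cenh3_scores)

strict = confusion(strict_threshold, true_cenh3_scores, canonical_h3_scores)
lenient = confusion(lenient_threshold, true_cenh3_scores, canonical_h3_scores)

print("strict  (TP, FN, FP):", strict)
print("lenient (TP, FN, FP):", lenient)
```

With these numbers the strict threshold misses two of the four cenH3 sequences but reports no canonical H3 as cenH3, while the lenient threshold recovers all four and also mislabels two canonical H3 sequences.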

@molsim
Collaborator

molsim commented Sep 29, 2015

OK, I had a close look at CenH3. Here is how we can improve the classifier: just report both of the following:

  1. whether the sequence satisfies our robust criterion
  2. the model with the maximum HMM score
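
The two-part report above could look roughly like the following sketch. It assumes the "robust criterion" is a per-model score threshold applied to the top-scoring model, which the thread does not spell out; the names `Classification` and `classify`, and all scores and thresholds, are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class Classification(NamedTuple):
    confident_variant: Optional[str]  # set only if the robust criterion is met
    best_model: str                   # model with the maximum HMM score
    best_score: float


def classify(scores: Dict[str, float],
             thresholds: Dict[str, float]) -> Classification:
    """Report both the threshold-based call and the top-scoring model."""
    best_model = max(scores, key=scores.get)
    best_score = scores[best_model]
    # Assumed criterion: the top score must reach that model's own threshold.
    passed = best_score >= thresholds[best_model]
    return Classification(best_model if passed else None, best_model, best_score)


# Illustrative values only, not real output of the classifier.
example_scores = {"cenH3": 120.0, "H3.3": 90.0, "canonical_H3": 85.0}
example_thresholds = {"cenH3": 200.0, "H3.3": 100.0, "canonical_H3": 100.0}

result = classify(example_scores, example_thresholds)
print(result)
```

Here the confident call stays empty because 120.0 is below the assumed cenH3 threshold, but the report still names cenH3 as the best-scoring model, so the user sees both pieces of information.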

@leonardomarino
Collaborator

In this case, how about saying "H3, unknown variant"? At the very least, we should be able to classify at the histone level.

Leonardo Mariño-Ramírez

---- Alexey Shaytan wrote ----

Sure, we will make a "clear" button and I think we will rework the design of this page anyway.
As for the results page, I think we will add an explanation that the upper green box is the result of the HMM classifier, and ask the user to focus on the BLAST table below in case the HMM classifier fails. In this case the BLAST table makes it clear that the variant is CenH3.
The problem with the HMM classifier for CenH3 is that CenH3 sequences are rather diverse (not monophyletic), so the decision thresholds that give an acceptable TP/FP ratio are high. In the case of human CenH3, simply taking the highest score works, but it might not work in other cases. So we prefer to report "unknown" rather than report a wrong variant.


@molsim
Collaborator

molsim commented Sep 29, 2015

I will see what we can do. I would rather refer the user to our BLAST results table; it seems more straightforward to me.

To classify the variant as H3 using HMMs, we would need a combined model for all H3. But since cenH3 is sufficiently divergent from other H3s, that model might again have trouble picking up cenH3.
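
As a point of comparison, here is a sketch of a different approach from the combined H3 model discussed above: derive the histone type from whichever variant model scores highest, and report the variant only when its threshold is met. This is an assumption for illustration, not something proposed in the thread. The mapping `VARIANT_TO_TYPE`, the function `describe`, and all scores and thresholds are hypothetical. It also shares the weakness noted above: a very divergent cenH3 could score highest against a model of another histone type.

```python
from typing import Dict

# Hypothetical mapping from variant model name to its parent histone type.
VARIANT_TO_TYPE = {
    "cenH3": "H3",
    "H3.3": "H3",
    "canonical_H3": "H3",
    "H2A.Z": "H2A",
    "H2A.X": "H2A",
}


def describe(scores: Dict[str, float], thresholds: Dict[str, float]) -> str:
    """Always name the histone type; name the variant only above threshold."""
    best = max(scores, key=scores.get)
    histone_type = VARIANT_TO_TYPE[best]
    if scores[best] >= thresholds[best]:
        return f"{histone_type}, variant {best}"
    return f"{histone_type}, unknown variant (top-scoring model: {best})"


# Illustrative values only.
message = describe(
    {"cenH3": 120.0, "H3.3": 90.0, "H2A.Z": 10.0},
    {"cenH3": 200.0, "H3.3": 100.0, "H2A.Z": 50.0},
)
print(message)
```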

3 participants