Indexing behavior when a sub-object pair is linked by multiple relations #35

Open
kshefchek opened this issue Nov 20, 2017 · 6 comments


@kshefchek
Contributor

kshefchek commented Nov 20, 2017

Consider the following pattern:

(subject:gene)<-[has_locus]-(variant)-[relation]->(object:disease)

Where relation is one of:

  1. pathogenic
  2. likely pathogenic
  3. has phenotype
  4. marker/mechanism
  5. contributes to
    ...

In many cases, multiple variants of a single gene are linked to a disease via multiple relations (commonly pathogenic and likely pathogenic). Currently, the Solr loader appears to pick one of these relations arbitrarily (although the choice may in fact be deterministic for a given database).

This is also an issue when combining orthology statements from multiple sources (PANTHER and ZFIN): PANTHER specifies whether two orthologs have a one-to-one relationship, whereas ZFIN does not.

One option is to store the set of relations linking two nodes. Another option would be to configure a relation priority, where the relation with the highest priority is designated while the others are retrievable via the evidence graph.
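A minimal sketch of the two options, assuming associations arrive as (subject, relation, object) triples. The relation labels are taken from the list above; the priority order, the function names, and the identifiers in the sample data are hypothetical and do not describe what the loader actually does.

```python
from collections import defaultdict

# Hypothetical priority order (earlier entries win); a real order would be configured.
RELATION_PRIORITY = [
    "pathogenic",
    "likely pathogenic",
    "contributes to",
    "marker/mechanism",
    "has phenotype",
]


def collect_relations(triples):
    """Option 1: keep the full set of relations for each (subject, object) pair."""
    relations = defaultdict(set)
    for subject, relation, obj in triples:
        relations[(subject, obj)].add(relation)
    return dict(relations)


def pick_primary(relations, priority=RELATION_PRIORITY):
    """Option 2: choose one relation deterministically by configured priority."""
    rank = {name: i for i, name in enumerate(priority)}
    # Unknown relations sort last; ties break alphabetically so the result is stable.
    return min(relations, key=lambda r: (rank.get(r, len(rank)), r))


# Placeholder identifiers, not real data.
triples = [
    ("gene:1", "likely pathogenic", "disease:1"),
    ("gene:1", "pathogenic", "disease:1"),
    ("gene:2", "has phenotype", "disease:1"),
]
by_pair = collect_relations(triples)
primary = {pair: pick_primary(rels) for pair, rels in by_pair.items()}
```

With option 2, the relations that were not selected would still need to be reachable some other way, for example through the evidence graph as suggested above.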

@mbrush @selewis @cmungall thoughts?

@cmungall
Contributor

Why not just make different associations? Doesn't each have its own evidence/provenance etc.?

@kshefchek
Contributor Author

@cmungall could you clarify your suggestion? One document per association could lead to a lot of additional documents since we infer across variants; some genes have a lot of causal variants for a disease (e.g. BRCA). One document per relation is possible, but IMO we'll still be showing too much duplication to the user (or operating on it in ontobio).

As a potential workaround for G2D, I have split up causal vs non-causal associations. This way they can be displayed separately to our end users. The downside is that there will be some redundancy between the two gene-disease lists, as CTD and Coriell will often report the causal gene in addition to those with more hypothetical evidence.

causal g2d

hypothetical g2d - gwas, ctd, coriell

@cmungall
Contributor

I think your solution is along the right lines. I think having a smaller set of relationship types where we separate evidence from relation ("likely pathogenic" should not be a relation) should, in theory, mean that high-quality resources do not generally conflict.
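One way to picture separating evidence from relation is the sketch below. It assumes each source label can be mapped to a relation plus a separate confidence value; the mapping table, the target relation name, and the confidence labels are all hypothetical and are not taken from the project's configuration.

```python
# Hypothetical mapping from a source label to (relation, confidence).
# Neither the relation name nor the confidence labels come from the real config.
LABEL_TO_RELATION_AND_CONFIDENCE = {
    "pathogenic": ("causes condition", "asserted"),
    "likely pathogenic": ("causes condition", "likely"),
}


def normalize(label):
    """Split a source label into a relation and a separate confidence value."""
    # Labels without a mapping pass through unchanged, with no confidence attached.
    return LABEL_TO_RELATION_AND_CONFIDENCE.get(label, (label, None))


def merge_by_relation(labels):
    """Group confidence values under the normalized relation they qualify."""
    merged = {}
    for label in labels:
        relation, confidence = normalize(label)
        merged.setdefault(relation, set())
        if confidence is not None:
            merged[relation].add(confidence)
    return merged


merged = merge_by_relation(["pathogenic", "likely pathogenic", "has phenotype"])
```

Under this assumption, "pathogenic" and "likely pathogenic" collapse into a single relation, so a gene-disease pair no longer carries two competing relations for the indexer to choose between.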

@kshefchek
Contributor Author

The relation that maps to ACMG likely_pathogenic is defined entirely in YAML file(s), so it's an easy change when we're ready.

Thinking about this from the UI perspective, should we have one list of causal genes, and one list of all genes so that the latter list fully subsumes the list of causal genes (instead of partially overlapping sets)?
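A rough sketch of the "subsuming lists" idea, assuming associations are available as (gene, relation) pairs. The set of relations treated as causal, the function name, and the sample identifiers are assumptions made for illustration only.

```python
# Assumed set of relations that count as causal; a real set would come from configuration.
CAUSAL_RELATIONS = {"pathogenic", "likely pathogenic"}


def build_gene_lists(associations):
    """Return (causal_genes, all_genes), where all_genes contains every causal gene."""
    all_genes = []
    causal_genes = []
    for gene, relation in associations:
        # Every gene goes into the full list, in first-seen order.
        if gene not in all_genes:
            all_genes.append(gene)
        # Only genes with a causal relation go into the causal list.
        if relation in CAUSAL_RELATIONS and gene not in causal_genes:
            causal_genes.append(gene)
    return causal_genes, all_genes


# Placeholder identifiers, not real data.
associations = [
    ("gene:A", "has phenotype"),
    ("gene:B", "pathogenic"),
    ("gene:B", "contributes to"),
    ("gene:C", "marker/mechanism"),
]
causal_genes, all_genes = build_gene_lists(associations)
```

Because both lists are built in the same pass, the causal list is always a subset of the full list rather than a partially overlapping one.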

@cmungall
Contributor

cmungall commented Feb 19, 2019 via email

@monicacecilia

monicacecilia commented Jul 2, 2019

Adding a little reminder that Chris' suggestion is still not implemented. Instead, we have a list of all genes, and the causal gene in this, our favorite example, shows up 6th on the list.

[Screenshot: Screen Shot 2019-07-01 at 5 43 17 PM]
