Biolink Model usage #222

Open
EvanDietzMorris opened this issue May 2, 2024 · 0 comments

ORION uses the biolink model for a handful of different things. Most importantly, during normalization in the main source data pipeline, the bl_lookup service is called to normalize predicates for biolink compliance and to validate node types.

But also:

  • When generating a meta knowledge graph and test data for a graph (in Common/meta-kg), the biolink model is used to find leaf node types, inverse predicates, and attribute type ids
  • Ubergraph uses the biolink model for URI prefix mapping
  • LitCoin uses the biolink model to determine leaf types of types coming from the name resolver (this is really not important: it's only for comparing and analyzing name resolver results and doesn't affect the graph outputs)
  • cli/generate_redundant_kg uses the biolink model to create a graph with redundant edges
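For anyone unfamiliar with the "leaf types" idea mentioned above, here is a minimal sketch of what it means. It is not ORION's code: the `TOY_PARENTS` map is a simplified, hypothetical stand-in for the hierarchy that the biolink model provides, and the function names are made up for illustration.

```python
# Hypothetical, simplified parent map. The real hierarchy comes from the
# biolink model and has more levels than this.
TOY_PARENTS = {
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:Protein": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:NamedThing": None,
}


def ancestors(node_type, parents):
    """Return every ancestor of node_type, nearest first."""
    found = []
    current = parents.get(node_type)
    while current is not None:
        found.append(current)
        current = parents.get(current)
    return found


def leaf_types(node_types, parents):
    """Keep only the types that are not an ancestor of another type in the set."""
    types = set(node_types)
    non_leaves = set()
    for node_type in types:
        non_leaves.update(ancestors(node_type, parents))
    return types - non_leaves
```

Given a node typed as Gene, BiologicalEntity, and NamedThing, `leaf_types` keeps only Gene, since the other two are its ancestors.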

Recently, I added an environment variable (BL_VERSION) which can be used to set the biolink model version, and consolidated the instantiation of the biolink model toolkit into one place. Currently, though, this applies to the usages listed above but NOT to the normalization of edges, which is the most important part. Edge normalization in the main pipeline uses either the version specified in the graph spec or, if none is specified, the latest biolink version according to the bl_lookup service. Additionally, in the graph spec this setting is called "edge_normalization_version" instead of something that mentions biolink.

As far as ORION is concerned, the bl_lookup service functionality is now completely redundant with the biolink model toolkit package, and it is only used for historical reasons. We should switch over to the Python package instead, for better performance and to remove the dependency on the external service.

So, in my opinion, we should:

  • completely replace the bl_lookup service calls with the biolink model toolkit
  • rename "edge_normalization_version" to "biolink_version" or something more explicit
  • make the pipeline default to whatever is set in the BL_VERSION environment variable instead of the latest version
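To make the last two points concrete, here is a rough sketch of how the version lookup could work. It rests on several assumptions that are not existing ORION code: the graph spec is treated as a plain dict, `resolve_biolink_version` and its `latest` parameter are hypothetical names, and still accepting the old "edge_normalization_version" key is just one possible way to handle the rename.

```python
import os


def resolve_biolink_version(graph_spec, environ=None, latest=None):
    """Pick the biolink version for edge normalization (hypothetical helper).

    Order of precedence:
      1. a version set in the graph spec (new key first, then the old key)
      2. the BL_VERSION environment variable
      3. `latest`, a caller-supplied fallback (may be None)
    """
    environ = os.environ if environ is None else environ

    # "biolink_version" is the proposed name; "edge_normalization_version"
    # is the current one, kept here only as a backwards-compatible fallback.
    for key in ("biolink_version", "edge_normalization_version"):
        value = graph_spec.get(key)
        if value:
            return str(value)

    env_version = environ.get("BL_VERSION")
    if env_version:
        return env_version

    return latest
```

Passing `environ` explicitly keeps the function easy to test without touching the real environment. For example, `resolve_biolink_version({}, environ={"BL_VERSION": "x.y.z"})` returns `"x.y.z"`, where `x.y.z` is a placeholder rather than a real version.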