Biolink Model usage #222

EvanDietzMorris · 2024-05-02T19:26:16Z

ORION uses the biolink model for a handful of different things. Most importantly, during normalization in the main source data pipeline, the bl_lookup service is called to convert and normalize predicates for biolink compliance and to validate node types.

But also:

- When generating a meta knowledge graph and test data for a graph (in Common/meta-kg), the biolink model is used to find leaf node types, inverse predicates, and attribute type ids.
- Ubergraph uses the biolink model for URI prefix mapping.
- LitCoin uses the biolink model to determine the leaf types of types returned by the name resolver (this is not important: it's only used for comparing and analyzing name resolver results and doesn't affect the graph outputs).
- cli/generate_redundant_kg uses the biolink model to create a graph with redundant edges.
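The meta-KG and LitCoin usages both rely on finding "leaf" types: the most specific categories in a node's category list, i.e. those that are not an ancestor of any other category in the same list. A minimal sketch of that rule, assuming a hypothetical, simplified `ANCESTORS` table that stands in for what the biolink model toolkit provides (e.g. `Toolkit.get_ancestors` in the `bmt` package); `leaf_types` is an illustrative helper, not ORION's actual code.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical, simplified excerpt of the biolink class hierarchy: each
# category maps to the set of its ancestors (excluding itself). In ORION
# this information would come from the biolink model toolkit instead.
ANCESTORS: Dict[str, Set[str]] = {
    "biolink:Entity": set(),
    "biolink:NamedThing": {"biolink:Entity"},
    "biolink:BiologicalEntity": {"biolink:NamedThing", "biolink:Entity"},
    "biolink:Gene": {"biolink:BiologicalEntity", "biolink:NamedThing", "biolink:Entity"},
    "biolink:ChemicalEntity": {"biolink:NamedThing", "biolink:Entity"},
}


def leaf_types(categories: Iterable[str]) -> List[str]:
    """Return the most specific categories in `categories` (hypothetical helper)."""
    cats = set(categories)
    # A category is not a leaf if it is an ancestor of another category in the list.
    non_leaves: Set[str] = set()
    for category in cats:
        # Categories missing from ANCESTORS contribute no ancestors.
        non_leaves |= ANCESTORS.get(category, set()) & cats
    return sorted(cats - non_leaves)


print(leaf_types(["biolink:NamedThing", "biolink:BiologicalEntity", "biolink:Gene"]))
# ['biolink:Gene']
```

Swapping the hard-coded table for toolkit lookups keeps the rule the same; only the source of ancestor information changes.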

Recently, I added an environment variable (BL_VERSION) that sets the biolink model version, and consolidated instantiation of the biolink model toolkit into one place. Currently, however, this only applies to the usages listed above EXCEPT edge normalization, which is the most important part. Edge normalization in the main pipeline uses either the version specified in the graph spec or the latest biolink version according to the bl_lookup service. Additionally, the graph spec calls this setting "edge_normalization_version" rather than something that mentions biolink.

Additionally, as far as ORION is concerned, the bl_lookup service's functionality is now completely redundant with the biolink model toolkit package, and it is only used for historical reasons. We should switch to the Python package for better performance and to remove the dependency on an external service.

So, in my opinion, we should:

- replace the bl_lookup service calls entirely with the biolink model toolkit
- rename "edge_normalization_version" to "biolink_version" or something more explicit
- make the pipeline default to the version set in the BL_VERSION environment variable instead of the latest version
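The proposed version precedence (an explicit graph spec value first, then BL_VERSION, never "latest" from an external service) can be sketched as follows. This is a minimal illustration, assuming a hypothetical `resolve_biolink_version` helper and treating "biolink_version" as the proposed key name; it is not ORION's current code.

```python
import os
from typing import Mapping, Optional


def resolve_biolink_version(graph_spec: Mapping[str, object],
                            env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the biolink model version for a graph build (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. An explicit version in the graph spec wins. "biolink_version" is the
    #    proposed key; the legacy key is still honoured so old specs keep working.
    for key in ("biolink_version", "edge_normalization_version"):
        value = graph_spec.get(key)
        if value:
            return str(value)
    # 2. Otherwise use the BL_VERSION environment variable as the default,
    #    instead of asking an external service for the "latest" version.
    version = env.get("BL_VERSION")
    if version:
        return version
    raise ValueError("No biolink version in the graph spec and BL_VERSION is not set")


print(resolve_biolink_version({}, {"BL_VERSION": "4.2.0"}))
# 4.2.0
```

Still reading the legacy key is a design choice that keeps existing graph specs working during the rename; dropping it outright is equally reasonable.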
