TSV - serializing data as node properties and/or edge-node #2

kshefchek · 2021-01-18T16:41:38Z

Our TSV serialization aims to be compatible with property graphs that support scalar (string, int, float) and list properties on both nodes and edges (e.g. neo4j).

Background:
Given a triple in a named graph, we can model this in a property graph in two ways:

Triple:
named_graph: gene:A RO:has_phenotype HP:phenotype

As node properties:
(
id: "gene:A"
has_phenotype: "HP:phenotype"
defined_by: "named_graph"
)

As an edge:

(id: "gene:A") - [has_phenotype {defined_by: "named_graph"}] -> (id: "HP:phenotype")
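To make the two options concrete, here is a minimal Python sketch that turns the triple above into either shape. The function names and dict layouts (`as_node_property`, `as_edge`, the `subject`/`predicate`/`object` keys) are assumptions made for illustration, not an existing API.

```python
from typing import Any, Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as CURIEs


def local_name(curie: str) -> str:
    """Drop the prefix of a CURIE, e.g. 'RO:has_phenotype' -> 'has_phenotype'."""
    return curie.split(":", 1)[-1]


def as_node_property(triple: Triple, named_graph: str) -> Dict[str, Any]:
    """Model the triple as a property on the subject node (hypothetical layout)."""
    subject, predicate, obj = triple
    return {"id": subject, local_name(predicate): obj, "defined_by": named_graph}


def as_edge(triple: Triple, named_graph: str) -> Dict[str, Any]:
    """Model the triple as an edge whose provenance is an edge property."""
    subject, predicate, obj = triple
    return {
        "subject": subject,
        "predicate": local_name(predicate),
        "object": obj,
        "defined_by": named_graph,
    }


triple = ("gene:A", "RO:has_phenotype", "HP:phenotype")
node = as_node_property(triple, "named_graph")
edge = as_edge(triple, "named_graph")
```

Note that in the node form `defined_by` describes the whole node, so it cannot say which graph asserted which property once a second property is added.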

The node property approach does not scale well for tracking provenance and metadata on attributes, or for complex objects. For example:

In Mondo, each synonym is annotated with the upstream ontology that provided it; that annotation would be lost if the synonym were a node property
In the HPO, some synonyms and labels are tagged as being in the layperson subset
Publications are linked to a node, but we also want to retrieve each publication's author list, title, etc.

In many cases it is useful to store data both as properties and as new nodes.
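As a rough sketch of storing the same datum both ways, the snippet below keeps a synonym as a list property on the node and also emits a separate synonym node that can carry its own provenance. Everything here is hypothetical: the function name, the `has_synonym` predicate, the `provided_by` key, the synonym node id scheme, and the placeholder identifiers.

```python
from typing import Any, Dict, Tuple

Record = Dict[str, Any]


def expand_synonym(node_id: str, synonym: str, source: str) -> Tuple[Record, Record, Record]:
    """Return (node, synonym_node, edge) for one synonym (hypothetical layout)."""
    # Cheap lookup: the synonym as a list property, without provenance.
    node = {"id": node_id, "synonym": [synonym]}
    # Full record: the synonym as its own node, with the providing source attached.
    synonym_node = {
        "id": f"{node_id}#synonym/{synonym}",  # made-up id scheme
        "label": synonym,
        "provided_by": source,
    }
    edge = {"subject": node_id, "predicate": "has_synonym", "object": synonym_node["id"]}
    return node, synonym_node, edge


# Placeholder identifiers, not real ontology terms.
node, synonym_node, edge = expand_synonym("MONDO:example", "example synonym", "upstream:ontology")
```

The property copy keeps simple queries fast, while the extra node preserves the annotation that a plain property would drop.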

Previously this was partially hardcoded in scigraph and partially exposed as a configuration option.

This also doesn't touch on edge properties linked to objects. However, there's no great way around this unless we want to store associations as nodes again (see the scigraph transform as an example). So I think it's best to enforce that edge properties have to be primitives.
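One way the primitives-only rule could be enforced is sketched below, assuming "primitive" means the scalar types listed at the top (string, int, float) plus flat lists of them. The function names are hypothetical and this is not an existing check in any of the tools mentioned.

```python
from typing import Any, Dict

SCALARS = (str, int, float)  # note: bool also passes, since bool subclasses int


def is_primitive(value: Any) -> bool:
    """True for a scalar or a flat list of scalars; False for dicts, nested lists, etc."""
    if isinstance(value, SCALARS):
        return True
    if isinstance(value, list):
        return all(isinstance(item, SCALARS) for item in value)
    return False


def validate_edge_properties(properties: Dict[str, Any]) -> None:
    """Raise ValueError naming every edge property that is not a primitive."""
    bad = [key for key, value in properties.items() if not is_primitive(value)]
    if bad:
        raise ValueError(f"edge properties must be primitives, got objects for: {bad}")


validate_edge_properties({"defined_by": "named_graph", "publications": ["PMID:example"]})
```

An edge property holding an object, such as `{"publication": {"title": "..."}}`, would be rejected, which pushes that data onto a node instead.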

cc @TomConlin @matentzn @deepakunni3

matentzn (Maintainer) · 2021-01-19T16:41:18Z

Very important question. I think I would like to hear from @deepakunni3 (and by proxy Chris) first. My very initial thought is: named graphs are weird in neo4j, and maybe we should first decide what exactly we mean by LPG. Anything that can be queried in Cypher? Anything that can be queried in neo (think graphlib)?

In any case -> provenance on node properties is a huge issue -> not sure how much I want to conflate this with the question here. Maybe you can walk us through your thoughts in the next meeting!

kshefchek (Author) · 2021-06-02T17:39:16Z

I think this ends up being out of scope for koza, although we're still working out how exporting to kgx will work.
