Edge Merging Proposal #70
Under "attributes" would it not make more sense to link to the unique provenance ID which produced the attribute (e.g. "p4"), rather than the infores ID of the source for "attribute_source"? It's a subtle distinction, but it would disambiguate at which point in the provenance path an attribute was added and would track how an attribute was added in cases where an ARA might be doing more complex operations.
Thanks for writing this proposal, @uhbrar. Here are some concerns and comments I originally posted in Slack.

When a service does decide to merge, I agree that it is important to have a standard set of edge components that determine which edges are eligible for merging. For the sake of ARA flexibility and Translator robustness, the decision of whether to merge should be left to a service's discretion, and not be required. Downstream services should be able to cope with unmerged results.

I can imagine ARAs that choose to perform normalization differently, as part of their approach to reasoning. They may choose to perform fewer or more forms of normalization. Such deviation won't be compatible with downstream services expecting globally-mandated merging guarantees, though it should still be compatible with global agreement on edge key components.

Communicating hash values directly as part of responses seems fragile, and does not seem necessary for efficient lookup, despite the claim in the proposal.

Fragility concerns:
The proposal's efficiency claim is that hash values will prevent a service from having to look through all the results in order to bucket them appropriately. But this can't be true for a couple of reasons:
Without transmitting hash values, each service can still locally hash the components that make up an edge's "key", with work proportional to the size of each key, in order to quickly identify edge buckets. No global agreement on hash function is necessary for this. The cost of this local hashing should not dominate any of the other already-required work as described above (parsing JSON and iterating over edges).
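To illustrate the local-hashing point, here is a minimal sketch rather than anything taken from the proposal. It assumes edges arrive as a dict of edge ID to edge object, and it uses a deliberately simplified, hypothetical key (`spo_key`) with made-up identifiers; the function names are illustrative only. Python's built-in dict does the hashing, so no hash value has to be transmitted or agreed on between services.

```python
from collections import defaultdict


def bucket_edges(edges, key_of):
    """Group edge IDs by a locally computed key.

    `edges` maps edge ID -> edge dict; `key_of` returns a hashable key.
    """
    buckets = defaultdict(list)
    for edge_id, edge in edges.items():
        # The dict hashes each key locally; the work is proportional to key size.
        buckets[key_of(edge)].append(edge_id)
    return dict(buckets)


def spo_key(edge):
    # Hypothetical, simplified key: subject, predicate, and object only.
    return (edge["subject"], edge["predicate"], edge["object"])


# Made-up identifiers, for illustration only.
edges = {
    "e1": {"subject": "CHEBI:1", "predicate": "biolink:treats", "object": "MONDO:1"},
    "e2": {"subject": "CHEBI:1", "predicate": "biolink:treats", "object": "MONDO:1"},
    "e3": {"subject": "CHEBI:2", "predicate": "biolink:treats", "object": "MONDO:1"},
}
buckets = bucket_edges(edges, spo_key)
```

Each edge is visited once, which is work a service already does when it iterates over the parsed response.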
I've provided my comments in the form of a document:
In my current understanding, qualifiers will be a new level of semantic meaning more complex than subject-object-predicate, so when considering whether to merge edges, it will be crucial to treat qualifiers as just as important as subject-object-predicate. That is, two assertions with the same subject-object-predicate but different qualifiers MUST be treated as different assertions.
@edeutsch - Yes, qualifiers are required for equivalence. In this proposal, they are one of the 5 values that must be equal for edges to be merged. The proposal dictates that you must have the same subject, predicate, object, qualifiers, and original/primary knowledge source.

@ehinderer - I actually like this idea a bit, but at the same time, that unique provenance ID may actually be the infores ID of the service anyway. I'm actually leaning towards that, since I don't see why a KP would report different methods of attaining an edge to two different ARAs.

@jeffhhk There are a few points I'd like to address:
OR
* "biolink: primary_knowledge_source"

All edges must have one (and only one) of these attributes listed, and to merge two edges, they must share the same value for the attribute.
Re: "must have one (and only one) of these attributes listed": there will be 0, 1, or many qualifiers on a given Edge, so the one-and-only-one rule cannot apply here.
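One way to reconcile the two rules is sketched below, as an assumption-laden illustration and not the proposal's method. It requires exactly one knowledge source attribute per edge while allowing 0, 1, or many qualifiers, which are collected into an order-independent `frozenset`. The function name `merge_key`, the `source_type_ids` parameter, and all example values are hypothetical; the field names (`attributes`, `attribute_type_id`, `value`, `qualifiers`, `qualifier_type_id`, `qualifier_value`) are assumed to follow the TRAPI edge layout, and attribute and qualifier values are assumed to be plain strings.

```python
def merge_key(edge, source_type_ids):
    """Return a hashable merge key, or raise if the source rule is violated."""
    sources = [
        attr["value"]
        for attr in edge.get("attributes") or []
        if attr["attribute_type_id"] in source_type_ids
    ]
    # One (and only one) knowledge source attribute is required.
    if len(sources) != 1:
        raise ValueError(
            f"expected exactly one knowledge source attribute, found {len(sources)}"
        )
    # Qualifiers may be absent, single, or many; their order does not matter.
    qualifiers = frozenset(
        (q["qualifier_type_id"], q["qualifier_value"])
        for q in edge.get("qualifiers") or []
    )
    return (edge["subject"], edge["predicate"], edge["object"], qualifiers, sources[0])


# Hypothetical example values, for illustration only.
SOURCE_TYPES = {"biolink:primary_knowledge_source"}
plain = {
    "subject": "CHEBI:1",
    "predicate": "biolink:affects",
    "object": "NCBIGene:1",
    "attributes": [
        {"attribute_type_id": "biolink:primary_knowledge_source",
         "value": "infores:example-kp"},
    ],
}
qualified = dict(
    plain,
    qualifiers=[{"qualifier_type_id": "example:qualifier",
                 "qualifier_value": "increased"}],
)
```

Under this sketch, `plain` and `qualified` produce different keys, so they would not be merged, while an edge with no source attribute (or with more than one) is rejected.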
There is no need to pollute the TRAPI schema, since the retrieval chain can easily and cleanly be expressed using attributes:
KP1
KP2
KP3
ARA1
ARA2
WR
We can add
Thanks Vlado - this is exactly how I envision an attribute-based representation using the current source retrieval properties looking. Level 1 attributes can provide a list of all Resources through which the knowledge expressed in the Edge passed at some point, and indicate which was primary/original. If more info about the order/graph of retrieval is needed, a second level could be introduced using nested attributes. These additional levels could hold more details about when, by whom, and how retrievals were performed, if this proves useful.

Like you, I would like to better understand why the current, simple flat list of Attributes holding Resources and their roles as primary/original vs aggregator is insufficient. Folks have pointed out that this representation is 'lossy', but that is only a problem if the information lost is needed for a compelling use case. Do we have use cases requiring a full/ordered retrieval graph? As I understand it, this complexity is not required to support Edge Merging, where we only need to know which Resource was primary/original.

Regardless of what level of detail we end up providing, I think there are a couple of arguments for defining a dedicated structure to hold Source Provenance info that doesn't use Attributes. Namely, this metadata is of high import, and capturing it in a dedicated structure highlights this critical role and makes it easier to find and operate on. And more generally, does there come a time when specific Edge metadata rises to a level of import that warrants adding dedicated fields/objects to the TRAPI spec?
Here is the prepared edge merging proposal and example. As you can see, there are many issues that must be considered when addressing edge merging. This proposal discusses some of these issues and offers possible solutions to them. Relevant issues will also need to be discussed by the necessary working groups.
Please let me know what concerns anyone might have regarding this proposal. I will also be happy to answer questions regarding it.
For comment only. Not to merge.