For each text-mined Biolink association, we would like to provide relevant EPC data including:
- The sentence from which the assertion was mined
- An identifier for the document that contains the sentence
- The character offsets (relative to the sentence) for the text mentions of the subject and object of the assertion
- A confidence score for this specific text-mined assertion (right now this is the score reported by the classifier that identified the relation)
The goal of this issue is to discuss how to represent the EPC data using the Attribute object defined in the TRAPI specification.
An initial proposal for Attribute representation is available in this document.
The proposal in this issue builds off of the original, and specifically addresses a need to group EPC into individual packets that contain the sentence and other relevant information so that multiple EPC packets can be associated with a single assertion.
Data for a text-mined assertion

    graph": {
      "nodes": [
        {
          "id": "n0",
          "type": "biolink:ChemicalSubstance",
          "curie": "CHEBI:3215" # bupivacaine
        },
        {
          "id": "n1",
          "type": "biolink:GeneOrGeneProduct",
          "curie": "PR:000031567" # LRRC3B
        }
      ],
      "edges": [
        {
          "id": "e0",
          "source_id": "n0",
          "target_id": "n1",
          "type": "biolink:negatively_regulates_entity_to_entity"
        }
      ]
    }

    # This assertion is supported by two sentences in the literature
    {'publication': 'PMID:29085514',
     'score': '0.99956816',
     'sentence': 'The administration of 50 µg/ml bupivacaine promoted maximum breast cancer cell invasion, and suppressed LRRC3B mRNA expression in cells.',
     'subject_spans': 'start: 31, end: 42',
     'object_spans': 'start: 104, end: 110',
     'provided_by': 'TMProvider'}
    {'publication': 'PMID:12345678',
     'score': '0.876',
     'sentence': 'This is a second sentence indicating that bupivacaine negatively regulates LRRC3B.',
     'subject_spans': 'start: 42, end: 53',
     'object_spans': 'start: 75, end: 81',
     'provided_by': 'TMProvider'}

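As a sanity check on the span convention, here is a minimal Python sketch that slices each sentence with its reported offsets. The helper names (`parse_span`, `mention`) are hypothetical and not part of any Text Mining Provider code; the records are copied from the example above, trimmed to the fields the check needs.

```python
import re

# Records copied from the example above (only the fields needed here).
RECORDS = [
    {
        "publication": "PMID:29085514",
        "sentence": "The administration of 50 µg/ml bupivacaine promoted maximum "
                    "breast cancer cell invasion, and suppressed LRRC3B mRNA "
                    "expression in cells.",
        "subject_spans": "start: 31, end: 42",
        "object_spans": "start: 104, end: 110",
    },
    {
        "publication": "PMID:12345678",
        "sentence": "This is a second sentence indicating that bupivacaine "
                    "negatively regulates LRRC3B.",
        "subject_spans": "start: 42, end: 53",
        "object_spans": "start: 75, end: 81",
    },
]

SPAN_PATTERN = re.compile(r"start:\s*(\d+),\s*end:\s*(\d+)")


def parse_span(span):
    """Parse a 'start: N, end: M' string into a (start, end) tuple of ints."""
    match = SPAN_PATTERN.fullmatch(span.strip())
    if match is None:
        raise ValueError(f"unrecognized span format: {span!r}")
    return int(match.group(1)), int(match.group(2))


def mention(record, role):
    """Return the text covered by the subject or object span of a record."""
    start, end = parse_span(record[f"{role}_spans"])
    return record["sentence"][start:end]


for record in RECORDS:
    print(record["publication"], mention(record, "subject"), mention(record, "object"))
```

For both example records the slices recover `bupivacaine` and `LRRC3B`, which is consistent with zero-based character offsets and an exclusive end position.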
Proposed Attribute representation
The proposed Attribute representation models this assertion as a single edge between bupivacaine and LRRC3B with two accompanying Attributes representing the EPC data. Nested Attributes are used to allow each packet of sentence information to be self-contained. Also demonstrated are attributes representing a confidence score for the concept recognition of each node (concept), and an aggregate confidence score computed for each edge.

    nodes:
      - id: CHEBI:3215
        category: biolink:ChemicalSubstance
        name: "bupivacaine"
        attributes:
          - attribute_type_id: SEPIO:0000168 # confidence_score
            attribute_from_source: "has confidence score"
            value: 0.7578
            value_type_id: biolink:ConfidenceLevel
            value_type_from_source: "confidence score"
            value_source: TMProvider
      - id: PR:000031567
        category: biolink:GeneOrGeneProduct
        name: "LRRC3B"
        attributes:
          - attribute_type_id: SEPIO:0000168 # confidence_score
            attribute_from_source: "has confidence score"
            value: 0.5467
            value_type_id: biolink:ConfidenceLevel
            value_type_from_source: "confidence score"
            value_source: TMProvider
    edges:
      - id: tmkp.Association001
        category: biolink:ChemicalToGeneAssociation
        subject: CHEBI:3215 # bupivacaine
        predicate: biolink:negatively_regulates_entity_to_entity
        object: PR:000031567 # LRRC3B
        attributes:
          - attribute_type_id: SEPIO:0000438 # has_supporting_evidence_from_source
            attribute_from_source: "source publication" # what the source might have called the relationship
            value: PMID:29085514
            value_type_id: biolink:Publication # here a biolink term is used to type the value.
            value_type_from_source: "PMID"
            value_source: TMProvider
            attributes:
              - attribute_type_id: SIO:000028 # has part
                value: "The administration of 50 µg/ml bupivacaine promoted maximum breast cancer cell invasion, and suppressed LRRC3B mRNA expression in cells."
                value_type_id: EDAM:data_3671 # text, or SIO:000113 'sentence'
                value_type_from_source: sentence text
                attributes:
                  - attribute_type_id: SIO:000028 # has part
                    value: '31|42'
                    value_type_id: SIO:001056 # character position
                    value_type_from_source: subject span
                  - attribute_type_id: SIO:000028 # has part
                    value: '104|110'
                    value_type_id: SIO:001056 # character position
                    value_type_from_source: object span
                  - attribute_type_id: SEPIO:0000440 # has_supporting_evidence
                    value: 0.99956816
                    value_type_id: EDAM:data_1772 # score
                    value_type_from_source: sentence confidence score
                    value_source: TMProvider BERT model v0.1
          - attribute_type_id: SEPIO:0000438 # has_supporting_evidence_from_source
            attribute_from_source: "source publication" # what the source might have called the relationship
            value: PMID:12345678
            value_type_id: biolink:Publication # here a biolink term is used to type the value.
            value_type_from_source: "PMID"
            value_source: TMProvider
            attributes:
              - attribute_type_id: SIO:000028 # has part
                value: "This is a second sentence indicating that bupivacaine negatively regulates LRRC3B."
                value_type_id: EDAM:data_3671 # text, or SIO:000113 'sentence'
                value_type_from_source: sentence text
                attributes:
                  - attribute_type_id: SIO:000028 # has part
                    value: '42|53'
                    value_type_id: SIO:001056 # character position
                    value_type_from_source: subject span
                  - attribute_type_id: SIO:000028 # has part
                    value: '75|81'
                    value_type_id: SIO:001056 # character position
                    value_type_from_source: object span
                  - attribute_type_id: SEPIO:0000440 # has_supporting_evidence
                    value: 0.876
                    value_type_id: EDAM:data_1772 # score
                    value_type_from_source: sentence confidence score
                    value_source: TMProvider BERT model v0.1
          - attribute_type_id: SEPIO:0000168 # confidence_score
            attribute_from_source: "has aggregate confidence score"
            value: 0.64711234
            value_type_id: biolink:ConfidenceLevel
            value_type_from_source: "aggregate confidence score"
            value_source: TMProvider

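To make the packet idea concrete, the following Python sketch builds one nested EPC packet per supporting sentence as plain dictionaries. The function names (`span_attribute`, `epc_packet`) are hypothetical, the CURIEs are the ones used in the proposal above, and placing the sentence score next to the spans under the sentence is one reading of the proposal, not a settled decision.

```python
import json


def span_attribute(span, label):
    """Build a 'has part' Attribute holding a 'start|end' character span."""
    start, end = span
    return {
        "attribute_type_id": "SIO:000028",  # has part
        "value": f"{start}|{end}",
        "value_type_id": "SIO:001056",  # character position
        "value_type_from_source": label,
    }


def epc_packet(publication, sentence, subject_span, object_span, score):
    """Build one self-contained EPC packet: publication -> sentence -> spans and score.

    Assumption: the score is nested under the sentence, alongside the spans.
    """
    sentence_attribute = {
        "attribute_type_id": "SIO:000028",  # has part
        "value": sentence,
        "value_type_id": "EDAM:data_3671",  # text
        "value_type_from_source": "sentence text",
        "attributes": [
            span_attribute(subject_span, "subject span"),
            span_attribute(object_span, "object span"),
            {
                "attribute_type_id": "SEPIO:0000440",  # has_supporting_evidence
                "value": score,
                "value_type_id": "EDAM:data_1772",  # score
                "value_type_from_source": "sentence confidence score",
                "value_source": "TMProvider BERT model v0.1",
            },
        ],
    }
    return {
        "attribute_type_id": "SEPIO:0000438",  # has_supporting_evidence_from_source
        "attribute_from_source": "source publication",
        "value": publication,
        "value_type_id": "biolink:Publication",
        "value_type_from_source": "PMID",
        "value_source": "TMProvider",
        "attributes": [sentence_attribute],
    }


# Two packets attached to the same edge, as in the example above.
edge_attributes = [
    epc_packet("PMID:29085514", "The administration of 50 µg/ml bupivacaine ...",
               (31, 42), (104, 110), 0.99956816),
    epc_packet("PMID:12345678", "This is a second sentence ...",
               (42, 53), (75, 81), 0.876),
]
print(json.dumps(edge_attributes, indent=2, ensure_ascii=False))
```

Because each packet carries its own sentence, spans, and score, a consumer can drop or reorder packets without any risk of mismatching a score with the wrong sentence.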
bill-baumgartner added the "incoming" and "in review" labels and removed the "incoming" label on Jan 13, 2021
For comparison purposes, shown below is an alternative approach that uses no nesting of Attributes, and instead makes use of arrays to specify attribute values. For a given EPC packet, the sentence, score, subject & object spans, and PMID are connected only implicitly, through the shared array index used to store their values.
Note: This is the current output format used by the Service Provider to serve up the Text Mining Provider text-mined Biolink association KG.

    edges:
      - id: 9445e98f72ada21aa572559e303e4d5ac414650f
        predicate: biolink:negatively_regulates
        subject: CHEBI:3215 # bupivacaine
        object: PR:000031567 # LRRC3B
        attributes:
          - type: biolink:provided_by
            name: provided_by
            value: Text Mining KP
          - type: bts:api
            name: api
            value: Text Mining Targeted Association API
          - type: bts:score
            name: score
            value:
              - 0.99956816
              - 0.876
          - type: bts:sentence
            name: sentence
            value:
              - "The administration of 50 µg/ml bupivacaine promoted maximum breast cancer cell invasion, and suppressed LRRC3B mRNA expression in cells."
              - "This is a second sentence indicating that bupivacaine negatively regulates LRRC3B."
          - type: bts:subject_spans
            name: subject_spans
            value:
              - "31|42"
              - "42|53"
          - type: bts:object_spans
            name: object_spans
            value:
              - "104|110"
              - "75|81"
          - type: bts:publications
            name: publications
            value:
              - PMID:29085514
              - PMID:12345678

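The index-based coupling in this flat format can be illustrated with a short Python sketch that regroups the parallel arrays into per-sentence packets. The `regroup` helper and the `PACKET_FIELDS` tuple are hypothetical names introduced for illustration; the attribute names and values are taken from the example above (sentences abbreviated).

```python
# Flat attributes as in the example above; sentence text is abbreviated here.
FLAT_ATTRIBUTES = [
    {"type": "biolink:provided_by", "name": "provided_by", "value": "Text Mining KP"},
    {"type": "bts:score", "name": "score", "value": [0.99956816, 0.876]},
    {"type": "bts:sentence", "name": "sentence",
     "value": ["The administration of 50 µg/ml bupivacaine ...",
               "This is a second sentence ..."]},
    {"type": "bts:subject_spans", "name": "subject_spans", "value": ["31|42", "42|53"]},
    {"type": "bts:object_spans", "name": "object_spans", "value": ["104|110", "75|81"]},
    {"type": "bts:publications", "name": "publications",
     "value": ["PMID:29085514", "PMID:12345678"]},
]

# Attributes whose arrays are assumed to be aligned by index.
PACKET_FIELDS = ("publications", "sentence", "subject_spans", "object_spans", "score")


def regroup(attributes):
    """Zip the index-aligned arrays into one dict per supporting sentence."""
    by_name = {attribute["name"]: attribute["value"] for attribute in attributes}
    columns = [by_name[field] for field in PACKET_FIELDS]
    # The format only works if every array has the same length.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("parallel arrays differ in length")
    return [dict(zip(PACKET_FIELDS, row)) for row in zip(*columns)]


for packet in regroup(FLAT_ATTRIBUTES):
    print(packet["publications"], packet["subject_spans"], packet["score"])
```

The length check highlights the main fragility of this layout: if any one array is filtered, sorted, or truncated independently, the packets silently fall out of alignment, which the nested representation avoids by construction.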