Rename scope to predicate #22

Merged 2 commits on Jan 21, 2025
20 changes: 10 additions & 10 deletions README.md
@@ -62,14 +62,14 @@ columns:
2. `curie` the compact uniform resource identifier (CURIE) for a biomedical
entity or concept, standardized using the Bioregistry
3. `name` the standard name for the concept
- 4. `scope` the predicate which encodes the synonym scope, written as a CURIE
- from the [OBO in OWL (`oio`)](https://bioregistry.io/oio) controlled
- vocabulary, i.e., one of:
+ 4. `predicate` the predicate which encodes the synonym scope, written as a CURIE
+ from the [OBO in OWL (`oboInOWL`)](https://bioregistry.io/oio) or RDFS
+ controlled vocabularies, e.g., one of:
+ - `rdfs:label`
- `oboInOwl:hasExactSynonym`
- `oboInOwl:hasNarrowSynonym` (i.e., the synonym represents a narrower term)
- `oboInOwl:hasBroadSynonym` (i.e., the synonym represents a broader term)
- - `oboInOwl:hasRelatedSynonym`
- - `oboInOwl:hasSynonym` (use this if the scope is unknown)
+ - `oboInOwl:hasRelatedSynonym` (use this if the scope is unknown)
5. `type` the (optional) synonym property type, written as a CURIE from the
[OBO Metadata Ontology (`omo`)](https://bioregistry.io/omo) controlled
vocabulary, e.g., one of:
@@ -93,17 +93,17 @@ columns:

Here's an example of some rows in the synonyms table (with linkified CURIEs):

- | text | curie | scope | provenance | contributor | language |
- | ------------------------------- | ------------------------------------------------- | --------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | -------- |
- | PI(3,4,5)P3 | [CHEBI:16618](https://bioregistry.io/CHEBI:16618) | [oboInOwl:hasExactSynonym](https://bioregistry.io/oboInOwl:hasExactSynonym) | [pubmed:29623928](https://bioregistry.io/pubmed:29623928), [pubmed:20817957](https://bioregistry.io/pubmed:20817957) | [orcid:0000-0003-4423-4370](https://bioregistry.io/orcid:0000-0003-4423-4370) | en |
- | phosphatidylinositol (3,4,5) P3 | [CHEBI:16618](https://bioregistry.io/CHEBI:16618) | [oboInOwl:hasExactSynonym](https://bioregistry.io/oboInOwl:hasExactSynonym) | [pubmed:29695532](https://bioregistry.io/pubmed:29695532) | [orcid:0000-0003-4423-4370](https://bioregistry.io/orcid:0000-0003-4423-4370) | en |
+ | text | curie | predicate | provenance | contributor | language |
+ | --------------- | --------------------------------------------------- | --------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | -------- |
+ | alsterpaullone | [CHEBI:138488](https://bioregistry.io/CHEBI:138488) | [rdfs:label](https://bioregistry.io/rdfs:label) | [pubmed:30655881](https://bioregistry.io/pubmed:30655881) | [orcid:0000-0003-4423-4370](https://bioregistry.io/orcid:0000-0003-4423-4370) | en |
+ | 9-nitropaullone | [CHEBI:138488](https://bioregistry.io/CHEBI:138488) | [oboInOwl:hasExactSynonym](https://bioregistry.io/oboInOwl:hasExactSynonym) | [pubmed:11597333](https://bioregistry.io/pubmed:11597333), [pubmed:10911915](https://bioregistry.io/pubmed:10911915) | [orcid:0000-0003-4423-4370](https://bioregistry.io/orcid:0000-0003-4423-4370) | en |

### Incorrect Synonyms

The [`negatives.tsv`](src/biosynonyms/resources/negatives.tsv) has the following
columns for non-trivial examples of text strings that aren't synonyms. This
document doesn't address the same issues as context-based disambiguation, but
- rather helps dscribe issues like incorrect sub-string matching:
+ rather helps describe issues like incorrect sub-string matching:

1. `text` the non-synonym text itself
2. `curie` the compact uniform resource identifier (CURIE) for a biomedical
4 changes: 2 additions & 2 deletions src/biosynonyms/generate_owl.py
@@ -222,7 +222,7 @@ def get_axiom_str(reference: Reference, synonym: Synonym) -> str | None:
[
a owl:Axiom ;
owl:annotatedSource {reference.curie} ;
- owl:annotatedProperty {synonym.scope.curie} ;
+ owl:annotatedProperty {synonym.predicate.curie} ;
owl:annotatedTarget {_text_for_turtle(synonym)} ;
{axiom_parts_str}
] .
@@ -255,7 +255,7 @@ def _write_owl_rdf( # noqa:C901
mains: list[str] = []
axiom_strs: list[str] = []
for synonym in synonyms:
- mains.append(f"{synonym.scope.curie} {_text_for_turtle(synonym)}")
+ mains.append(f"{synonym.predicate.curie} {_text_for_turtle(synonym)}")
if axiom_str := get_axiom_str(reference, synonym):
axiom_strs.append(axiom_str)

29 changes: 18 additions & 11 deletions src/biosynonyms/model.py
@@ -37,7 +37,7 @@ class SynonymTuple(NamedTuple):
text: str
curie: str
name: str
- scope: str
+ predicate: str
type: str | None
provenance: str | None
contributor: str | None
@@ -50,6 +50,9 @@ class SynonymTuple(NamedTuple):
#: The header for the spreadsheet
HEADER = list(SynonymTuple._fields)

+ #: A set of permissible predicates
+ PREDICATES = [v.has_label, *v.synonym_scopes.values()]
+

class Synonym(BaseModel):
"""A data model for synonyms."""
@@ -61,23 +64,27 @@ class Synonym(BaseModel):
"assumed to be american english.",
)
reference: NamedReference
- scope: Reference = Field(
- default=Reference.from_curie("oboInOwl:hasSynonym"),
+ predicate: Reference = Field(
+ default=v.has_related_synonym,
description="The predicate that connects the term (as subject) "
"to the textual synonym (as object)",
+ examples=PREDICATES,
)
type: Reference | None = Field(
default=None,
title="Synonym type",
description="See the OBO Metadata Ontology for valid values",
examples=list(v.synonym_types),
)

provenance: list[Reference] = Field(
default_factory=list,
description="A list of articles (e.g., from PubMed, PMC, arXiv) where this synonym appears",
)
contributor: Reference | None = Field(
- None, description="The contributor, usually given as a reference to ORCID"
+ None,
+ description="The contributor, usually given as a reference to ORCID",
+ examples=[v.charlie],
)
comment: str | None = Field(
None, description="An optional comment on the synonym curation or status"
@@ -89,7 +96,7 @@ class Synonym(BaseModel):

def get_all_references(self) -> set[Reference]:
"""Get all references made by this object."""
- rv: set[Reference] = {self.reference, self.scope, *self.provenance}
+ rv: set[Reference] = {self.reference, self.predicate, *self.provenance}
if self.type:
rv.add(self.type)
if self.contributor:
@@ -125,10 +132,10 @@ def from_row(
"reference": NamedReference(
prefix=reference.prefix, identifier=reference.identifier, name=name
),
- "scope": (
- Reference.from_curie(scope_curie.strip())
- if (scope_curie := row.get("scope"))
- else Reference.from_curie("oboInOwl:hasSynonym")
+ "predicate": (
+ Reference.from_curie(predicate_curie.strip())
+ if (predicate_curie := row.get("predicate"))
+ else v.has_related_synonym
),
"type": _safe_parse_curie(row["type"]) if "type" in row else None,
"provenance": [
@@ -154,7 +161,7 @@ def _as_row(self) -> SynonymTuple:
text=self.text,
curie=self.curie,
name=self.name,
- scope=self.scope.curie,
+ predicate=self.predicate.curie,
type=self.type.curie if self.type else None,
provenance=",".join(p.curie for p in self.provenance) if self.provenance else None,
contributor=self.contributor.curie if self.contributor is not None else None,
@@ -204,7 +211,7 @@ def to_gilda(self, organism: str | None = None) -> gilda.Term:

def _get_gilda_status(synonym: Synonym) -> GildaStatus:
"""Get the Gilda status for a synonym."""
- if synonym.scope and synonym.scope.pair == v.has_label.pair:
+ if synonym.predicate and synonym.predicate.pair == v.has_label.pair:
return "name"
if synonym.type and synonym.type.pair == v.previous_name.pair:
return "former_name"
10 changes: 2 additions & 8 deletions src/biosynonyms/resources/__init__.py
@@ -6,7 +6,7 @@
from pathlib import Path
from typing import TYPE_CHECKING, cast

- from biosynonyms.model import Synonym, grounder_from_synonyms, parse_synonyms
+ from biosynonyms.model import PREDICATES, Synonym, grounder_from_synonyms, parse_synonyms

if TYPE_CHECKING:
import gilda
@@ -25,13 +25,7 @@
NEGATIVES_PATH = HERE.joinpath("negatives.tsv")
UNENTITIES_PATH = HERE.joinpath("unentities.tsv")

- SYNONYM_SCOPES = {
- "oboInOwl:hasExactSynonym",
- "oboInOwl:hasNarrowSynonym",
- "oboInOwl:hasBroadSynonym",
- "oboInOwl:hasRelatedSynonym",
- "oboInOwl:hasSynonym",
- }
+ SYNONYM_PREDICATE_CURIES: set[str] = {p.curie for p in PREDICATES}


def load_unentities() -> set[str]:
2 changes: 1 addition & 1 deletion src/biosynonyms/resources/positives.tsv
@@ -1,4 +1,4 @@
- text curie name scope type provenance contributor date language comment source
+ text curie name predicate type provenance contributor date language comment source
1,3-dimethylurate chebi:133726 1,3-dimethylurate anion oboInOwl:hasExactSynonym orcid:0000-0003-4423-4370 en biosynonyms
abema mesh:C000590451 abemaciclib oboInOwl:hasExactSynonym orcid:0000-0001-9439-5346 en biosynonyms
adaptor protein cbl interpro:IPR024162 Adaptor protein Cbl oboInOwl:hasExactSynonym orcid:0000-0003-4423-4370 en biosynonyms
15 changes: 9 additions & 6 deletions tests/test_integrity.py
@@ -14,7 +14,7 @@
from biosynonyms.resources import (
NEGATIVES_PATH,
POSITIVES_PATH,
- SYNONYM_SCOPES,
+ SYNONYM_PREDICATE_CURIES,
UNENTITIES_PATH,
_unentities_key,
)
@@ -51,7 +51,7 @@ def test_positives(self):
text,
curie,
_name,
- scope,
+ predicate,
synonym_type,
references,
contributor_curie,
@@ -62,7 +62,7 @@ def test_positives(self):
) = row
self.assertLess(1, len(text), msg="can not have 1 letter synonyms")
self.assert_curie(curie)
- self.assertIn(scope, SYNONYM_SCOPES)
+ self.assertIn(predicate, SYNONYM_PREDICATE_CURIES)
if synonym_type:
self.assertTrue(synonym_type.startswith("OMO:"))
for reference in references.split(",") if references else []:
@@ -131,17 +131,20 @@ def test_gilda(self):
reference = NamedReference.from_curie("test:1", "test")
label = Reference.from_curie("rdfs:label")
synonym_1 = Synonym(
- text="tests", scope=v.has_exact_synonym, type=v.plural_form, reference=reference
+ text="tests", predicate=v.has_exact_synonym, type=v.plural_form, reference=reference
)
gilda_term_1 = synonym_1.to_gilda()
self.assertEqual("synonym", gilda_term_1.status)

- synonym_2 = Synonym(text="test", scope=label, reference=reference)
+ synonym_2 = Synonym(text="test", predicate=label, reference=reference)
gilda_term_2 = synonym_2.to_gilda()
self.assertEqual("name", gilda_term_2.status)

synonym_3 = Synonym(
- text="old test", scope=v.has_exact_synonym, reference=reference, type=v.previous_name
+ text="old test",
+ predicate=v.has_exact_synonym,
+ reference=reference,
+ type=v.previous_name,
)
gilda_term_3 = synonym_3.to_gilda()
self.assertEqual("former_name", gilda_term_3.status)
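
For downstream TSV files that still use the old `scope` header, the rename in this PR can be applied with a small script. The following is a minimal sketch and not part of the PR: `migrate_tsv` is a hypothetical helper, and remapping `oboInOwl:hasSynonym` to `oboInOwl:hasRelatedSynonym` is an assumption based on the README change above, which moves the "use this if the scope is unknown" guidance to `hasRelatedSynonym`.

```python
# Hypothetical migration helper; not part of biosynonyms.
OLD_HEADER = "scope"
NEW_HEADER = "predicate"

# Assumption: the old catch-all CURIE should become the new default.
LEGACY_DEFAULT = "oboInOwl:hasSynonym"
NEW_DEFAULT = "oboInOwl:hasRelatedSynonym"


def migrate_tsv(text: str) -> str:
    """Rename the ``scope`` column to ``predicate`` in tab-separated text.

    Cells in that column holding the legacy catch-all CURIE are remapped
    to the new default. All other cells are left untouched.
    """
    lines = text.splitlines()
    if not lines:
        return text

    # Rename the header cell, leaving the column order unchanged.
    header = [
        NEW_HEADER if name == OLD_HEADER else name
        for name in lines[0].split("\t")
    ]
    out = ["\t".join(header)]

    # Locate the predicate column, if the file has one at all.
    idx = header.index(NEW_HEADER) if NEW_HEADER in header else None

    for line in lines[1:]:
        cells = line.split("\t")
        if idx is not None and idx < len(cells):
            if cells[idx].strip() == LEGACY_DEFAULT:
                cells[idx] = NEW_DEFAULT
        out.append("\t".join(cells))

    return "\n".join(out) + "\n"
```

Running the helper on an already migrated file leaves it unchanged, so it can be applied repeatedly without harm. Rows with an empty predicate cell are deliberately left alone, since `from_row` in the diff above already falls back to a default when the cell is missing or empty.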