Add curie_map to the model #376

Merged (10 commits) on Aug 6, 2024

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## Next

- Add the concept of "propagatable slots".
+- Add the `curie_map` to the model (instead of it being a specificity of the SSSOM/TSV format).

## SSSOM version 0.15.1

13 changes: 13 additions & 0 deletions examples/schema/curie_map.sssom.tsv
@@ -0,0 +1,13 @@
+#curie_map:
+# HP: http://purl.obolibrary.org/obo/HP_
+# MP: http://purl.obolibrary.org/obo/MP_
+# orcid: https://orcid.org/
+#mapping_set_id: https://w3id.org/sssom/commons/examples/curie_map.sssom.tsv
+#license: "https://creativecommons.org/publicdomain/zero/1.0/"
+#creator_id: orcid:0000-0002-7356-1779
+#mapping_provider: "https://w3id.org/sssom/core_team"
+#comment: This is an example file for the SSSOM for illustration only. Its contents are entirely fabricated.
+subject_id predicate_id object_id mapping_justification
+HP:0009124 skos:exactMatch MP:0000003 semapv:ManualMappingCuration
+HP:0008551 skos:exactMatch MP:0000018 semapv:ManualMappingCuration
+HP:0000411 skos:exactMatch MP:0000021 semapv:ManualMappingCuration
7 changes: 2 additions & 5 deletions src/docs/spec-formats-tsv.md
@@ -106,11 +106,9 @@ SSSOM/TSV files MUST be encoded in UTF-8 ([RFC 3629](https://datatracker.ietf.or

All identifiers in a SSSOM/TSV file, that is, all the values of slots typed as [EntityReference](EntityReference.md), MUST be serialised in [CURIE syntax](https://www.w3.org/TR/curie/). SSSOM/TSV parsers SHOULD reject files containing identifiers serialised as IRIs.

-To allow unambiguous resolution of all CURIEs present in a SSSOM/TSV file, the metadata block MUST contain an additional `curie_map` field, which is a map of prefix names to IRI prefixes. The `curie_map` field SHOULD appear at the beginning of the metadata block.
+As stated in the description of the model ([Identifiers section](spec-model.md#identifiers)), all prefix names used in CURIEs MUST be declared in the `curie_map` slot of the mapping set object, unless the prefix is a “built-in” prefix (in which case it MAY be omitted). SSSOM/TSV parsers MUST reject a file with undeclared, non-built-in prefix names.

-Any prefix name used in a SSSOM/TSV file MUST be declared with a corresponding entry in the CURIE map. SSSOM/TSV parsers MUST reject a file with undeclared prefix names.
-
-Prefix names listed in the table found in the [IRI prefixes](spec-intro.md#iri-prefixes) section are considered “built-in”. As such, they MAY be omitted from the CURIE map. If they are not omitted, they MUST point to the same IRI prefixes as in the aforementioned table.
+A SSSOM/TSV writer SHOULD refuse to serialise a mapping set that contains IRIs that cannot be contracted into CURIEs because there is no suitable prefix declaration in its CURIE map. The use of a custom, ad-hoc logic to infer a possible prefix name where none has been provided (e.g., “if the IRI ends with a `ZZZ_NNNNNNN` pattern, turn it into a `ZZZ:NNNNNNN` CURIE”) is strongly discouraged.
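
Editorial illustration (not part of the diff): the writer-side rule above could be sketched roughly as follows in Python. The helper name `contract_iri` is hypothetical, and picking the longest matching IRI prefix when several match is an assumption of this sketch rather than something the text prescribes.

```python
def contract_iri(iri: str, curie_map: dict[str, str]) -> str:
    """Contract an IRI into a CURIE using only declared prefixes.

    Hypothetical helper, not taken from any SSSOM library. When several
    IRI prefixes match, the longest one wins (an assumption of this sketch).
    """
    matches = [
        (name, prefix)
        for name, prefix in curie_map.items()
        if prefix and iri.startswith(prefix)
    ]
    if not matches:
        # No ad-hoc guessing of a prefix name: refuse instead.
        raise ValueError(f"no declared prefix can contract {iri!r}")
    name, prefix = max(matches, key=lambda match: len(match[1]))
    return f"{name}:{iri[len(prefix):]}"


curie_map = {"HP": "http://purl.obolibrary.org/obo/HP_"}
print(contract_iri("http://purl.obolibrary.org/obo/HP_0009124", curie_map))
```

With this map, an `MP_` IRI raises `ValueError` instead of being turned into a guessed `MP:` CURIE, which is the behaviour the paragraph recommends.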


## Propagatable slots
@@ -203,7 +201,6 @@ When writing the metadata block, a canonical SSSOM/TSV writer:
* MUST serialise multi-valued slots as YAML “block sequences” ([YAML Specification §8.2.1](https://yaml.org/spec/1.2.2/#821-block-sequences)) – even when the list of values contains only one item;
* MUST serialise scalar values in YAML “plain style” ([YAML Specification §7.3.3](https://yaml.org/spec/1.2.2/#733-plain-style)) whenever possible, otherwise in “double-quoted style” ([YAML Specification §7.3.1](https://yaml.org/spec/1.2.2/#731-double-quoted-style));
* MUST serialise the slots in the order they appear in the [“Slots” table](MappingSet.md#slots), in the documentation for the `MappingSet` class;
-* MUST write the `curie_map` at the beginning of the block, before any other slots;
* MUST NOT include in the CURIE map the prefix names that are considered “built-in”;
* MUST NOT include in the CURIE map any prefix name that is not used anywhere in the set;
* MUST sort the prefix names in the CURIE map in lexicographical order.
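
Editorial illustration (not part of the diff): the three CURIE map rules that remain in the list above (no built-in prefixes, no unused prefixes, sorted prefix names) could be sketched as below. The function name `canonical_curie_map` is hypothetical, the set of built-in names is supplied by the caller instead of being hard-coded, and reading "lexicographical order" as Python's default string ordering (by code point) is an assumption.

```python
def canonical_curie_map(
    curie_map: dict[str, str],
    used_curies: list[str],
    builtin_names: set[str],
) -> dict[str, str]:
    """Return the CURIE map a canonical writer would emit.

    Hypothetical helper: drops built-in prefixes, drops prefixes that are
    not used by any CURIE in the set, and sorts the remaining names.
    """
    used_names = {curie.partition(":")[0] for curie in used_curies}
    return {
        name: curie_map[name]
        for name in sorted(curie_map)  # code point order (assumption)
        if name in used_names and name not in builtin_names
    }


declared = {
    "orcid": "https://orcid.org/",
    "MP": "http://purl.obolibrary.org/obo/MP_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "UNUSED": "https://example.org/unused/",
}
used = ["HP:0009124", "skos:exactMatch", "MP:0000003", "orcid:0000-0002-7356-1779"]
print(list(canonical_curie_map(declared, used, {"skos", "semapv"})))
```

Python dictionaries preserve insertion order, so iterating over the result yields the prefix names in the sorted order in which they would be written.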
9 changes: 9 additions & 0 deletions src/docs/spec-model.md
@@ -23,6 +23,15 @@ The `MappingSet` class represents, well, a set of individual mappings, which are
Of note, within a set, a mapping may not necessarily be uniquely identified by the combination of its four mandatory slots (`subject_id`, `predicate_id`, `object_id`, and `mapping_justification`). A set may very well contain several mappings with the same subject, predicate, object, and justification, but that differ on some of the other, complementary slots.


+## Identifiers
+
+Throughout the model, identifiers to external resources are represented using the custom type [`EntityReference`](EntityReference.md) (based on the LinkML type [`uriorcurie`](https://w3id.org/linkml/Uriorcurie)), which accepts both full-length IRIs and [CURIEs](https://www.w3.org/TR/curie/) as possible identifier formats. (Note however that serialisation formats may mandate the use of one identifier format over the other; for example, the [SSSOM/TSV](spec-formats-tsv.md) format requires the systematic use of CURIEs, whereas the [OWL/RDF](spec-formats-owl.md) format conversely requires the systematic use of IRIs).
+
+Whenever the CURIE syntax is used in a mapping set (whether this is by choice of the SSSOM producer, or because it is mandated by the serialisation format), all CURIEs MUST be unambiguously resolvable into corresponding full-length IRIs without requiring any external resources. This means that any prefix name used MUST be properly declared in the set’s `curie_map` slot, which is a dictionary associating a prefix name to an IRI prefix.
+
+By exception, prefix names listed in the table found in the [IRI prefixes](spec-intro.md#iri-prefixes) section are considered “built-in”. As such, they MAY be omitted from the `curie_map`. If they are not omitted, they MUST point to the same IRI prefixes as in the aforementioned table.
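
Editorial illustration (not part of the diff): the resolution rules of the new Identifiers section could be sketched as below. The names `BUILTIN_PREFIXES`, `effective_curie_map` and `expand_curie` are hypothetical, and `BUILTIN_PREFIXES` holds only an illustrative subset of the built-in table; the authoritative list is the one in the IRI prefixes section.

```python
# Illustrative subset only; the authoritative table lives in the
# "IRI prefixes" section of the specification.
BUILTIN_PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "semapv": "https://w3id.org/semapv/vocab/",
}


def effective_curie_map(curie_map: dict[str, str]) -> dict[str, str]:
    """Merge declared prefixes with the built-in ones.

    A built-in prefix may be redeclared, but only with the same IRI prefix.
    """
    for name, iri_prefix in curie_map.items():
        builtin = BUILTIN_PREFIXES.get(name)
        if builtin is not None and builtin != iri_prefix:
            raise ValueError(f"built-in prefix {name!r} redefined as {iri_prefix!r}")
    return {**BUILTIN_PREFIXES, **curie_map}


def expand_curie(curie: str, curie_map: dict[str, str]) -> str:
    """Expand a CURIE into an IRI using only the set's own declarations."""
    prefix_name, separator, local_part = curie.partition(":")
    if not separator:
        raise ValueError(f"{curie!r} is not a CURIE")
    prefixes = effective_curie_map(curie_map)
    if prefix_name not in prefixes:
        # Undeclared and not built-in: the CURIE cannot be resolved.
        raise KeyError(f"undeclared prefix name {prefix_name!r}")
    return prefixes[prefix_name] + local_part


declared = {"HP": "http://purl.obolibrary.org/obo/HP_"}
print(expand_curie("HP:0009124", declared))
print(expand_curie("skos:exactMatch", declared))
```

No external registry is consulted at any point, which mirrors the requirement that CURIEs be resolvable without external resources.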


## Propagation of mapping set slots

As mentioned briefly above, there are two different types of slots in the `MappingSet` class:
19 changes: 19 additions & 0 deletions src/sssom_schema/schema/sssom_schema.yaml
@@ -72,6 +72,20 @@ types:
- https://mapping-commons.github.io/sssom/spec/#tsv

 slots:
+  prefix_name:
+    key: true
+    range: ncname
+  prefix_url:
+    range: uri
+  curie_map:
+    description: A dictionary that contains prefixes as keys and their URI expansions as values.
+    range: prefix
+    multivalued: true
+    inlined: true
+    see_also:
+      - https://github.com/mapping-commons/sssom/issues/225
+      - https://github.com/mapping-commons/sssom/pull/349
+      - https://github.com/mapping-commons/sssom/blob/master/examples/schema/curie_map.sssom.tsv
   mirror_from:
     description: A URL location from which to obtain a resource, such as a mapping set.
     range: uri
@@ -627,6 +641,7 @@ classes:
       license:
         required: true
     slots:
+      - curie_map
       - mappings
       - mapping_set_id
       - mapping_set_version
@@ -770,6 +785,10 @@ classes:
       - mapping_set_group
       - last_updated
       - local_name
+  prefix:
+    slots:
+      - prefix_name
+      - prefix_url
   Propagatable:
     class_uri: sssom:Propagatable
     description: Metamodel extension class to describe slots whose value can be