Ontology update infrastructure part 4 #649

benjamingeer · 2017-10-25T16:03:10Z

Treat built-in ontologies other than knora-api and knora-base (e.g. dc) as built-in ontologies, not as project-specific ones.
Serve standoff classes and properties, including the ones in standoff, but not yet the ones in knora-base or standoff datatype tags (see Serve salsah-gui and standoff ontologies #648).
Refactor representation of OWL classes and properties to be more suitable for updates as well as read operations.


mrivoal · 2017-10-27T12:43:58Z

docs/rst/knora-api-server/api_v2/knora-iris.rst

+
+Knora ontologies use 'hash namespaces' (see `URI Namespaces`_). This means that the IRI of an ontology entity (a class or property definition) is constructed by adding a hash character (``#``) to the ontology IRI, followed by the name of the entity. In Knora, an entity name must be a valid XML NCName_. Thus, the class ``incunabula:book`` has the following IRIs:
+
+- ``http://www.knora.org/ontology/incunabula#book`` (for the internal entity)


Shouldn't we also use the projectUUID in these IRIs? If we don't, we are likely to run into duplicate project names as the number of DaSCH satellites (hopefully) grows.

But we should keep the idea of the knora-base:projectShortName property to be used by the GUI, as discussed in #595, and add a knora-base:projectCode corresponding to the projectUUID.

An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI. Also, I would be reluctant to put a random number in an ontology IRI. The convention in the RDF world seems to be that ontology IRIs are meaningful rather than random.

If two ontologies hosted on different servers have the same internal IRI, this will not cause a conflict for clients: the external IRI contains the hostname of the server, so the two ontologies will have different external IRIs.

However, we can’t put the hostname (or the name of the institution that created the ontology) in the internal IRI, because projects and ontologies need to be able to move easily from one server (and institution) to another. Therefore the hostname can only be known at runtime.
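A minimal sketch of the idea in the two paragraphs above: the internal IRI stays fixed, and the hostname is only filled in at runtime. The thread does not spell out the external IRI layout, so the path pattern below and both hostnames are assumptions made purely for illustration.

```python
INTERNAL_BASE = "http://www.knora.org/ontology/"


def to_external_ontology_iri(internal_iri: str, hostname: str) -> str:
    """Swap the fixed internal host for the hostname of the serving machine.

    The external path layout is an assumption, not taken from the thread.
    """
    if not internal_iri.startswith(INTERNAL_BASE):
        raise ValueError(f"not an internal ontology IRI: {internal_iri}")
    ontology_path = internal_iri[len(INTERNAL_BASE):]
    return f"http://{hostname}/ontology/{ontology_path}"


# Two hypothetical servers hosting ontologies with the same internal IRI.
internal = "http://www.knora.org/ontology/incunabula"
external_a = to_external_ontology_iri(internal, "server-a.example.org")
external_b = to_external_ontology_iri(internal, "server-b.example.org")
```

Because the hostname differs, `external_a` and `external_b` differ even though the internal IRI is identical.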

I could imagine a problem occurring if someday we do federated searches in which triplestores on different servers are talking to each other. Or if two ontologies created with the same name on different servers are later moved to the same server.

The basic problem is that there seem to be only two options for making an ontology IRI unique: (1) require people to register names with a central authority before they use them, or (2) include a very long random number. The first option seems too burdensome, and the second one seems too ugly.

@lrosenth @tobiasschweizer @subotic @loicjaouen what do you think?

By projectUUID, I actually meant the project code, as until now we haven't mentioned a projectUUID (at least I didn't understand it that way).

An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI.

Yes, but if you are talking about the kind of standard ontologies shared by various projects, my guess is that these ontologies will undergo a review process. Their status will also change as they proceed from 'project-specific ontologies' to 'standard built-in ontologies'. In the process, aside from cleaning and improving these ontologies, can't we imagine that the initial project ID attached to the IRIs will be changed into a more generic name (such as bibliography, for example)?

I could imagine a problem occurring if someday we do federated searches in which triplestores on different servers are talking to each other.

I don't know if we are ever going to support federated searches on the triplestore level. More likely on the federated webapi server level. In those cases, the different hostnames will give us unique names.

Or if two ontologies created with the same name on different servers are later moved to the same server.

Maybe it would be easier to require a name change in those rare cases?

@mrivoal A UUID is not something assigned manually, it’s a very long random number, which the Knora API server encodes as a 22-character string of letters and digits.

The problem with changing the IRI of an ontology is that you also have to change all the data that uses the ontology. In the case of a project-specific ontology that became standardised, I think ideally there would be a project-specific ‘bridge’ ontology so the data wouldn’t have to change.

So... maybe including a short project ID (not a UUID) in the ontology IRI wouldn’t be such a bad idea. But it would mean that all existing ontologies and data would have to be updated.

@subotic I think you’re probably right.

@mrivoal A UUID is not something assigned manually, it’s a very long random number, which the Knora API server encodes as a 22-character string of letters and digits.

Yes, I figured it out :)

The problem with changing the IRI of an ontology is that you also have to change all the data that uses the ontology.
I am very aware of that; we already faced the problem when we changed the syntax of our IRIs to include the project code instead of the project shortname.

So... maybe including a short project ID (not a UUID) in the ontology IRI wouldn’t be such a bad idea. But it would mean that all existing ontologies and data would have to be updated.

Honestly, I thought this was the road we all agreed on several months ago.

Honestly, I thought this was the road we all agreed on several months ago.

I thought we had only agreed that project IDs would be used in resource IRIs. I don’t remember anything about including them in ontology IRIs. But I’m not known for having the best memory. Anyway, we can certainly do it, but it will take some work to convert everything.

We decided that there would be a project ID, and that IRIs would be http://rdfh.ch/PROJECT_ID; we didn't clearly specify "resource" IRI versus "project" IRI (cf. the minutes Lukas sent by mail, dated 2017-03-16).
So we apparently left room for misunderstandings.
As this was meant to avoid collisions, we thought it should be in the ontology IRI too.

I’d like to see if I can make this work in a backwards-compatible way, so existing data doesn’t have to be changed.

mrivoal · 2017-10-27T12:46:27Z

docs/rst/knora-api-server/api_v2/knora-iris.rst

+The formats of generated data IRIs for different types of objects are as follows:
+
+- Project: ``http://rdfh.ch/projects/PROJECT_UUID``
+- Resource: ``http://rdfh.ch/PROJECT_ID/RESOURCE_UUID`` (where ``PROJECT_ID`` is not a random UUID, but the project's unique ID, which can be registered with the DaSCH)


On the line above, PROJECT_UUID is mentioned, whereas here it is referred to as PROJECT_ID. The two should probably be unified.

The resource ID has the project ID (should be project code, but is currently project shortname) in case someday we want resource IRIs to be directly dereferenceable. But again, how do we guarantee that project codes are unique? Will people have to register a project code with us even to create a test project? And if they don’t, suppose they then want to turn their test project into a real project, and they have already created resources with a test project code that isn’t unique?

The resource ID has the project ID (should be project code, but is currently project shortname) in case someday we want resource IRIs to be directly dereferenceable.

Our resource IRIs (in Lausanne) are currently based on the project code (or project ID, the four-digit string that we discussed earlier), as I think was decided earlier this spring.

But again, how do we guarantee that project codes are unique? Will people have to register a project code with us even to create a test project? And if they don’t, suppose they then want to turn their test project into a real project, and they have already created resources with a test project code that isn’t unique?

Being in charge, the DaSCH could allocate a range of project IDs to each institution, as we already did between Basel and Lausanne. That would hold until another system, such as a registry of projects, is set up, as @lrosenth also mentioned this spring.

Mass renaming?

OK, but this is open source software. Anyone can download and run it even if they aren’t (yet) a DaSCH satellite and don’t have an allocated ID range.

We could do the same as for IPs and DNS: have a domestic range of project IDs.

The problem is that no software enforces this; it is only a convention that we set amongst ourselves, so we can't allocate a range and put it as a default value in a config file.
Of course, we can explain it in the docs... but human understanding is error-prone :)

lrosenth · 2017-10-27T15:31:14Z

I would suggest that each location gets a unique prefix, which has to be used to form the project ID. With this concept we would not have problems. Federated queries at the RDF level are on my wish list :-)

benjamingeer · 2017-10-27T15:53:05Z

@lrosenth OK but the question is whether we include the project ID in the ontology IRI. This would mean changing all existing ontologies and data.

loicjaouen · 2017-10-27T16:04:13Z

So, if I start writing an ontology for incunabula at UNIL today, I would:

- add the project's definition in the admin data graph under the IRI http://rdfh.ch/projects/abcdefdhijklmnopqrstuv (for now we would need to have a UUID generated for us, since we write this by hand; no problem)
- its internal ontology would be http://www.knora.org/ontology/unil-incunabula
- its internal ontology's entities would be like http://www.knora.org/ontology/unil-incunabula#book
- and its resources would be like http://rdfh.ch/0110/abcdefdhijklmnopqrstuv

knowing that 0110 is in the range of IDs allocated to UNIL.

Does that sum up the conversation so far?
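The IRI shapes in this summary can be sketched as follows. This is an illustration only: the helper names are hypothetical, and generating the 22-character identifier as unpadded URL-safe Base64 of a random UUID is an assumption about the encoding (such strings can also contain `-` and `_`).

```python
import base64
import uuid

DATA_BASE = "http://rdfh.ch"
ONTOLOGY_BASE = "http://www.knora.org/ontology"


def short_uuid() -> str:
    """Encode a random UUID (16 bytes) as 22 unpadded URL-safe Base64 characters."""
    encoded = base64.urlsafe_b64encode(uuid.uuid4().bytes)
    return encoded.rstrip(b"=").decode("ascii")


def project_iri(project_uuid: str) -> str:
    return f"{DATA_BASE}/projects/{project_uuid}"


def resource_iri(project_id: str, resource_uuid: str) -> str:
    return f"{DATA_BASE}/{project_id}/{resource_uuid}"


def entity_iri(ontology_name: str, entity_name: str) -> str:
    # Hash namespace: ontology IRI + "#" + entity name.
    return f"{ONTOLOGY_BASE}/{ontology_name}#{entity_name}"


# The examples from the summary above.
example_project = project_iri("abcdefdhijklmnopqrstuv")
example_resource = resource_iri("0110", "abcdefdhijklmnopqrstuv")
example_entity = entity_iri("unil-incunabula", "book")
```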


benjamingeer · 2017-10-30T17:44:58Z

I think it will be possible to have optional project IDs in ontology IRIs, so existing data doesn't have to be changed. Thus either of these would work as an internal ontology IRI:

http://www.knora.org/ontology/incunabula
http://www.knora.org/ontology/0010/incunabula

I need to test this more to confirm.
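As a rough illustration of what accepting both forms could look like, here is a small parser sketch. It is not the code on this branch. The four-digit project ID pattern is an assumption based on the examples in this thread, and the name pattern is an ASCII-only simplification of an XML NCName.

```python
import re
from typing import NamedTuple, Optional

# Optional project ID segment, then the ontology name.
ONTOLOGY_IRI = re.compile(
    r"http://www\.knora\.org/ontology/"
    r"(?:(?P<project_id>[0-9]{4})/)?"
    r"(?P<name>[A-Za-z_][A-Za-z0-9._-]*)"
)


class OntologyIri(NamedTuple):
    project_id: Optional[str]  # None when the IRI has no project ID segment
    name: str


def parse_ontology_iri(iri: str) -> OntologyIri:
    match = ONTOLOGY_IRI.fullmatch(iri)
    if match is None:
        raise ValueError(f"invalid internal ontology IRI: {iri}")
    return OntologyIri(match.group("project_id"), match.group("name"))


without_id = parse_ontology_iri("http://www.knora.org/ontology/incunabula")
with_id = parse_ontology_iri("http://www.knora.org/ontology/0010/incunabula")
```

Under these assumptions, an IRI such as `http://www.knora.org/ontology/0010` is rejected, because the only path segment would have to be an ontology name and a name cannot start with a digit.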

mrivoal · 2017-10-31T09:37:19Z

@benjamingeer, @lrosenth, @loicjaouen, @subotic, I think the discussion is getting really confused and we are not going anywhere. Could we consider a Skype to go through the problem together (again) and to set those IRIs once and for all?

I mean, we have been asking for this and discussing it since March; I think it is high time we came to a conclusion and stuck to it.

But @benjamingeer, what you are suggesting right now doesn't fit with what we understood:
Instead of http://www.knora.org/ontology/incunabula#book, with the project ID within ontology IRIs, we would expect http://www.knora.org/ontology/0010#book, and to date this is the way our ontologies are defined.

benjamingeer · 2017-10-31T09:51:00Z

@mrivoal http://www.knora.org/ontology/0010#book isn’t possible, because it wouldn’t allow a project to have more than one ontology.

mrivoal · 2017-10-31T09:55:29Z

You are right... But we did it anyway.

benjamingeer · 2017-10-31T10:08:53Z

Hmm, that will be a problem, because an ontology name is also used as a prefix label, which must be a valid XML NCName, which cannot start with a number.
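To make the constraint concrete, here is a simplified check. It only covers ASCII names, whereas the real NCName production in the XML Namespaces specification also allows many non-ASCII characters, so treat it as an approximation rather than a full validator.

```python
import re

# ASCII-only approximation of an XML NCName: starts with a letter or
# underscore, then letters, digits, '.', '-' or '_'. No colons allowed.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(label: str) -> bool:
    return NCNAME_ASCII.fullmatch(label) is not None


starts_with_digit = is_ncname("0010")        # False: leading digit
plain_name = is_ncname("incunabula")         # True
letter_prefixed = is_ncname("p0010-incunabula")  # True: digits are fine after the first character
```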

mrivoal · 2017-10-31T10:17:58Z

But we have defined corresponding prefixes like this:
@prefix h-steiner: <http://www.knora.org/ontology/0110#> .

If we need to change things, we will. The most important thing is that we agree on some schema and stick to it. Hence the suggestion to talk about it. Again.

benjamingeer · 2017-10-31T10:31:27Z

In API v2 (and even in API v1 for bulk import), the API server needs to be able to generate a prefix automatically from an ontology IRI. If we use project IDs as namespaces for ontology names, then both the project ID and the ontology name will need to be in the prefix. For example, given this ontology IRI:

http://www.knora.org/ontology/0010/incunabula

The prefix could be p0010-incunabula (that's what I implemented yesterday on this branch, for now).
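A sketch of that prefix generation, written from the description above rather than from the branch itself. The `p` + project ID + `-` + name form comes from the example; using the bare ontology name when there is no project ID is an assumption.

```python
INTERNAL_BASE = "http://www.knora.org/ontology/"


def prefix_label(ontology_iri: str) -> str:
    """Derive a prefix label from an internal ontology IRI."""
    if not ontology_iri.startswith(INTERNAL_BASE):
        raise ValueError(f"not an internal ontology IRI: {ontology_iri}")
    segments = ontology_iri[len(INTERNAL_BASE):].split("/")
    if any(segment == "" for segment in segments):
        raise ValueError(f"empty path segment in: {ontology_iri}")
    if len(segments) == 1:
        # No project ID: the ontology name alone (assumed behaviour).
        return segments[0]
    if len(segments) == 2:
        project_id, name = segments
        # The leading 'p' keeps the label from starting with a digit.
        return f"p{project_id}-{name}"
    raise ValueError(f"too many path segments in: {ontology_iri}")


with_project = prefix_label("http://www.knora.org/ontology/0010/incunabula")
without_project = prefix_label("http://www.knora.org/ontology/incunabula")
```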


benjamingeer · 2017-10-31T12:26:01Z

@mrivoal @loicjaouen Lukas, Ivan, and I talked about this briefly. Lukas decided that in order to be able to progress more quickly right now, we will go ahead with the optional project ID in the ontology IRI. If your ontologies are working for you now with API v1, you don't need to change them now. When the time comes (but not before January), we will help you make any necessary changes to keep your data working with API v2. So, no meeting is necessary. :)

mrivoal · 2017-10-31T13:13:06Z

Ok, then. Fine.

But as we are working on new ontologies, we still need a detailed list of the correct expected syntaxes for:

- ontology IRIs
- admin data
- permissions data
- data (lists, instances of resources, values, etc.)

@benjamingeer, could you please review, update and correct @loicjaouen's summary in issue #632 (@loicjaouen did not include the rdfh prefixes for instances) so that we can move forward and so that everything is clear for everyone? At the moment, I am sorry but everything is way too confused for me to proceed.

benjamingeer · 2017-10-31T13:14:48Z

@mrivoal More documentation is on its way.


loicjaouen · 2017-11-03T07:35:31Z

I have a couple of additional questions. @benjamingeer, you previously said:

An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI.

The near-syllogism formed by the first and last sentences is: "an ontology can exist without a project, so no ontology should be linked to a project".

But there is a link between a project and a specific ontology, isn't there? In the admin configuration defining a project, here incunabula, we have:

 knora-base:projectOntologyGraph "http://www.knora.org/ontology/incunabula"^^xsd:string ;

And we load this incunabula in that named graph.

We can still use this ontology in other projects.

What difference does it make to have a projectOntologyGraph?

Actually, wouldn't it be useful for a project to explicitly declare the ontologies it uses, so that the GUI can restrict the choice of resource types for that project and the user doesn't drown in similar choices? As a user, if I want to add a book from a bibliographical ontology, I might have trouble selecting among the potentially many bibliographical ontologies, and most likely I would have trouble making the same choice consistently.

And later:

http://www.knora.org/ontology/0010#book isn’t possible, because it wouldn’t allow a project to have more than one ontology.

I don't get this one, as we can use any existing ontologies.

subotic · 2017-11-03T08:40:55Z

I'm working on updating the project information (Store an optional short code and multiple ontologies per project #654). The property projectOntologyGraph is now projectOntology, where one project can have 0-n ontologies. Ontologies are stored in a named graph with the same IRI as the ontology.
As far as I understand it, nothing prevents a project from using any kind of ontology inside the triplestore. We artificially restrict a project, via the GUI, to only use the ones defined in its own project (via projectOntology), because project ontologies are allowed to change. If project A uses an ontology from project B, and then project B decides to change the ontology, what should then happen? Either let the data from project A become inconsistent, or not allow project B to change its own ontology?
Ontologies shared by multiple projects will 'live' in a special place. They could, for example, belong to the SystemProject. Since they are stable, users will be able to use them directly or base their project ontologies on them. Thus I would suggest that every ontology is part of a project (user's or system).

loicjaouen · 2017-11-03T08:57:06Z

Thanks @subotic for pointing at #654. I read too quickly and failed to fully understand the implications of that issue :( . Got it now!

So, either the ontologies are public or project specific?

benjamingeer · 2017-11-03T09:01:15Z

So, either the ontologies are public or project specific?

That's the goal of #523. I think the API server should enforce this, so this scenario can't happen:

If project A uses an ontology from project B, and then project B decides to change the ontology, what should then happen?

subotic · 2017-11-03T09:20:13Z

I think the API server should enforce this, so this scenario can't happen:

If project A uses an ontology from project B, and then project B decides to change the ontology, what should then happen?

You are right, the server should enforce this. It would need to check every:

- project ontology creation/update
- resource creation/change

and make sure that only the project's ontologies or system ontologies are involved.
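The check described here could look roughly like the sketch below. All names are hypothetical and the ontology sets are illustrative; the real server would take them from the project definition (projectOntology) and from its list of system ontologies.

```python
from typing import Iterable, List, Set


def disallowed_ontologies(
    used: Iterable[str],
    project_ontologies: Set[str],
    system_ontologies: Set[str],
) -> List[str]:
    """Return the ontology IRIs a request uses that belong neither to the
    project itself nor to the system; an empty list means the request is OK."""
    allowed = project_ontologies | system_ontologies
    return sorted(set(used) - allowed)


# Illustrative data: project A owns one ontology, project B owns another.
system = {"http://www.knora.org/ontology/knora-base"}
project_a = {"http://www.knora.org/ontology/0010/incunabula"}
project_b_ontology = "http://www.knora.org/ontology/0110/h-steiner"

ok_request = disallowed_ontologies(
    ["http://www.knora.org/ontology/0010/incunabula",
     "http://www.knora.org/ontology/knora-base"],
    project_a,
    system,
)
bad_request = disallowed_ontologies([project_b_ontology], project_a, system)
```

Running the same function on both ontology updates and resource creation/change requests would cover the two cases listed above.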

loicjaouen · 2017-11-03T09:56:25Z

Then there is no problem in having project IDs or names in project-specific ontologies.
It might even be a necessity.

But that's not the right issue to discuss it.

I think we just got caught in misunderstandings due to different contexts: we are thinking in the "here and now" (#632), while you are already living in the next iteration (#654) ;)

Clarifying these contexts is a matter of release management, something we should now formalize, as we are in production and the next release will require a serious effort to port our existing data.


fix (webapi): Treat dc and other built-in ontologies as built-in onto…

4467acd

…logies, not as project-specific ones.

benjamingeer mentioned this pull request Oct 25, 2017

error in ontologies route when querying images ontology dhlab-basel/Salsah#95

Closed

Benjamin Geer added 4 commits October 26, 2017 20:20

feature (webapi): Serve standoff classes and properties (but not yet …

a226e9f

…in knora-api, and not yet standoff datatype tags).

Merge branch 'develop' into wip/ontology-updates-4

a205e22

docs (webapi): Continue writing IRI documentation.

4736f2a

mrivoal reviewed Oct 27, 2017


feature (webapi): Support optional project IDs in ontology and entity…

e005bdb

… IRIs.

feature (webapi): Use optional project ID in prefix label of generate…

5a0b1bb

…d XSD schema for bulk import.

Benjamin Geer added 7 commits October 31, 2017 15:48

test (webapi): Add tests for ontology and entity IRI conversion.

26c2763

- Fix extra slash in ontology IRIs.

docs (webapi): Continue writing IRI doc.

955f2c5

Merge branch 'develop' into wip/ontology-updates-4

9a67510

test (webapi): Test creating an empty ontology.

079b7d2

Merge branch 'wip/ontology-updates-4' of github.com:dhlab-basel/Knora…

88ab832

… into wip/ontology-updates-4

docs (webapi): More IRI docs.

cccb075

docs (webapi): Correct errors.

b5c3bfc

Benjamin Geer added 3 commits November 1, 2017 15:58

test (webapi): Add more ontology creation tests.

5cbc122

feature (webapi): Add create property request message.

809c1d8

refactor (webapi): Refactor representation of OWL classes to support …

fc4f500

…updates.

Benjamin Geer added 8 commits November 3, 2017 12:27

feature (webapi): Separate direct from inherited cardinalities when c…

4c36839

…onstructing class defs in OntologyResponderV2.

feature (webapi): Distinguish between direct and non-direct cardinali…

edaad2f

…ties in knora-api ontologies. - Add other missing cardinalities.

feature (webapi): Mark inherited cardinalities with knora-api:isInher…

ea2d687

…ited in JSON-LD.

feature (webapi): Add route that takes JSON-LD and creates an empty o…

ad54e44

…ntology.

refactor (webapi): Refactor property definition case class for read-w…

cb7e0ec

…rite use.

fix (webapi): Fix compile error in generated code.

9a81e91

- Refactor ontology entity classes to remove unnecessary layer in class hierarchy.

refactor (webapi): Move predicates from ReadEntityInfoV2 into EntityI…

60ec8bc

…nfoContentV2, so they're available for updates.

refactor (webapi): Start centralising conversion of entity info from …

4303498

…one schema to another, for use in updates too.

benjamingeer requested a review from subotic November 7, 2017 12:15

benjamingeer changed the title ~~More ontology update stuff~~ Ontology update infrastructure part 4 Nov 7, 2017

subotic approved these changes Nov 7, 2017


subotic merged commit 312206a into develop Nov 7, 2017

benjamingeer deleted the wip/ontology-updates-4 branch November 7, 2017 14:22

subotic mentioned this pull request Nov 22, 2017

feature (webapi): refactor admin #671

Merged

10 tasks

benjamingeer added the breaking (existing data) label Jan 30, 2019

benjamingeer mentioned this pull request Jan 30, 2019

Use project shortcode in IIIF URLs #1191

Merged

9 tasks

benjamingeer mentioned this pull request Jul 5, 2019

Ben's PR history #571

Open

