Ontology update infrastructure part 4 #649

Merged
merged 25 commits into from
Nov 7, 2017

@benjamingeer benjamingeer commented Oct 25, 2017

  • Treat built-in ontologies other than knora-api and knora-base (e.g. dc) as built-in ontologies, not as project-specific ones.
  • Serve standoff classes and properties, including the ones in standoff, but not yet the ones in knora-base or standoff datatype tags (see Serve salsah-gui and standoff ontologies #648).
  • Refactor representation of OWL classes and properties to be more suitable for updates as well as read operations.

Fixes dhlab-basel/Salsah#95.

Benjamin Geer added 4 commits October 26, 2017 20:20
…in knora-api, and not yet standoff datatype tags).
# Conflicts:
#	webapi/src/main/scala/org/knora/webapi/responders/v2/ResourcesResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/routing/v2/SearchRouteV2.scala
#	webapi/src/main/scala/org/knora/webapi/util/ConstructResponseUtilV2.scala

Knora ontologies use 'hash namespaces' (see `URI Namespaces`_). This means that the IRI of an ontology entity (a class or property definition) is constructed by adding a hash character (``#``) to the ontology IRI, followed by the name of the entity. In Knora, an entity name must be a valid XML NCName_. Thus, the class ``incunabula:book`` has the following IRIs:

- ``http://www.knora.org/ontology/incunabula#book`` (for the internal entity)
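
As a rough illustration of the hash-namespace rule quoted above, the following Python sketch builds an internal entity IRI from an ontology IRI and an entity name. The function name `make_entity_iri` and the ASCII-only NCName pattern are assumptions made for this example. The real XML NCName grammar allows many more Unicode characters, and this is not Knora's implementation.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName (an assumption for
# this sketch): starts with a letter or underscore, followed by letters,
# digits, '.', '-' or '_'. The real grammar also admits non-ASCII characters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def make_entity_iri(ontology_iri: str, entity_name: str) -> str:
    """Append '#' and the entity name to the ontology IRI (hash namespace)."""
    if NCNAME_ASCII.fullmatch(entity_name) is None:
        raise ValueError(f"not a valid (ASCII) NCName: {entity_name!r}")
    return f"{ontology_iri}#{entity_name}"


book_iri = make_entity_iri("http://www.knora.org/ontology/incunabula", "book")
print(book_iri)
```

A name that starts with a digit is rejected by this check, because an NCName cannot begin with a digit.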

Shouldn't we also use projectUUID in these IRIs? If we don't, chances are that we will run into duplicate project names as the number of DaSCH satellites (hopefully) grows.

But we should keep the idea of the knora-base:projectShortName property to be used by the GUI, as discussed in #595, and add a knora-base:projectCode corresponding to the projectUUID.

Author

An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI. Also, I would be reluctant to put a random number in an ontology IRI. The convention in the RDF world seems to be that ontology IRIs are meaningful rather than random.

If two ontologies hosted on different servers have the same internal IRI, this will not cause a conflict for clients, because they will have different external IRIs, since the external IRI contains the hostname of the server.

However, we can’t put the hostname (or the name of the institution that created the ontology) in the internal IRI, because projects and ontologies need to be able to move easily from one server (and institution) to another. Therefore the hostname can only be known at runtime.

I could imagine a problem occurring if someday we do federated searches in which triplestores on different servers are talking to each other. Or if two ontologies created with the same name on different servers are later moved to the same server.

The basic problem is that there seem to be only two options for making an ontology IRI unique: (1) require people to register names with a central authority before they use them, or (2) include a very long random number. The first option seems too burdensome, and the second one seems too ugly.

@lrosenth @tobiasschweizer @subotic @loicjaouen what do you think?


By projectUUID, I actually meant projectcode, as until now we haven't mentioned projectUUID (at least I didn't understand it that way).

An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI.

Yes, but if you are talking about the kind of standard ontologies shared by various projects, my guess is that these ontologies will undergo a review process. Their status will also change as they proceed from 'project-specific ontologies' to 'standard built-in ontologies'. In the process, aside from cleaning and improving these ontologies, can't we imagine that the initial project ID attached to the IRIs will be changed into a more generic name (such as bibliography, for example)?

Collaborator

I could imagine a problem occurring if someday we do federated searches in which triplestores on different servers are talking to each other.

I don't know if we are ever going to support federated searches on the triplestore level. More likely on the federated webapi server level. In those cases, the different hostnames will give us unique names.

Or if two ontologies created with the same name on different servers are later moved to the same server.

Maybe it would be easier to require a name change in those rare cases?

Author

@mrivoal A UUID is not something assigned manually, it’s a very long random number, which the Knora API server encodes as a 22-character string of letters and digits.

The problem with changing the IRI of an ontology is that you also have to change all the data that uses the ontology. In the case of a project-specific ontology that became standardised, I think ideally there would be a project-specific ‘bridge’ ontology so the data wouldn’t have to change.

So... maybe including a short project ID (not a UUID) in the ontology IRI wouldn’t be such a bad idea. But it would mean that all existing ontologies and data would have to be updated.
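
To make the 22-character figure concrete: a UUID is 128 bits (16 bytes), and one common way to turn 16 bytes into a 22-character string is URL-safe Base64 with the padding stripped. The Python sketch below shows that arithmetic. Whether this is exactly the encoding the Knora API server uses is an assumption, and the helper names are hypothetical. Note that this alphabet also contains `-` and `_`, not only letters and digits.

```python
import base64
import uuid


def encode_uuid(u: uuid.UUID) -> str:
    """Encode the 16 bytes of a UUID as unpadded URL-safe Base64."""
    # 16 bytes become 24 Base64 characters, the last two being '=' padding.
    return base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")


def decode_uuid(encoded: str) -> uuid.UUID:
    """Reverse encode_uuid by restoring the two stripped padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(encoded + "=="))


random_uuid = uuid.uuid4()
encoded = encode_uuid(random_uuid)
print(encoded, len(encoded))
```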

Author

@subotic I think you’re probably right.


@mrivoal A UUID is not something assigned manually, it’s a very long random number, which the Knora API server encodes as a 22-character string of letters and digits.

Yes, I figured it out :)

The problem with changing the IRI of an ontology is that you also have to change all the data that uses the ontology.

I am very aware of that; we already faced the problem when we changed the syntax of our IRIs to include the projectcode instead of the projectshortname.

So... maybe including a short project ID (not a UUID) in the ontology IRI wouldn’t be such a bad idea. But it would mean that all existing ontologies and data would have to be updated.

Honestly, I thought this was the road we all agreed on several months ago.

Author

Honestly, I thought this was the road we all agreed on several months ago.

I thought we had only agreed that project IDs would be used in resource IRIs. I don’t remember anything about including them in ontology IRIs. But I’m not known for having the best memory. Anyway, we can certainly do it, but it will take some work to convert everything.

Contributor

We decided that there would be a project ID, and that IRIs would be http://rdfh.ch/PROJECT_ID, but we didn't clearly distinguish "resource" IRIs from "project" IRIs (cf. the minutes Lukas sent by mail, dated 2017-03-16).
So we apparently left room for misunderstandings.
As this was meant to avoid collisions, we thought it should be in the ontology too.

Author

I’d like to see if I can make this work in a backwards-compatible way, so existing data doesn’t have to be changed.

The formats of generated data IRIs for different types of objects are as follows:

- Project: ``http://rdfh.ch/projects/PROJECT_UUID``
- Resource: ``http://rdfh.ch/PROJECT_ID/RESOURCE_UUID`` (where ``PROJECT_ID`` is not a random UUID, but the project's unique ID, which can be registered with the DaSCH)

On the line above, PROJECT_UUID is mentioned, whereas here it is referred to as PROJECT_ID. This should probably be unified.

Author

The resource ID has the project ID (should be project code, but is currently project shortname) in case someday we want resource IRIs to be directly dereferenceable. But again, how do we guarantee that project codes are unique? Will people have to register a project code with us even to create a test project? And if they don’t, suppose they then want to turn their test project into a real project, and they have already created resources with a test project code that isn’t unique?


The resource ID has the project ID (should be project code, but is currently project shortname) in case someday we want resource IRIs to be directly dereferenceable.

Our resource IRIs (in Lausanne) are currently based on the project code (or project ID, the four-character code that we discussed earlier), as I think was decided earlier this spring.

But again, how do we guarantee that project codes are unique? Will people have to register a project code with us even to create a test project? And if they don’t, suppose they then want to turn their test project into a real project, and they have already created resources with a test project code that isn’t unique?

Being in charge, the DaSCH would/could allocate a range of project IDs to each institution, as we already did between Basel and Lausanne, until another system such as a registry of projects is set up, as @lrosenth also mentioned this spring.

Collaborator

Mass renaming?

Author

OK, but this is open source software. Anyone can download and run it even if they aren’t (yet) a DaSCH satellite and don’t have an allocated ID range.

Contributor

We could do the same as for IPs and DNS: have a domestic range of project IDs.

Author

Good idea!

Contributor

The problem is that there is no software code for this; it is only a convention that we set amongst ourselves. We can't allocate a range and put it as a default value in a config file.
Of course, we can explain it in the docs... but human understanding is error-prone :)

@lrosenth
Contributor

lrosenth commented Oct 27, 2017 via email

@benjamingeer
Author

@lrosenth OK but the question is whether we include the project ID in the ontology IRI. This would mean changing all existing ontologies and data.

@loicjaouen
Contributor

So, if I start writing an ontology for incunabula at UNIL today, I would:

  • add the project's definition in the admin data graph under the IRI
    http://rdfh.ch/projects/abcdefdhijklmnopqrstuv (for now we would have a UUID generated for us, since we write this by hand; no problem)
  • its internal ontology would be http://www.knora.org/ontology/unil-incunabula
  • its internal ontology's entities would be like http://www.knora.org/ontology/unil-incunabula#book
  • and its resources would be like http://rdfh.ch/0110/abcdefdhijklmnopqrstuv

knowing that 0110 is in the range of IDs allocated to UNIL

Does that sum up the conversation so far?
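
The data IRI patterns mentioned in this thread (`http://rdfh.ch/projects/PROJECT_UUID` for projects and `http://rdfh.ch/PROJECT_ID/RESOURCE_UUID` for resources) can be sketched in Python as follows. The function names are hypothetical, and the UUID string is the placeholder from the summary above, not a real identifier.

```python
DATA_BASE = "http://rdfh.ch"


def project_iri(project_uuid: str) -> str:
    """IRI of a project definition in the admin data graph."""
    return f"{DATA_BASE}/projects/{project_uuid}"


def resource_iri(project_id: str, resource_uuid: str) -> str:
    """IRI of a resource, namespaced by the project's allocated ID."""
    return f"{DATA_BASE}/{project_id}/{resource_uuid}"


# Placeholder values taken from the summary in this thread.
placeholder_uuid = "abcdefdhijklmnopqrstuv"
print(project_iri(placeholder_uuid))
print(resource_iri("0110", placeholder_uuid))
```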

@benjamingeer
Author

I think it will be possible to have optional project IDs in ontology IRIs, so existing data doesn't have to be changed. Thus either of these would work as an internal ontology IRI:

  • http://www.knora.org/ontology/incunabula
  • http://www.knora.org/ontology/0010/incunabula

I need to test this more to confirm.
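
A sketch of how both forms listed above could be recognised with a single pattern, written in Python for illustration. The project ID is assumed here to be four hexadecimal digits (the thread only shows numeric examples such as `0010` and `0110`), the ontology name is checked against an ASCII-only approximation of an NCName, and `parse_ontology_iri` is a hypothetical helper, not part of Knora.

```python
import re
from typing import NamedTuple, Optional

# Optional four-character project ID segment, then the ontology name.
ONTOLOGY_IRI = re.compile(
    r"http://www\.knora\.org/ontology/"
    r"(?:(?P<project_id>[0-9A-Fa-f]{4})/)?"
    r"(?P<name>[A-Za-z_][A-Za-z0-9._-]*)"
)


class OntologyIri(NamedTuple):
    project_id: Optional[str]  # None when the IRI has no project ID
    name: str


def parse_ontology_iri(iri: str) -> OntologyIri:
    """Split an internal ontology IRI into optional project ID and name."""
    match = ONTOLOGY_IRI.fullmatch(iri)
    if match is None:
        raise ValueError(f"not a recognised internal ontology IRI: {iri}")
    return OntologyIri(match.group("project_id"), match.group("name"))


print(parse_ontology_iri("http://www.knora.org/ontology/incunabula"))
print(parse_ontology_iri("http://www.knora.org/ontology/0010/incunabula"))
```

Under these assumptions, an IRI such as `http://www.knora.org/ontology/0010` is rejected, because `0010` on its own is not a valid ontology name.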

@mrivoal

mrivoal commented Oct 31, 2017

@benjamingeer, @lrosenth, @loicjaouen, @subotic, I think the discussion is getting really confused and we are not going anywhere. Could we consider a Skype to go through the problem together (again) and to set those IRIs once and for all?

I mean, we have been asking for this and discussing it since March. I think it is high time we came to a conclusion and stuck to it.

But @benjamingeer, what you are suggesting right now doesn't fit with what we understood:
With the project ID in ontology IRIs, we would expect http://www.knora.org/ontology/0010#book instead of http://www.knora.org/ontology/incunabula#book, and to date this is how our ontologies are defined.

@benjamingeer
Author

@mrivoal http://www.knora.org/ontology/0010#book isn’t possible, because it wouldn’t allow a project to have more than one ontology.

@mrivoal

mrivoal commented Oct 31, 2017

You are right... But we did it anyway.

@benjamingeer
Author

Hmm, that will be a problem, because an ontology name is also used as a prefix label, which must be a valid XML NCName, which cannot start with a number.

@mrivoal

mrivoal commented Oct 31, 2017

But we have defined corresponding prefixes like this:
@prefix h-steiner: <http://www.knora.org/ontology/0110#> .

If we need to change things, we will. The most important thing is that we agree on a scheme and stick to it. Hence the suggestion to talk about it. Again.

@benjamingeer
Author

In API v2 (and even in API v1 for bulk import), the API server needs to be able to generate a prefix automatically from an ontology IRI. If we use project IDs as namespaces for ontology names, then both the project ID and the ontology name will need to be in the prefix. For example, given this ontology IRI:

http://www.knora.org/ontology/0010/incunabula

The prefix could be p0010-incunabula (that's what I implemented yesterday on this branch, for now).
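
The prefix rule described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the code on the branch: `ontology_prefix` is a hypothetical name, and the behaviour for IRIs without a project ID (using the ontology name itself as the prefix, as in `incunabula:book`) is inferred from the thread.

```python
ONTOLOGY_BASE = "http://www.knora.org/ontology/"


def ontology_prefix(ontology_iri: str) -> str:
    """Derive a prefix label from an internal ontology IRI."""
    if not ontology_iri.startswith(ONTOLOGY_BASE):
        raise ValueError(f"not an internal ontology IRI: {ontology_iri}")
    parts = ontology_iri[len(ONTOLOGY_BASE):].split("/")
    if not all(parts) or len(parts) > 2:
        raise ValueError(f"unexpected ontology IRI structure: {ontology_iri}")
    if len(parts) == 1:
        # No project ID: the ontology name doubles as the prefix label.
        return parts[0]
    project_id, name = parts
    # A leading 'p' keeps the label a valid NCName, which cannot start
    # with a digit.
    return f"p{project_id}-{name}"


print(ontology_prefix("http://www.knora.org/ontology/0010/incunabula"))
print(ontology_prefix("http://www.knora.org/ontology/incunabula"))
```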

@benjamingeer
Author

@mrivoal @loicjaouen Lukas, Ivan, and I talked about this briefly. Lukas decided that in order to be able to progress more quickly right now, we will go ahead with the optional project ID in the ontology IRI. If your ontologies are working for you now with API v1, you don't need to change them now. When the time comes (but not before January), we will help you make any necessary changes to keep your data working with API v2. So, no meeting is necessary. :)

@mrivoal

mrivoal commented Oct 31, 2017

Ok, then. Fine.

But as we are working on new ontologies, we still need a detailed list of the correct expected syntaxes for

  • ontology IRIs
  • admin data stuff
  • permissions data
  • and for data (lists, instances of resources, values, etc.)

@benjamingeer, could you please review, update and correct @loicjaouen's summary in issue #632 (@loicjaouen did not include the rdfh prefixes for instances) so that we can move forward and so that everything is clear for everyone? At the moment, I am sorry but everything is way too confused for me to proceed.

@benjamingeer
Author

@mrivoal More documentation is on its way.

@loicjaouen
Contributor

I have an additional couple of questions, @benjamingeer you previously said:

An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI.

The near-syllogism formed by the first and last sentences is: "an ontology can exist without a project, so no ontology should be linked to a project".

But there is a link between a project and a specific ontology, isn't there? In the admin configuration defining a project, here incunabula, we have:

 knora-base:projectOntologyGraph "http://www.knora.org/ontology/incunabula"^^xsd:string ;

And we load this incunabula in that named graph.

We can still use this ontology in other projects.

What difference does it make to have a projectOntologyGraph?

Actually, wouldn't it be useful for a project to explicitly declare the ontologies it uses, so that the GUI can restrict the choice of resource types for that project and the user is not drowned in similar choices? As a user, if I want to add a book from a bibliographical ontology, I might have trouble choosing among the potentially many bibliographical ontologies, and most likely I would have trouble making the same choice consistently.

And later:

http://www.knora.org/ontology/0010#book isn’t possible, because it wouldn’t allow a project to have more than one ontology.

I don't get this one, as we can use any existing ontologies.

@subotic
Collaborator

subotic commented Nov 3, 2017

  • I'm working on updating the project information (Store an optional short code and multiple ontologies per project #654). The property projectOntologyGraph is now projectOntology, where one project can have 0-n ontologies. Ontologies are stored in a named graph with the same IRI as the ontology.
  • As far as I understand it, nothing prevents a project from using any ontology in the triplestore. We artificially restrict a project, via the GUI, to using only the ontologies defined in the project itself (via projectOntology), because project ontologies are allowed to change. If project A uses an ontology from project B, and then project B decides to change the ontology, what should then happen? Either let the data from project A become inconsistent, or not allow project B to change its own ontology?
  • Ontologies shared by multiple projects will 'live' in a special place. They could, for example, belong to the SystemProject. Since they are stable, users will be able to use them directly or base their project ontologies on them. Thus I would suggest that every ontology is part of a project (user's or system).

@loicjaouen
Contributor

Thanks @subotic for pointing to #654. I read too quickly and failed to fully understand the implications of that issue :( . Got it now!

So, either the ontologies are public or project specific?

@benjamingeer
Author

So, either the ontologies are public or project specific?

That's the goal of #523. I think the API server should enforce this, so this scenario can't happen:

If project A uses an ontology from project B, and then project B decides to change the ontology, what should then happen?

@subotic
Collaborator

subotic commented Nov 3, 2017

I think the API server should enforce this, so this scenario can't happen:

If project A uses an ontology from project B, and then project B decides to change the ontology, what should then happen?

You are right, the server should enforce this. It would need to check every:

  • project ontology creation/update
  • resource creation/change

and make sure that only the project's ontologies or system ontologies are involved.
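
The check described above could look roughly like the following Python sketch. It is a hypothetical illustration, not the API server's implementation: the function names are invented, entity IRIs are assumed to use the hash-namespace form (ontology IRI, `#`, entity name), and the set of system ontologies contains only `knora-base` for the sake of the example.

```python
from typing import Iterable, List


def ontology_of(entity_iri: str) -> str:
    """Return the ontology IRI part of a hash-namespace entity IRI."""
    return entity_iri.split("#", 1)[0]


def check_ontologies_allowed(
    entity_iris: Iterable[str],
    project_ontologies: Iterable[str],
    system_ontologies: Iterable[str],
) -> None:
    """Raise if any entity belongs to an ontology the project may not use."""
    allowed = set(project_ontologies) | set(system_ontologies)
    forbidden: List[str] = sorted(
        {ontology_of(iri) for iri in entity_iris} - allowed
    )
    if forbidden:
        raise PermissionError(f"ontologies not allowed for this project: {forbidden}")


system = ["http://www.knora.org/ontology/knora-base"]
project_a = ["http://www.knora.org/ontology/0010/incunabula"]

# Passes: only project A's own ontology and a system ontology are involved.
check_ontologies_allowed(
    [
        "http://www.knora.org/ontology/0010/incunabula#book",
        "http://www.knora.org/ontology/knora-base#Resource",
    ],
    project_a,
    system,
)
```

Such a check would run on every project ontology creation or update and on every resource creation or change, as listed above.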

@loicjaouen
Contributor

Then there is no problem in having project IDs or names in project-specific ontologies.
It might even be a necessity.

But that's not the right issue to discuss it.

I think we just got caught in misunderstandings due to differences in context: we were thinking in the "here and now" (#632) while you were already living in the next iteration (#654) ;)

Clearing these contexts is a matter of release management, something we should now formalize as we are in production and the next release will require a serious step to port our existing data.

Benjamin Geer added 8 commits November 3, 2017 12:27
…onstructing class defs in OntologyResponderV2.
…ties in knora-api ontologies.

- Add other missing cardinalities.
- Refactor ontology entity classes to remove unnecessary layer in class hierarchy.
…nfoContentV2, so they're available for updates.
…one schema to another, for use in updates too.
@benjamingeer benjamingeer requested a review from subotic November 7, 2017 12:15
@benjamingeer benjamingeer changed the title More ontology update stuff Ontology update infrastructure part 4 Nov 7, 2017
@subotic subotic merged commit 312206a into develop Nov 7, 2017
@benjamingeer benjamingeer deleted the wip/ontology-updates-4 branch November 7, 2017 14:22
@subotic subotic mentioned this pull request Nov 22, 2017
10 tasks
@benjamingeer benjamingeer mentioned this pull request Jul 5, 2019