Ontology update infrastructure part 4 #649
…logies, not as project-specific ones.
…in knora-api, and not yet standoff datatype tags).
# Conflicts: # webapi/src/main/scala/org/knora/webapi/responders/v2/ResourcesResponderV2.scala # webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala # webapi/src/main/scala/org/knora/webapi/routing/v2/SearchRouteV2.scala # webapi/src/main/scala/org/knora/webapi/util/ConstructResponseUtilV2.scala
Knora ontologies use 'hash namespaces' (see `URI Namespaces`_). This means that the IRI of an ontology entity (a class or property definition) is constructed by adding a hash character (``#``) to the ontology IRI, followed by the name of the entity. In Knora, an entity name must be a valid XML NCName_. Thus, the class ``incunabula:book`` has the following IRIs:

- ``http://www.knora.org/ontology/incunabula#book`` (for the internal entity)
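The hash-namespace rule described in this excerpt can be sketched as follows. This is a minimal illustration, not the API server's code: the function names are hypothetical, and the NCName check is a simplified ASCII-only approximation of the XML grammar.

```python
import re
from typing import Tuple

# Simplified NCName check (assumption): ASCII letters, digits, '_', '-', '.',
# with a letter or underscore first. The real XML NCName grammar also admits
# many non-ASCII characters.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_ncname(name: str) -> bool:
    """Return True if name passes the simplified NCName check."""
    return _NCNAME_ASCII.fullmatch(name) is not None


def make_entity_iri(ontology_iri: str, entity_name: str) -> str:
    """Build an entity IRI: ontology IRI, then '#', then the entity name."""
    if not is_ncname(entity_name):
        raise ValueError(f"not a valid NCName: {entity_name!r}")
    return f"{ontology_iri}#{entity_name}"


def split_entity_iri(entity_iri: str) -> Tuple[str, str]:
    """Split an entity IRI into (ontology IRI, entity name) at the last '#'."""
    ontology_iri, sep, entity_name = entity_iri.rpartition("#")
    if not sep or not ontology_iri or not is_ncname(entity_name):
        raise ValueError(f"not a hash-namespace entity IRI: {entity_iri!r}")
    return ontology_iri, entity_name
```

For example, `make_entity_iri("http://www.knora.org/ontology/incunabula", "book")` yields the internal IRI listed above.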
Shouldn't we also use `projectUUID` in these IRIs? If we don't, we are likely to run into duplicate project names as the number of DaSCH satellites (hopefully) grows.
But we should keep the idea of the `knora-base:projectShortName` property, to be used by the GUI as discussed in #595, and add a `knora-base:projectCode` corresponding to the `projectUUID`.
An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI. Also, I would be reluctant to put a random number in an ontology IRI. The convention in the RDF world seems to be that ontology IRIs are meaningful rather than random.
If two ontologies hosted on different servers have the same internal IRI, this will not cause a conflict for clients, because they will have different external IRIs, since the external IRI contains the hostname of the server.
However, we can’t put the hostname (or the name of the institution that created the ontology) in the internal IRI, because projects and ontologies need to be able to move easily from one server (and institution) to another. Therefore the hostname can only be known at runtime.
I could imagine a problem occurring if someday we do federated searches in which triplestores on different servers are talking to each other. Or if two ontologies created with the same name on different servers are later moved to the same server.
The basic problem is that there seem to be only two options for making an ontology IRI unique: (1) require people to register names with a central authority before they use them, or (2) include a very long random number. The first option seems too burdensome, and the second one seems too ugly.
@lrosenth @tobiasschweizer @subotic @loicjaouen what do you think?
By `projectUUID`, I actually meant `projectcode`, as until now we haven't mentioned `projectUUID` (at least I didn't understand it that way).

> An ontology can exist independently of a project. There will be ontologies shared by multiple projects. So I don’t think we can include any sort of project ID (whether UUID, shortname, or project code) in an ontology IRI.

Yes, but if you are talking about the kind of standard ontologies shared by various projects, my guess is that these ontologies will undergo a review process. Their status will also change as they proceed from 'project-specific ontologies' to 'standard built-in ontologies'. In the process, aside from cleaning and improving these ontologies, can't we imagine that the initial project ID attached to the IRIs will be changed into a more generic name (such as bibliography, for example)?
> I could imagine a problem occurring if someday we do federated searches in which triplestores on different servers are talking to each other.

I don't know if we are ever going to support federated searches on the triplestore level. More likely on the federated `webapi` server level. In those cases, the different hostnames will give us unique names.

> Or if two ontologies created with the same name on different servers are later moved to the same server.

Maybe it would be easier to require a name change in those rare cases?
@mrivoal A UUID is not something assigned manually, it’s a very long random number, which the Knora API server encodes as a 22-character string of letters and digits.
The problem with changing the IRI of an ontology is that you also have to change all the data that uses the ontology. In the case of a project-specific ontology that became standardised, I think ideally there would be a project-specific ‘bridge’ ontology so the data wouldn’t have to change.
So... maybe including a short project ID (not a UUID) in the ontology IRI wouldn’t be such a bad idea. But it would mean that all existing ontologies and data would have to be updated.
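For readers unfamiliar with the 22-character form mentioned above: one common way to get 22 characters out of a 128-bit UUID is unpadded URL-safe Base64. The sketch below assumes that encoding, which the thread does not confirm, and note that the URL-safe alphabet also contains `-` and `_` in addition to letters and digits. The function names are hypothetical.

```python
import base64
import uuid


def encode_uuid(u: uuid.UUID) -> str:
    """Encode the 16 bytes of a UUID as unpadded URL-safe Base64.

    16 bytes always encode to 24 Base64 characters ending in '==', so
    stripping the padding leaves exactly 22 characters.
    """
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def decode_uuid(encoded: str) -> uuid.UUID:
    """Reverse encode_uuid by restoring the two padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(encoded + "=="))
```

Calling `encode_uuid(uuid.uuid4())` gives a fresh random identifier of this shape.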
@subotic I think you’re probably right.
> @mrivoal A UUID is not something assigned manually, it’s a very long random number, which the Knora API server encodes as a 22-character string of letters and digits.

Yes, I figured it out :)

> The problem with changing the IRI of an ontology is that you also have to change all the data that uses the ontology.

I am very aware of that; we already faced the problem when we changed the syntax of our IRIs to include the project code instead of the project shortname.

> So... maybe including a short project ID (not a UUID) in the ontology IRI wouldn’t be such a bad idea. But it would mean that all existing ontologies and data would have to be updated.

Honestly, I thought this was the road we all agreed on several months ago.
> Honestly, I thought this was the road we all agreed on several months ago.

I thought we had only agreed that project IDs would be used in resource IRIs. I don’t remember anything about including them in ontology IRIs. But I’m not known for having the best memory. Anyway, we can certainly do it, but it will take some work to convert everything.
We decided that there would be a project ID, and that IRIs would be `http://rdfh.ch/PROJECT_ID`, but we didn't clearly specify "resource" IRI and "project" IRI (cf. the minutes sent by mail by Lukas, dated 2017-03-16).
So we apparently left room for misunderstandings.
As this was to avoid collisions, we thought it should be in the ontology too.
I’d like to see if I can make this work in a backwards-compatible way, so existing data doesn’t have to be changed.
The formats of generated data IRIs for different types of objects are as follows:

- Project: ``http://rdfh.ch/projects/PROJECT_UUID``
- Resource: ``http://rdfh.ch/PROJECT_ID/RESOURCE_UUID`` (where ``PROJECT_ID`` is not a random UUID, but the project's unique ID, which can be registered with the DaSCH)
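The two formats in this excerpt can be written out as a small sketch. The function names and the segment check are hypothetical; only the two IRI templates come from the documentation text under review.

```python
def _check_segment(segment: str) -> None:
    """Reject empty segments or ones that would change the IRI structure."""
    if not segment or "/" in segment or "#" in segment:
        raise ValueError(f"invalid IRI segment: {segment!r}")


def make_project_iri(project_uuid: str) -> str:
    """Project IRI: http://rdfh.ch/projects/PROJECT_UUID."""
    _check_segment(project_uuid)
    return f"http://rdfh.ch/projects/{project_uuid}"


def make_resource_iri(project_id: str, resource_uuid: str) -> str:
    """Resource IRI: http://rdfh.ch/PROJECT_ID/RESOURCE_UUID.

    project_id is the project's registered ID, not a random UUID.
    """
    _check_segment(project_id)
    _check_segment(resource_uuid)
    return f"http://rdfh.ch/{project_id}/{resource_uuid}"
```

Here `make_resource_iri("0001", "AAAAAAAAAAAAAAAAAAAAAA")` uses placeholder values for both arguments.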
On the line above, `PROJECT_UUID` is mentioned, whereas here it is referred to as `PROJECT_ID`. It should probably be unified.
The resource ID has the project ID (should be project code, but is currently project shortname) in case someday we want resource IRIs to be directly dereferenceable. But again, how do we guarantee that project codes are unique? Will people have to register a project code with us even to create a test project? And if they don’t, suppose they then want to turn their test project into a real project, and they have already created resources with a test project code that isn’t unique?
> The resource ID has the project ID (should be project code, but is currently project shortname) in case someday we want resource IRIs to be directly dereferenceable.

Our resource IRIs (in Lausanne) are currently based on the project code (or project ID, the four-digit string that we discussed earlier), as I think was decided earlier this spring.

> But again, how do we guarantee that project codes are unique? Will people have to register a project code with us even to create a test project? And if they don’t, suppose they then want to turn their test project into a real project, and they have already created resources with a test project code that isn’t unique?

Being in charge, the DaSCH could allocate a range of project IDs to each institution, as we already did between Basel and Lausanne, until another system such as a registry of projects is set up, as @lrosenth also mentioned this spring.
Mass renaming?
OK but this is open source software. Anyone can download and run it even if they aren’t (yet) a DaSCH satellite and don’t have an allocated ID range.
We could do the same as for IPs and DNS: have a domestic range of project IDs.
Good idea!
The problem is that there is no software code for this; it is only a convention that we set amongst ourselves. We can't allocate a range and put it as a default value in a config file.
Of course, we can explain it in the docs... but human understanding is error-prone :)
I would suggest that each location gets a unique prefix, which has to be used to form the project ID. With this concept we would not have problems. Federated queries at the RDF level are on my wish list :-)
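To make the two proposals concrete (a per-location prefix, plus a reserved "domestic" range for unregistered installations, by analogy with private IP ranges), here is a rough sketch. Every value in it is invented for illustration: the location names, the prefixes, the reserved prefix, and the four-hex-digit layout are assumptions, not anything agreed in this thread.

```python
from typing import Optional

# Hypothetical registry of location prefixes (two hex digits each).
LOCATION_PREFIXES = {"site-a": "01", "site-b": "02"}

# Hypothetical reserved prefix for local or test projects that have no
# registered location, similar in spirit to private IP address ranges.
LOCAL_PREFIX = "ff"


def make_project_id(location: Optional[str], local_number: int) -> str:
    """Combine a location prefix with a two-hex-digit local project number."""
    if not 0 <= local_number <= 0xFF:
        raise ValueError(f"local number out of range: {local_number}")
    if location is None:
        prefix = LOCAL_PREFIX
    else:
        prefix = LOCATION_PREFIXES[location]  # KeyError if unregistered
    return f"{prefix}{local_number:02x}"


def is_local_project_id(project_id: str) -> bool:
    """True if the ID falls in the reserved, non-unique local range."""
    return project_id.startswith(LOCAL_PREFIX)
```

A test project created with `make_project_id(None, 10)` would then be recognisable as non-unique via `is_local_project_id`, and would need a new ID before being published.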
@lrosenth OK but the question is whether we include the project ID in the ontology IRI. This would mean changing all existing ontologies and data.
So, if I start writing an ontology for
knowing that
Does that sum up the conversation so far?
I think it will be possible to have optional project IDs in ontology IRIs, so existing data doesn't have to be changed. Thus either of these would work as an internal ontology IRI:
I need to test this more to confirm.
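As a sketch of what "optional project ID" could mean for parsing, the following accepts an internal ontology IRI with or without a project ID segment. The exact layout (a four-hex-digit ID as a path segment before the ontology name) is an assumption made for illustration; the thread does not spell it out here, and `0001/example` is a placeholder.

```python
import re
from typing import NamedTuple, Optional


class OntologyIriParts(NamedTuple):
    project_id: Optional[str]  # None when the IRI has no project ID
    ontology_name: str


# Assumed layout: http://www.knora.org/ontology/[PROJECT_ID/]ONTOLOGY_NAME,
# where PROJECT_ID is four hex digits and ONTOLOGY_NAME is an ASCII NCName.
_INTERNAL_ONTOLOGY_IRI = re.compile(
    r"http://www\.knora\.org/ontology/"
    r"(?:([0-9A-Fa-f]{4})/)?"
    r"([A-Za-z_][A-Za-z0-9_.\-]*)"
)


def parse_internal_ontology_iri(iri: str) -> OntologyIriParts:
    """Split an internal ontology IRI into (project ID or None, name)."""
    match = _INTERNAL_ONTOLOGY_IRI.fullmatch(iri)
    if match is None:
        raise ValueError(f"not an internal ontology IRI: {iri!r}")
    return OntologyIriParts(match.group(1), match.group(2))
```

Because the project ID is only recognised when followed by a slash, an existing IRI such as `http://www.knora.org/ontology/incunabula` parses unchanged, which is the backwards compatibility the comment is after.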
@benjamingeer, @lrosenth, @loicjaouen, @subotic, I think the discussion is getting really confused and we are not going anywhere. Could we consider a Skype call to go through the problem together (again) and to set those IRIs once and for all? I mean, we have been asking for this and discussing it since March; I think it is high time we came to a conclusion and stuck to it. But @benjamingeer, what you are suggesting right now doesn't fit with what we understood:
@mrivoal
You are right... But we did it anyway.
Hmm, that will be a problem, because an ontology name is also used as a prefix label, which must be a valid XML NCName, which cannot start with a number.
But we have defined corresponding prefixes like this: If we need to change things, we will. The most important thing is that we can agree on some schema and stick to it. Hence the suggestion to talk about it. Again.
In API v2 (and even in API v1 for bulk import), the API server needs to be able to generate a prefix automatically from an ontology IRI. If we use project IDs as namespaces for ontology names, then both the project ID and the ontology name will need to be in the prefix. For example, given this ontology IRI:
The prefix could be
…d XSD schema for bulk import.
@mrivoal @loicjaouen Lukas, Ivan, and I talked about this briefly. Lukas decided that in order to be able to progress more quickly right now, we will go ahead with the optional project ID in the ontology IRI. If your ontologies are working for you now with API v1, you don't need to change them now. When the time comes (but not before January), we will help you make any necessary changes to keep your data working with API v2. So, no meeting is necessary. :)
Ok, then. Fine. But as we are working on new ontologies, we still need a detailed list of the correct expected syntaxes for
@benjamingeer, could you please review, update and correct @loicjaouen's summary in issue #632 (@loicjaouen did not include the
@mrivoal More documentation is on its way.
- Fix extra slash in ontology IRIs.
… into wip/ontology-updates-4
I have an additional couple of questions, @benjamingeer you previously said:
The near-syllogism of the first and last sentences is: "an ontology can exist without a project, so no ontology should be linked to a project". But there is a link between a project and a specific ontology, isn't there? In the admin configuration defining a project, here
And we load this incunabula in that named graph. We can still use this ontology in other projects. What does it change to have an
Actually, wouldn't it be useful for a project to explicitly declare the ontologies it uses, so the GUI can restrict the choice of resource types for that project and the user is not drowned in similar choices? As a user, if I want to add a book from a bibliographical ontology, I might have trouble selecting which of the potentially many bibliographical ontologies to choose from, and, most likely, trouble making the same choice consistently.
And later:
I don't get this one, as we can use any existing ontologies.
That's the goal of #523. I think the API server should enforce this, so this scenario can't happen:
You are right, the server should enforce this. It would need to check every:
and make sure that only the project's ontologies or system ontologies are involved.
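The check described here could look roughly like the sketch below: take the entity IRIs a request refers to, derive each one's ontology from its hash namespace, and report any that belong to neither the project's ontologies nor the system ontologies. The function names are hypothetical, and which items the server would actually inspect is not specified in this comment.

```python
from typing import Iterable, List


def ontology_of(entity_iri: str) -> str:
    """Return the ontology IRI of a hash-namespace entity IRI."""
    return entity_iri.partition("#")[0]


def disallowed_entities(
    entity_iris: Iterable[str],
    project_ontologies: Iterable[str],
    system_ontologies: Iterable[str],
) -> List[str]:
    """List the entity IRIs whose ontology is not allowed for the project."""
    allowed = set(project_ontologies) | set(system_ontologies)
    return sorted(iri for iri in entity_iris if ontology_of(iri) not in allowed)
```

A request would be rejected whenever the returned list is non-empty, and the list itself can go into the error message.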
Then there is no problem in having project IDs or names in project-specific ontologies. But this is not the right issue in which to discuss it. I think we just got caught in misunderstandings due to a difference of contexts: us thinking in the "here and now" (#632) and you already living in the next iteration (#654) ;) Clearing up these contexts is a matter of release management, something we should now formalize, as we are in production and the next release will require a serious step to port our existing data.
…onstructing class defs in OntologyResponderV2.
…ties in knora-api ontologies. - Add other missing cardinalities.
- Refactor ontology entity classes to remove unnecessary layer in class hierarchy.
…nfoContentV2, so they're available for updates.
…one schema to another, for use in updates too.
…`knora-api` and `knora-base` (e.g. `dc`) as built-in ontologies, not as project-specific ones.
…`standoff`, but not yet the ones in `knora-base` or standoff datatype tags (see Serve salsah-gui and standoff ontologies #648).
Fixes dhlab-basel/Salsah#95.