v2 route for text values #941
Comments
It would be great if this could comply with existing or upcoming standards like http://commons.pelagios.org/2018/05/the-linked-texts-working-group |
This is related to the way text search results have to be handled; see #913. |
If it is too hard to return parts of a text value in XML, we could just return the plain text and the markup in our API standoff format. The client could still try to display it if it is capable of doing so. |
I was actually wondering whether it really makes sense to have routes that read values instead of resources. A value doesn't have a stable IRI, so it doesn't make sense to publish value IRIs. |
If you want the current value(s) of a property of a resource, you can always get it with Gravsearch. |
I think it makes sense when you want to refer to a specific version of a value. |
Well, I was also thinking that if you could get a resource and (some of) its values as they existed at a particular time in the past, you would not need to publish value IRIs to refer to past versions of a value. You would just need the resource IRI and the timestamp. The advantages would be:
|
And the timestamp would make it clear that you're dealing with a version from the past. If you wanted the current version, you would just request the same resource IRI without the timestamp. In contrast, if you have the IRI of an old version of a value, there's no way to get to the current version from there. |
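To make the resource-IRI-plus-timestamp idea concrete, here is a minimal sketch of how a lookup could behave. It is not Knora code: the in-memory `history` list and the `version_at` function are hypothetical names invented for illustration, and a real implementation would query the triplestore instead.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical version history of one value of one resource:
# (creation timestamp, content), sorted oldest first.
history = [
    (datetime(2018, 1, 1, tzinfo=timezone.utc), "first draft"),
    (datetime(2018, 6, 1, tzinfo=timezone.utc), "revised text"),
]


def version_at(history, timestamp=None):
    """Return the content that was current at `timestamp`.

    Without a timestamp, the latest version is returned. If the value
    did not exist yet at that time, the result is None.
    """
    if not history:
        return None
    if timestamp is None:
        return history[-1][1]
    times = [created for created, _ in history]
    index = bisect_right(times, timestamp)  # versions created at or before timestamp
    if index == 0:
        return None
    return history[index - 1][1]
```

The same identifier serves both cases: leaving out the timestamp yields the current version, while supplying one yields the version that was current at that moment.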
I think what you say makes sense when you are mainly interested in whole resources. But I can imagine a use case where someone wants to cite a specific version of a text value, maybe even a specific part of that text value. I guess you could also think of it as a nanopublication. If you could only obtain the whole resource, this might not be enough, especially if a property can have several values (cardinality). I could even imagine that we extend standoff links so they can also refer to values, not only to resources. |
Like I said, if you wanted to cite a specific version, you could use the timestamp and the resource IRI. If you only want the value(s) of a specific property, you can use Gravsearch for that. If you want to cite a specific part of a text value, we have standoff UUIDs for that. But I think that when you use them, you should still get back the metadata of the resource, as well as the value that contains the standoff. Basically I don't think values are meaningful outside the context of a resource. If you have a value "Euler", it only makes sense if you know that it's the object of the property
Do you have a real use case where you need both of the following?
|
Why would you link to something that doesn't have a stable IRI, and will be marked as deleted as soon as a new version is made of it? The only way you could actually use that link would be via a special route for viewing deleted data. The design of versioning has always been like this: all the normal queries return only the current version of the data. If you want a past version, you have to use a special time-machine route, and submit a timestamp. If you find yourself wanting to link to values, I think it means that you should instead design your ontology differently, so that the things you want to link to are actually resources. |
I think the solution would be to have the property point to resources rather than to values. Then you could cite them. |
I would actually like to make a Knora policy like this:
Therefore if you want to be able to link to something, you have to make a resource class for it. That could be a class just containing a single property pointing to a single text value with |
I think that in the past, there has been a tendency to make resource classes that have lots of values, because API v1 didn't return embedded resources. But API v2 makes it easy to get embedded resources. So it makes sense now to make resource classes that consist mainly of link properties, each of which points to a small resource containing perhaps only one value (e.g. a text). That makes it possible to link to the compound resource as well as to each of its components. |
Also, a client that uses the simple schema definitely can't link to values, because there are no value IRIs at all in the simple schema, only literals. And if you want to support standards, I think it's likely that they'll be based on that sort of model as well. |
With the extension of our standoff model, the text value itself gets richer and this could mean that traditional metadata like the author or recipient of a letter could actually be referenced from inside the text value, using subclasses of StandofflinkTag so you could still do a structured search. This means that conceptually a text value gets more and more interesting, even if it is looked at individually (without the resource whose property points to it). What you have outlined above makes sense when you want to enable people to redo a Gravsearch query someone has made in the past, to get the same results even if the data has changed in the meantime. |
Regardless of how interesting it is, it still has no stable IRI, which I think is extremely user-unfriendly. There are two possibilities, both of them bad in my view:
In short: I think that the design decision to use a new IRI for each value version inherently means that values are unciteable. However, I think that In other words, if your value is so interesting that it needs to be cited independently of the resource that contains it, no problem: just put it in a little resource all by itself, and cite the IRI of that resource. In practice, though, I think this is what you are already doing. In BEOL, every letter is in its own resource. So I don't see how you would gain anything by being able to cite the text value rather than the |
Then make the author a resource and the recipient a resource. This is what you are doing already, isn't it? |
I also believe that Lukas already decided long ago that there would be permalinks (now ARK URLs) only for resources, not for values. |
And, once again, there are no value IRIs in the simple schema. You wrote:
I have yet to see an RDF standard that has any concept of value IRIs. That's why we made the simple schema. Conclusion: you cannot mix standards with value IRIs. |
And now for something hypothetical: suppose that values had stable IRIs. I would still be against serving them by themselves. This would mean that you could do a request for a value IRI and get:
{
    intValueHasInt: 3
}
The value would be served with no semantics whatsoever. I think that would make client-side bugs very likely. We're not building TCP/IP here, we're building Linked Open Data. I think we should never serve data without any semantics. The smallest unit of data that's guaranteed to have semantics in Knora is a resource. So I think API v2 should never serve a value without enclosing it in a resource. (This doesn't mean serving the whole resource, just its metadata at least, i.e. its |
So, to sum up what I tried to say better in person yesterday: If value IRIs start getting published, I think these problems will happen a lot:
Also, conversations with users about versioning in Knora always run into the same problem: nobody expects values to have versions, they expect resources to have versions. So I think API v2 should simulate resource versions using timestamps. We should just tell people:
Then they only really need to understand value versions if they're making a GUI for editing values. To handle the use cases in the description of this issue, I suggest making routes that get:
CONSTRUCT {
    ?resource knora-api:isMainResource true .
    ?resource <$property> ?value .
} WHERE {
    BIND(<$resource> AS ?resource)
    ?resource <$property> ?value .
}
To get a particular version, you would supply a timestamp, which Gravsearch would take care of.
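As a rough illustration of how such a route might fill in the query, here is a sketch using Python's `string.Template`. The names `GRAVSEARCH_TEMPLATE` and `build_query` are hypothetical, the IRI check is deliberately crude, and prefix declarations are left out just as in the query above.

```python
from string import Template

# The query from the discussion, with $resource and $property as placeholders.
GRAVSEARCH_TEMPLATE = Template("""\
CONSTRUCT {
    ?resource knora-api:isMainResource true .
    ?resource <$property> ?value .
} WHERE {
    BIND(<$resource> AS ?resource)
    ?resource <$property> ?value .
}""")

# Characters that must not appear inside an IRI written between angle brackets.
FORBIDDEN = set('<>"{}|^`\\ ')


def build_query(resource_iri, property_iri):
    """Substitute both IRIs into the template, rejecting unsafe input."""
    for iri in (resource_iri, property_iri):
        if not iri or any(char in FORBIDDEN for char in iri):
            raise ValueError(f"not a usable IRI: {iri!r}")
    return GRAVSEARCH_TEMPLATE.substitute(
        resource=resource_iri, property=property_iri
    )
```

Rejecting angle brackets and whitespace keeps a caller from injecting extra triple patterns through the IRI parameters.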
|
I think this requirement would have to be met in general if you want to get a specific value. If there could possibly be more than one instance of a property for a resource, this would not work, and an error would have to be thrown as you say. This has some effects on how you have to model your data if you want them to be citable. I think we should make this explicit in modeling recommendations. |
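The rule discussed here (at most one value per property, otherwise an error) could look roughly like this. `AmbiguousValueError` and `single_value` are made-up names for the sketch, not part of any existing API.

```python
class AmbiguousValueError(Exception):
    """Raised when a property has more than one value for the resource."""


def single_value(values):
    """Return the only value of a property, or None if there is none.

    `values` is the list of values found for one property of one
    resource. More than one value cannot be cited unambiguously by
    resource and property alone, so that case is an error.
    """
    if len(values) > 1:
        raise AmbiguousValueError(
            f"expected at most one value, found {len(values)}"
        )
    return values[0] if values else None
```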
Yep. |
If we made such a route (for getting a resource with just one of its values, whose property has to be specified and is required to have cardinality 1 or 0-1), we could even support ARK URLs for that route. The property IRI could be Base64-encoded in the ARK URL along with the resource IRI. |
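A small sketch of the encoding step, assuming URL-safe Base64 without padding. The layout of the resulting URL (property token appended after a slash) and the function names are assumptions made for illustration, not an agreed ARK format.

```python
import base64


def encode_iri(iri):
    """Encode an IRI as URL-safe Base64 without trailing '=' padding."""
    raw = base64.urlsafe_b64encode(iri.encode("utf-8"))
    return raw.decode("ascii").rstrip("=")


def decode_iri(token):
    """Reverse encode_iri, restoring the padding that was stripped."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


def ark_with_property(resource_ark, property_iri):
    """Append the encoded property IRI to a resource ARK URL (hypothetical layout)."""
    return f"{resource_ark}/{encode_iri(property_iri)}"
```

The URL-safe alphabet avoids `+` and `/`, so the token can sit in a URL path without further escaping.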
That's an excellent idea! But then we could not guarantee that you will get a value since the property could be optional. Depending on the timestamp, there could be such a value or not. For example, the resource could have been created without the optional property and then it could have been added later. |
Isn’t that the case with all citations? There's no way to guarantee that a citation refers to something that actually exists, because the citation could always be incorrect. If you publish a link, it’s always your responsibility to ensure that it actually refers to something, that it isn’t a broken link. Salsah could help by generating the ARK URL only if such a value exists. It could ask you whether you want a citable link to the current version of the resource (i.e. a link with a timestamp) or a link to this and any future versions (i.e. without a timestamp). |
I've been thinking about this some more, and I have another idea. The problem with my suggestion above is that sometimes it really makes sense for a property to have several text values. For example, there's an Incunabula book with three titles:
What if you want to cite just one of these titles? So here's my suggestion: we give every value a UUID. When a new version of the value is made, it keeps the same UUID as the previous version. Then we can make an ARK URL for each value: it would be the ARK URL for the resource, plus the value UUID, like this:
This would redirect to a route like this:
This would return the resource metadata and the value. This way:
What do you think? |
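To illustrate the proposal, here is a sketch in which each version of a value gets a new IRI but keeps the UUID of its predecessor, so the UUID plus an optional timestamp is enough to find the right version. The `ValueVersion` class and both functions are hypothetical and only model the idea in memory.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ValueVersion:
    value_iri: str         # changes with every new version
    value_uuid: uuid.UUID  # stays the same across versions
    created: datetime
    content: str


def new_version(previous, new_iri, content, created):
    """Create the next version of a value, keeping the previous UUID."""
    return ValueVersion(new_iri, previous.value_uuid, created, content)


def find_value(versions, value_uuid, timestamp=None):
    """Return the newest version with this UUID, optionally as of `timestamp`.

    Returns None if no version with that UUID existed at that time.
    """
    candidates = [
        v for v in versions
        if v.value_uuid == value_uuid
        and (timestamp is None or v.created <= timestamp)
    ]
    return max(candidates, key=lambda v: v.created, default=None)
```

Because the lookup key is the UUID rather than the value IRI, this works regardless of how many values the property has.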
Actually, after having seen your PR #1301 last night about encoding UUIDs, I also thought about using UUIDs for identifying values instead of value IRIs. This would work for any value, not just text values, right? I like the idea. The previous approach was too restrictive because it only worked if a value property had a cardinality of max. 1. Also it would have meant that the ontology design (as in the case of the incunabula example) could have prevented values from being cited. With the timestamp, we can guarantee that the target can be found in its originally cited state. I think we should implement this :-) |
Yes.
OK, I'll do it. |
Implement a v2 route to read text values: