v2 route for text values #941

tobiasschweizer · 2018-07-16T09:14:59Z

Implement a v2 route to read text values:

whole text values identified by their IRI
partial text values identified by a standoff tag IRI
arbitrary ranges of text, given the index positions
support content negotiation: XML, HTML (XSL transformation), plain text without markup, Knora API standoff format in JSON-LD and text as string (could be similar to what we had in v1)

tobiasschweizer · 2018-07-16T09:17:15Z

It would be great if this could comply with existing or upcoming standards like http://commons.pelagios.org/2018/05/the-linked-texts-working-group

tobiasschweizer · 2018-07-16T09:21:44Z

this is related to the way text search results have to be handled, see #913

tobiasschweizer · 2018-07-16T09:25:21Z

If it is too hard to return parts of a text value in XML, we could just return the plain text and the markup in our API standoff format. The client could still try to display it if it is capable of doing so.
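
To make the "plain text plus standoff" idea concrete, here is a minimal sketch of cutting a range out of a text value while keeping the markup that overlaps it. The `StandoffTag` class and `slice_text_value` function are hypothetical names for illustration, not part of Knora, and the offsets are assumed to be character indexes with an exclusive end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffTag:
    """A hypothetical standoff tag: a name plus character offsets into the text."""
    name: str
    start: int  # inclusive
    end: int    # exclusive


def slice_text_value(text, tags, start, end):
    """Return text[start:end] plus the tags that overlap that range.

    Tags are clipped to the range, and their offsets are shifted so that
    they are relative to the returned substring.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError("range is outside the text")
    clipped = []
    for tag in tags:
        new_start = max(tag.start, start)
        new_end = min(tag.end, end)
        if new_start < new_end:  # keep only tags that overlap the range
            clipped.append(StandoffTag(tag.name, new_start - start, new_end - start))
    return text[start:end], clipped


text = "Dear Euler, thank you for your letter."
tags = [StandoffTag("paragraph", 0, len(text)), StandoffTag("person", 5, 10)]
part, part_tags = slice_text_value(text, tags, 5, 20)
```

A client that cannot render the markup could simply ignore `part_tags` and display `part` as plain text.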

benjamingeer · 2018-07-16T12:21:11Z

I was actually wondering whether it really makes sense to have routes that read values instead of resources. A value doesn't have a stable IRI, so it doesn't make sense to publish value IRIs.

benjamingeer · 2018-07-16T12:21:37Z

If you want the current value(s) of a property of a resource, you can always get it with Gravsearch.

tobiasschweizer · 2018-07-16T12:23:20Z

I think it makes sense when you want to refer to a specific version of a value.

benjamingeer · 2018-07-16T12:27:47Z

Well, I was also thinking that if you could get a resource and (some of) its values as they existed at a particular time in the past, you would not need to publish value IRIs to refer to past versions of a value. You would just need the resource IRI and the timestamp. The advantages would be:

You would have a stable IRI.
You would get important metadata about the resource, providing context for the value. In contrast, if you only have the value IRI, there is no way to get to the resource that contains it.
The client would not need to know about value versions. They would only need to know about (virtual) resource versions, which would be simpler for clients to deal with.

benjamingeer · 2018-07-16T12:30:20Z

And the timestamp would make it clear that you're dealing with a version from the past. If you wanted the current version, you would just request the same resource IRI without the timestamp.

In contrast, if you have the IRI of an old version of a value, there's no way to get to the current version from there.
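
As a rough sketch of the lookup described here, assuming each value version carries a creation date: with a timestamp you get the version that was current at that moment, and without one you get the latest. `ValueVersion` and `value_as_of` are hypothetical names, and deletion is ignored to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ValueVersion:
    """Hypothetical record of one version of a value."""
    content: str
    created: datetime


def value_as_of(versions, timestamp: Optional[datetime] = None):
    """Return the version that was current at `timestamp`.

    Without a timestamp, return the latest version. Return None if no
    version existed yet at that time.
    """
    ordered = sorted(versions, key=lambda v: v.created)
    if timestamp is None:
        return ordered[-1] if ordered else None
    candidates = [v for v in ordered if v.created <= timestamp]
    return candidates[-1] if candidates else None


utc = timezone.utc
versions = [
    ValueVersion("first draft", datetime(2018, 7, 16, tzinfo=utc)),
    ValueVersion("revised text", datetime(2019, 4, 12, tzinfo=utc)),
]
current = value_as_of(versions)
past = value_as_of(versions, datetime(2018, 12, 31, tzinfo=utc))
```

The caller only ever supplies a resource identifier and an optional timestamp, so it never needs to know the IRI of an individual value version.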

tobiasschweizer · 2018-07-16T12:37:30Z

I think what you say makes sense when you are mainly interested in whole resources. But I can imagine a use case where someone wants to cite a specific version of a text value, maybe even a specific part of that text value. I guess you could also think of it as a nano publication.

If you could only obtain the whole resource, this might not be enough. Especially if you can have several instances for the same property (cardinality).

I could even imagine that we extend standoff links so they can also refer to values, not only to resources.

benjamingeer · 2018-07-16T12:45:32Z

> I think what you say makes sense when you are mainly interested in whole resources. But I can imagine a use case where someone wants to cite a specific version of a text value, maybe even a specific part of that text value. I guess you could also think of it as a nano publication.

Like I said, if you wanted to cite a specific version, you could use the timestamp and the resource IRI.

If you only want the value(s) of a specific property, you can use Gravsearch for that.

If you want to cite a specific part of a text value, we have standoff UUIDs for that. But I think that when you use them, you should still get back the metadata of the resource, as well as the value that contains the standoff.

Basically I don't think values are meaningful outside the context of a resource. If you have a value "Euler", it only makes sense if you know that it's the object of the property familyName. Otherwise it's just a string of characters with no semantics.

> Especially if you can have several instances for the same property (cardinality).

Do you have a real use case where you need both of the following?

to have multiple values of the same property
to be able to cite a specific version of a specific value of the property

benjamingeer · 2018-07-16T12:56:10Z

> I could even imagine that we extend standoff links so they can also refer to values, not only to resources.

Why would you link to something that doesn't have a stable IRI, and will be marked as deleted as soon as a new version is made of it? The only way you could actually use that link would be via a special route for viewing deleted data.

The design of versioning has always been like this: all the normal queries return only the current version of the data. If you want a past version, you have to use a special time-machine route, and submit a timestamp.

If you find yourself wanting to link to values, I think it means that you should instead design your ontology differently, so that the things you want to link to are actually resources.

benjamingeer · 2018-07-16T12:58:52Z

> Especially if you can have several instances for the same property (cardinality).

I think the solution would be to have the property point to resources rather than to values. Then you could cite them.

benjamingeer · 2018-07-16T13:06:09Z

I would actually like to make a Knora policy like this:

Resources have stable IRIs. You can request a resource by its IRI, and request a version of its contents by specifying a timestamp.
Values don't have stable IRIs. You can request a value only as part of a resource. You can't request a value by its IRI. You can't link to a value.

Therefore if you want to be able to link to something, you have to make a resource class for it. That could be a class just containing a single property pointing to a single text value with cardinality 1. Then getting "the whole resource" would be equivalent to what you want to do.

benjamingeer · 2018-07-16T13:10:05Z

I think that in the past, there has been a tendency to make resource classes that have lots of values, because API v1 didn't return embedded resources. But API v2 makes it easy to get embedded resources. So it makes sense now to make resource classes that consist mainly of link properties, each of which points to a small resource containing perhaps only one value (e.g. a text). That makes it possible to link to the compound resource as well as to each of its components.

benjamingeer · 2018-07-16T13:18:44Z

Also, a client that uses the simple schema definitely can't link to values, because there are no value IRIs at all in the simple schema, only literals. And if you want to support standards, I think it's likely that they'll be based on that sort of model as well.

tobiasschweizer · 2018-07-16T14:59:36Z

With the extension of our standoff model, the text value itself gets richer and this could mean that traditional metadata like the author or recipient of a letter could actually be referenced from inside the text value, using subclasses of StandofflinkTag so you could still do a structured search.

This means that conceptually a text value gets more and more interesting, also if it is looked at individually (without the resource whose property points to it).

What you have outlined above makes sense when you want to enable people to redo a Gravsearch query someone has made in the past, to get the same results also if the data has changed in the meantime.

benjamingeer · 2018-07-16T15:08:38Z

> This means that conceptually a text value gets more and more interesting, also if it is looked at individually (without the resource whose property points to it).

Regardless of how interesting it is, it still has no stable IRI, which I think is extremely user-unfriendly. There are two possibilities, both of them bad in my view:

You don't care about past versions (which is the normal case), you just want to refer to the text. In that case, the value IRI is worse than useless.
You want to cite a particular past version. In that case, the value IRI is still user-unfriendly, because:
a. The IRI is arbitrary. It doesn't give you any information about what version you're looking at.
b. There is no way to navigate to more recent versions.
c. There is no way to find out anything about the semantic context of the value, i.e. about the resource.

In short: I think that the design decision to use a new IRI for each value version inherently means that values are unciteable.

However, I think that knora-base already provides a simple solution to this problem: put your value in a resource, and cite the resource.

In other words, if your value is so interesting that it needs to be cited independently of the resource that contains it, no problem: just put it in a little resource all by itself, and cite the IRI of that resource.

In practice, though, I think this is what you are already doing. In BEOL, every letter is in its own resource. So I don't see how you would gain anything by being able to cite the text value rather than the beol:letter resource.

benjamingeer · 2018-07-16T15:12:45Z

> With the extension of our standoff model, the text value itself gets richer and this could mean that traditional metadata like the author or recipient of a letter could actually be referenced from inside the text value, using subclasses of StandofflinkTag so you could still do a structured search.

Then make the author a resource and the recipient a resource. This is what you are doing already, isn't it?

benjamingeer · 2018-07-16T15:16:34Z

I also believe that Lukas already decided long ago that there would be permalinks (now ARK URLs) only for resources, not for values.

benjamingeer · 2018-07-16T15:36:57Z

And, once again, there are no value IRIs in the simple schema. You wrote:

> It would be great if this could comply with existing or upcoming standards

I have yet to see an RDF standard that has any concept of value IRIs. That's why we made the simple schema. Conclusion: you cannot mix standards with value IRIs.

benjamingeer · 2018-07-16T16:23:35Z

And now for something hypothetical: suppose that values had stable IRIs. I would still be against serving them by themselves. This would mean that you could do a request for a value IRI and get:

{
    intValueHasInt: 3
}

The value would be served with no semantics whatsoever. I think that would make client-side bugs very likely. We're not building TCP/IP here, we're building Linked Open Data. I think we should never serve data without any semantics.

The smallest unit of data that's guaranteed to have semantics in Knora is a resource. So I think API v2 should never serve a value without enclosing it in a resource. (This doesn't mean serving the whole resource, just its metadata at least, i.e. its @id, @type, rdfs:label, etc.)
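
For contrast with the bare value above, here is a sketch of a response that encloses the value in minimal resource metadata. The helper `embed_value`, the IRIs, the class `ex:Thing` and the property `ex:hasInteger` are all made up for illustration; this is not the actual API v2 response format.

```python
import json


def embed_value(resource_iri, resource_type, label, property_iri, value):
    """Wrap a value in the minimal metadata of the resource that contains it."""
    return {
        "@id": resource_iri,
        "@type": resource_type,
        "rdfs:label": label,
        property_iri: value,
    }


# All IRIs and names below are hypothetical.
response = embed_value(
    resource_iri="http://rdfh.ch/0000/example-resource",
    resource_type="ex:Thing",
    label="An example thing",
    property_iri="ex:hasInteger",
    value={"intValueHasInt": 3},
)
print(json.dumps(response, indent=2))
```

Here the client can see which resource the integer belongs to and which property points to it.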

benjamingeer · 2018-07-18T07:14:06Z

So, to sum up what I tried to say better in person yesterday:

If value IRIs start getting published, I think these problems will happen a lot:

"There's a bug in Knora: I updated my value, but when I reload it in the browser, the content hasn't changed." (people will not expect value IRIs to refer to versions)
"You mean this is only the URL for a version of the text? Actually I need to publish a URL that will always show the latest version, what URL should I use?" (there is no such URL for a value)
"Someone emailed me this URL, but it just displays a text, with no other information. How do I find out more about this text?" (no link from value to resource)

Also, conversations with users about versioning in Knora always run into the same problem: nobody expects values to have versions, they expect resources to have versions. So I think API v2 should simulate resource versions using timestamps. We should just tell people:

Only publish resource permalinks (ARK URLs).
Don't publish Knora's internal IRIs.

Then they only really need to understand value versions if they're making a GUI for editing values.

To handle the use cases in the description of this issue, I suggest making routes that get:

1. ~~whole text values identified by their IRI~~ whole text values embedded in their resource, identified by resource IRI and property IRI. This would just be a shorthand for a simple Gravsearch template taking $resource and $property as parameters:

CONSTRUCT {
  ?resource knora-api:isMainResource true .
  ?resource <$property> ?value .
} WHERE {
  BIND(<$resource> AS ?resource)
  ?resource <$property> ?value .
}

To get a particular version, you would supply a timestamp, which Gravsearch would take care of.

2. ~~partial text values identified by a standoff tag IRI~~ partial text values identified by a standoff tag UUID. The route could just take the standoff UUID, but it would return the same thing as (1) above. To get a particular version, you would supply a timestamp.
3. arbitrary ranges of text, given the index positions: this is fine, it would just require adding the indexes to the route in (1). The property would be required to have cardinality 0-1 or 1, otherwise you'd get an error.
4. support content negotiation: XML, HTML (XSL transformation), plain text without markup, Knora API standoff format in JSON-LD and text as string (could be similar to what we had in v1): this is fine
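
A minimal sketch of how a route could fill in the Gravsearch template from (1), using Python's `string.Template` because it happens to share the `$name` placeholder syntax. The function name `build_query` and the example IRIs are assumptions, and the prefix declaration for `knora-api` is omitted, as in the template itself.

```python
import re
from string import Template

GRAVSEARCH_TEMPLATE = Template("""\
CONSTRUCT {
  ?resource knora-api:isMainResource true .
  ?resource <$property> ?value .
} WHERE {
  BIND(<$resource> AS ?resource)
  ?resource <$property> ?value .
}
""")

# Characters that may not appear inside an IRI reference in SPARQL.
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\s]')


def build_query(resource_iri, property_iri):
    """Substitute both IRIs into the template, rejecting unsafe input."""
    for iri in (resource_iri, property_iri):
        if not iri or _FORBIDDEN.search(iri):
            raise ValueError(f"not a safe IRI: {iri!r}")
    return GRAVSEARCH_TEMPLATE.substitute(resource=resource_iri, property=property_iri)


# Hypothetical IRIs, for illustration only.
query = build_query("http://rdfh.ch/0001/abc", "http://example.org/onto#hasText")
```

Checking the IRIs before substitution matters because they come from the request and would otherwise allow query injection.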

tobiasschweizer · 2018-07-20T08:13:04Z

> The property would be required to have cardinality 0-1 or 1,

I think this requirement would have to be met in general if you want to get a specific value.

If there could possibly be more than one instance of a property for a resource, this would not work, and an error would have to be thrown as you say.
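
The check could look roughly like this, assuming the route has already collected the property's values for the requested resource. `AmbiguousValueError` and `single_value` are hypothetical names.

```python
class AmbiguousValueError(Exception):
    """Raised when a property has more than one value, so none can be singled out."""


def single_value(values, property_iri):
    """Return the only value of a property, or None if it has no value."""
    if len(values) > 1:
        raise AmbiguousValueError(
            f"{property_iri} has {len(values)} values; expected at most one"
        )
    return values[0] if values else None
```

In practice the cardinality would more likely be checked against the ontology than against the data, but the effect for the caller is the same.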

This has some effects on how you have to model your data if you want them to be citable. I think we should make this explicit in modeling recommendations.

benjamingeer · 2018-07-20T09:18:31Z

> This has some effects on how you have to model your data if you want them to be citable. I think we should make this explicit in modeling recommendations.

Yep.

benjamingeer · 2018-07-20T09:23:21Z

If we made such a route (for getting a resource with just one of its values, whose property has to be specified and is required to have cardinality 1 or 0-1), we could even support ARK URLs for that route. The property IRI could be Base64-encoded in the ARK URL along with the resource IRI.
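
A sketch of the encoding step only, since the layout of such an ARK URL is not specified here. Using the URL-safe Base64 alphabet and dropping the padding are my assumptions, chosen so that the token contains no `/`, `+` or `=`; the function names and the example IRI are hypothetical.

```python
import base64


def encode_iri(iri: str) -> str:
    """Encode an IRI as URL-safe Base64 without padding."""
    return base64.urlsafe_b64encode(iri.encode("utf-8")).rstrip(b"=").decode("ascii")


def decode_iri(token: str) -> str:
    """Restore the padding and decode a token produced by encode_iri."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


# Hypothetical property IRI, for illustration only.
property_iri = "http://example.org/onto#hasText"
token = encode_iri(property_iri)
```

The token can then be appended as one path segment and decoded again when the ARK URL is resolved.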

tobiasschweizer · 2018-07-20T10:05:49Z

> If we made such a route (for getting a resource with just one of its values, whose property has to be specified and is required to have cardinality 1 or 0-1), we could even support ARK URLs for that route. The property IRI could be Base64-encoded in the ARK URL along with the resource IRI.

That's an excellent idea!

But then we could not guarantee that you will get a value since the property could be optional. Depending on the timestamp, there could be such a value or not. For example, the resource could have been created without the optional property and then it could have been added later.

benjamingeer · 2018-07-20T10:24:19Z

Isn’t that the case with all citations? There's no way to guarantee that a citation refers to something that actually exists, because the citation could always be incorrect.

If you publish a link, it’s always your responsibility to ensure that it actually refers to something, that it isn’t a broken link. Salsah could help by generating the ARK URL only if such a value exists. It could ask you whether you want a citable link to the current version of the resource (i.e. a link with a timestamp) or a link to this and any future versions (i.e. without a timestamp).

benjamingeer · 2019-04-12T06:56:30Z

I've been thinking about this some more, and I have another idea.

The problem with my suggestion above is that sometimes it really makes sense for a property to have several text values. For example, there's an Incunabula book with three titles:

Reise ins Heilige Land
Reysen und wanderschafften durch das Gelobte Land
Itinerarius

What if you want to cite just one of these titles?

So here's my suggestion: we give every value a UUID. When a new version of the value is made, it keeps the same UUID as the previous version. Then we can make an ARK URL for each value: it would be the ARK URL for the resource, plus the value UUID, like this:

http://ark.dasch.swiss/ark:/72163/1/[projectID]/[resourceUUID]/[valueUUID]

This would redirect to a route like this:

GET /v2/values/[resourceIRI]/[valueUUID]

This would return the resource metadata and the value. This way:

You would always get information about the resource along with the value.
You would not need to know the value IRI.
By default, you would get the latest version of the value.
You could add a timestamp to get a past version of the value.

What do you think?
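
To illustrate, here is a sketch that builds the two URLs following the patterns above. The function names are hypothetical, and the `timestamp` query parameter is an assumption, since how the timestamp would be passed is not decided here.

```python
from urllib.parse import quote

ARK_PREFIX = "http://ark.dasch.swiss/ark:/72163/1"


def value_ark_url(project_id, resource_uuid, value_uuid):
    """ARK URL of a value: the resource's ARK URL plus the value UUID."""
    return f"{ARK_PREFIX}/{project_id}/{resource_uuid}/{value_uuid}"


def value_route(resource_iri, value_uuid, timestamp=None):
    """Path of the route that the ARK URL would redirect to."""
    # The resource IRI is percent-encoded so it fits into one path segment.
    path = f"/v2/values/{quote(resource_iri, safe='')}/{value_uuid}"
    if timestamp is not None:
        # Hypothetical parameter name for requesting a past version.
        path += f"?timestamp={quote(timestamp, safe='')}"
    return path
```

Because a value keeps its UUID across versions, both URLs stay valid after the value is edited.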

tobiasschweizer · 2019-04-12T07:07:54Z

Actually, after having seen your PR #1301 last night about encoding UUIDs, I also thought about using UUIDs for identifying values instead of value IRIs. This would work for any value, not just text values, right?

I like the idea. The previous approach was too restrictive because it only worked if a value property had a cardinality of max. 1. Also it would have meant that the ontology design (as in the case of the incunabula example) could have prevented values from being cited.

With the timestamp, we can guarantee that the target can be found in its originally cited state.

I think we should implement this :-)

benjamingeer · 2019-04-12T07:13:18Z

> This would work for any value, not just text values, right?

Yes.

> I think we should implement this :-)

OK, I'll do it.

tobiasschweizer self-assigned this Jul 16, 2018

tobiasschweizer added the enhancement and API/V2 labels Jul 16, 2018

tobiasschweizer added this to the API V2 milestone Jul 16, 2018

benjamingeer mentioned this issue Apr 12, 2019

Ben's PR history #571

Open

benjamingeer mentioned this issue May 17, 2019

feat(api-v2): Make values citable #1322

Merged

13 tasks

benjamingeer closed this as completed in #1322 Jun 6, 2019
