v2 route for text values #941

Closed
tobiasschweizer opened this issue Jul 16, 2018 · 30 comments · Fixed by #1322
Assignees
Labels
API/V2 enhancement improve existing code or new feature
Milestone

Comments

@tobiasschweizer
Contributor

Implement a v2 route to read text values:

  • whole text values identified by their IRI
  • partial text values identified by a standoff tag IRI
  • arbitrary ranges of text, given the index positions
  • support content negotiation: XML, HTML (XSL transformation), plain text without markup, Knora API standoff format in JSON-LD and text as string (could be similar to what we had in v1)
@tobiasschweizer tobiasschweizer self-assigned this Jul 16, 2018
@tobiasschweizer tobiasschweizer added enhancement improve existing code or new feature API/V2 labels Jul 16, 2018
@tobiasschweizer tobiasschweizer added this to the API V2 milestone Jul 16, 2018
@tobiasschweizer
Contributor Author

It would be great if this could comply with existing or upcoming standards like http://commons.pelagios.org/2018/05/the-linked-texts-working-group

@tobiasschweizer
Contributor Author

this is related to the way text search results have to be handled, see #913

@tobiasschweizer
Contributor Author

tobiasschweizer commented Jul 16, 2018

If it is too hard to return parts of a text value in XML, we could just return the plain text and the markup in our API standoff format. The client could still try to display it if it is capable of doing so.

@benjamingeer

I was actually wondering whether it really makes sense to have routes that read values instead of resources. A value doesn't have a stable IRI, so it doesn't make sense to publish value IRIs.

@benjamingeer

If you want the current value(s) of a property of a resource, you can always get it with Gravsearch.

@tobiasschweizer
Contributor Author

I think it makes sense when you want to refer to a specific version of a value.

@benjamingeer

Well, I was also thinking that if you could get a resource and (some of) its values as they existed at a particular time in the past, you would not need to publish value IRIs to refer to past versions of a value. You would just need the resource IRI and the timestamp. The advantages would be:

  • You would have a stable IRI.
  • You would get important metadata about the resource, providing context for the value. In contrast, if you only have the value IRI, there is no way to get to the resource that contains it.
  • The client would not need to know about value versions. They would only need to know about (virtual) resource versions, which would be simpler for clients to deal with.

@benjamingeer

And the timestamp would make it clear that you're dealing with a version from the past. If you wanted the current version, you would just request the same resource IRI without the timestamp.

In contrast, if you have the IRI of an old version of a value, there's no way to get to the current version from there.
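
The timestamp-based lookup described here could be sketched roughly as follows. This is only an illustration in Python, not Knora's implementation: `ValueVersion`, `version_at`, and the idea that each version simply records its creation time are assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ValueVersion:
    """Hypothetical record for one version of a value."""
    created: datetime  # moment at which this version became the current one
    content: str


def version_at(history: List[ValueVersion],
               timestamp: Optional[datetime] = None) -> Optional[ValueVersion]:
    """Return the version that was current at `timestamp`.

    Without a timestamp, the latest version is returned. If the timestamp
    predates the first version, there is nothing to return.
    """
    if not history:
        return None
    ordered = sorted(history, key=lambda v: v.created)
    if timestamp is None:
        return ordered[-1]
    # Count the versions created at or before the requested moment.
    index = bisect_right([v.created for v in ordered], timestamp)
    return ordered[index - 1] if index > 0 else None


history = [
    ValueVersion(datetime(2018, 1, 1, tzinfo=timezone.utc), "first draft"),
    ValueVersion(datetime(2019, 1, 1, tzinfo=timezone.utc), "revised text"),
]
current = version_at(history)
past = version_at(history, datetime(2018, 6, 1, tzinfo=timezone.utc))
```

The same identifier serves both requests; only the presence of the timestamp decides whether the caller gets the current or a past state.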

@tobiasschweizer
Contributor Author

I think what you say makes sense when you are mainly interested in whole resources. But I can imagine a use case where someone wants to cite a specific version of a text value, maybe even a specific part of that text value. I guess you could also think of it as a nano publication.

If you could only obtain the whole resource, this might not be enough. Especially if you can have several instances for the same property (cardinality).

I could even imagine that we extend standoff links so they can also refer to values, not only to resources.

@benjamingeer

I think what you say makes sense when you are mainly interested in whole resources. But I can imagine a use case where someone wants to cite a specific version of a text value, maybe even a specific part of that text value. I guess you could also think of it as a nano publication.

Like I said, if you wanted to cite a specific version, you could use the timestamp and the resource IRI.

If you only want the value(s) of a specific property, you can use Gravsearch for that.

If you want to cite a specific part of a text value, we have standoff UUIDs for that. But I think that when you use them, you should still get back the metadata of the resource, as well as the value that contains the standoff.

Basically I don't think values are meaningful outside the context of a resource. If you have a value "Euler", it only makes sense if you know that it's the object of the property familyName. Otherwise it's just a string of characters with no semantics.

Especially if you can have several instances for the same property (cardinality).

Do you have a real use case where you need both of the following?

  1. to have multiple values of the same property
  2. to be able to cite a specific version of a specific value of the property

@benjamingeer

I could even imagine that we extend standoff links so they can also refer to values, not only to resources.

Why would you link to something that doesn't have a stable IRI, and will be marked as deleted as soon as a new version is made of it? The only way you could actually use that link would be via a special route for viewing deleted data.

The design of versioning has always been like this: all the normal queries return only the current version of the data. If you want a past version, you have to use a special time-machine route, and submit a timestamp.

If you find yourself wanting to link to values, I think it means that you should instead design your ontology differently, so that the things you want to link to are actually resources.

@benjamingeer

Especially if you can have several instances for the same property (cardinality).

I think the solution would be to have the property point to resources rather than to values. Then you could cite them.

@benjamingeer

I would actually like to make a Knora policy like this:

  • Resources have stable IRIs. You can request a resource by its IRI, and request a version of its contents by specifying a timestamp.
  • Values don't have stable IRIs. You can request a value only as part of a resource. You can't request a value by its IRI. You can't link to a value.

Therefore if you want to be able to link to something, you have to make a resource class for it. That could be a class just containing a single property pointing to a single text value with cardinality 1. Then getting "the whole resource" would be equivalent to what you want to do.

@benjamingeer

I think that in the past, there has been a tendency to make resource classes that have lots of values, because API v1 didn't return embedded resources. But API v2 makes it easy to get embedded resources. So it makes sense now to make resource classes that consist mainly of link properties, each of which points to a small resource containing perhaps only one value (e.g. a text). That makes it possible to link to the compound resource as well as to each of its components.

@benjamingeer

Also, a client that uses the simple schema definitely can't link to values, because there are no value IRIs at all in the simple schema, only literals. And if you want to support standards, I think it's likely that they'll be based on that sort of model as well.

@tobiasschweizer
Contributor Author

With the extension of our standoff model, the text value itself gets richer and this could mean that traditional metadata like the author or recipient of a letter could actually be referenced from inside the text value, using subclasses of StandofflinkTag so you could still do a structured search.

This means that conceptually a text value gets more and more interesting, also if it is looked at individually (without the resource whose property points to it).

What you have outlined above makes sense when you want to enable people to redo a Gravsearch query someone has made in the past, to get the same results also if the data has changed in the meantime.

@benjamingeer

This means that conceptually a text value gets more and more interesting, also if it is looked at individually (without the resource whose property points to it).

Regardless of how interesting it is, it still has no stable IRI, which I think is extremely user-unfriendly. There are two possibilities, both of them bad in my view:

  1. You don't care about past versions (which is the normal case), you just want to refer to the text. In that case, the value IRI is worse than useless.
  2. You want to cite a particular past version. In that case, the value IRI is still user-unfriendly, because:
    a. The IRI is arbitrary. It doesn't give you any information about what version you're looking at.
    b. There is no way to navigate to more recent versions.
    c. There is no way to find out anything about the semantic context of the value, i.e. about the resource.

In short: I think that the design decision to use a new IRI for each value version inherently means that values are unciteable.

However, I think that knora-base already provides a simple solution to this problem: put your value in a resource, and cite the resource.

In other words, if your value is so interesting that it needs to be cited independently of the resource that contains it, no problem: just put it in a little resource all by itself, and cite the IRI of that resource.

In practice, though, I think this is what you are already doing. In BEOL, every letter is in its own resource. So I don't see how you would gain anything by being able to cite the text value rather than the beol:letter resource.

@benjamingeer

With the extension of our standoff model, the text value itself gets richer and this could mean that traditional metadata like the author or recipient of a letter could actually be referenced from inside the text value, using subclasses of StandofflinkTag so you could still do a structured search.

Then make the author a resource and the recipient a resource. This is what you are doing already, isn't it?

@benjamingeer

I also believe that Lukas already decided long ago that there would be permalinks (now ARK URLs) only for resources, not for values.

@benjamingeer

And, once again, there are no value IRIs in the simple schema. You wrote:

It would be great if this could comply with existing or upcoming standards

I have yet to see an RDF standard that has any concept of value IRIs. That's why we made the simple schema. Conclusion: you cannot mix standards with value IRIs.

@benjamingeer

And now for something hypothetical: suppose that values had stable IRIs. I would still be against serving them by themselves. This would mean that you could do a request for a value IRI and get:

{
    intValueHasInt: 3
}

The value would be served with no semantics whatsoever. I think that would make client-side bugs very likely. We're not building TCP/IP here, we're building Linked Open Data. I think we should never serve data without any semantics.

The smallest unit of data that's guaranteed to have semantics in Knora is a resource. So I think API v2 should never serve a value without enclosing it in a resource. (This doesn't mean serving the whole resource, just its metadata at least, i.e. its @id, @type, rdfs:label, etc.)
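
To make the contrast concrete, here is a small Python sketch of a value served inside a minimal resource envelope. The function name, the IRIs, and the property name are all hypothetical and chosen only for illustration; this is not the actual API v2 response format.

```python
import json


def wrap_value_in_resource(resource_iri, resource_type, label, property_iri, value):
    """Nest a value under its property, inside the metadata of its resource."""
    return {
        "@id": resource_iri,
        "@type": resource_type,
        "rdfs:label": label,
        # The property is what gives the bare value its meaning.
        property_iri: value,
    }


response = wrap_value_in_resource(
    resource_iri="http://example.org/resource/1",  # hypothetical resource IRI
    resource_type="example:Thing",                 # hypothetical class
    label="An example thing",
    property_iri="example:hasInteger",             # hypothetical property
    value={"intValueHasInt": 3},
)
print(json.dumps(response, indent=2))
```

With this shape, a client that receives the integer also learns which resource it belongs to and which property it is the object of.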

@benjamingeer

So, to sum up what I tried to say better in person yesterday:

If value IRIs start getting published, I think these problems will happen a lot:

  • "There's a bug in Knora: I updated my value, but when I reload it in the browser, the content hasn't changed." (people will not expect value IRIs to refer to versions)
  • "You mean this is only the URL for a version of the text? Actually I need to publish a URL that will always show the latest version, what URL should I use?" (there is no such URL for a value)
  • "Someone emailed me this URL, but it just displays a text, with no other information. How do I find out more about this text?" (no link from value to resource)

Also, conversations with users about versioning in Knora always run into the same problem: nobody expects values to have versions, they expect resources to have versions. So I think API v2 should simulate resource versions using timestamps. We should just tell people:

  • Only publish resource permalinks (ARK URLs).
  • Don't publish Knora's internal IRIs.

Then they only really need to understand value versions if they're making a GUI for editing values.

To handle the use cases in the description of this issue, I suggest making routes that get:

  1. Instead of whole text values identified by their IRI: whole text values embedded in their resource, identified by resource IRI and property IRI. This would just be a shorthand for a simple Gravsearch template taking $resource and $property as parameters:
CONSTRUCT {
  ?resource knora-api:isMainResource true .
  ?resource <$property> ?value .
} WHERE {
  BIND(<$resource> AS ?resource)
  ?resource <$property> ?value .
}

To get a particular version, you would supply a timestamp, which Gravsearch would take care of.

  2. Instead of partial text values identified by a standoff tag IRI: partial text values identified by a standoff tag UUID. The route could just take the standoff UUID, but it would return the same thing as (1) above. To get a particular version, you would supply a timestamp.
  3. arbitrary ranges of text, given the index positions: this is fine, it would just require adding the indexes to the route in (1). The property would be required to have cardinality 0-1 or 1, otherwise you'd get an error.
  4. support content negotiation: XML, HTML (XSL transformation), plain text without markup, Knora API standoff format in JSON-LD and text as string (could be similar to what we had in v1): this is fine
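
As a rough illustration of the shorthand in (1), the template could be filled in like this. This is a Python sketch only: `build_query` is a hypothetical helper, the IRIs are made up, prefix declarations are omitted as in the template itself, and the character check is a simplified guard rather than full IRI validation.

```python
from string import Template

# The Gravsearch template from (1), with $resource and $property as placeholders.
GRAVSEARCH_TEMPLATE = Template("""CONSTRUCT {
  ?resource knora-api:isMainResource true .
  ?resource <$property> ?value .
} WHERE {
  BIND(<$resource> AS ?resource)
  ?resource <$property> ?value .
}""")

# Characters that may not appear inside <...> in SPARQL.
_FORBIDDEN = set('<>"{}|^`\\')


def build_query(resource_iri: str, property_iri: str) -> str:
    """Fill the template, refusing IRIs that would break out of the angle brackets."""
    for iri in (resource_iri, property_iri):
        if not iri or any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in iri):
            raise ValueError(f"not usable as an IRI in the query: {iri!r}")
    return GRAVSEARCH_TEMPLATE.substitute(resource=resource_iri, property=property_iri)


query = build_query(
    "http://example.org/resource/1",      # hypothetical resource IRI
    "http://example.org/onto#hasText",    # hypothetical property IRI
)
print(query)
```

Rejecting unsafe characters before substitution matters because the parameters would come straight from the route.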

@tobiasschweizer
Contributor Author

The property would be required to have cardinality 0-1 or 1,

I think this requirement would have to be met in general if you want to get a specific value.

If there could possibly be more than one instance of a property for a resource, this would not work, and an error would have to be thrown as you say.
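
The check described here could look roughly like the following Python sketch. The names `AmbiguousValueError` and `text_range` are hypothetical, and treating the index positions as a half-open range (start inclusive, end exclusive) is an assumption; the issue does not specify it.

```python
from typing import List


class AmbiguousValueError(Exception):
    """Raised when a property has more than one value, so no single text can be picked."""


def text_range(values: List[str], start: int, end: int) -> str:
    """Return text[start:end] for the single value of a property.

    `values` holds all text values the resource has for the property.
    """
    if len(values) > 1:
        raise AmbiguousValueError(
            f"property has {len(values)} values; cardinality must be 0-1 or 1"
        )
    if not values:
        raise LookupError("property has no value")
    text = values[0]
    if not (0 <= start <= end <= len(text)):
        raise IndexError(f"range {start}-{end} is outside the text (length {len(text)})")
    return text[start:end]


excerpt = text_range(["Reise ins Heilige Land"], 0, 5)
```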

This has some effects on how you have to model your data if you want them to be citable. I think we should make this explicit in modeling recommendations.

@benjamingeer

This has some effects on how you have to model your data if you want them to be citable. I think we should make this explicit in modeling recommendations.

Yep.

@benjamingeer

If we made such a route (for getting a resource with just one of its values, whose property has to be specified and is required to have cardinality 1 or 0-1), we could even support ARK URLs for that route. The property IRI could be Base64-encoded in the ARK URL along with the resource IRI.
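
The encoding step alone might look like this in Python. It is a sketch under assumptions: the helper names are hypothetical, the URL-safe Base64 alphabet and the stripped padding are choices made here so that the token contains no `/`, `+`, or `=`, and where exactly the token would sit in the ARK URL is left open.

```python
import base64


def encode_iri(iri: str) -> str:
    """Encode an IRI as a URL-safe Base64 token without padding."""
    token = base64.urlsafe_b64encode(iri.encode("utf-8")).decode("ascii")
    return token.rstrip("=")


def decode_iri(token: str) -> str:
    """Restore the padding that encode_iri() stripped, then decode."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


property_iri = "http://example.org/onto#hasText"  # hypothetical property IRI
token = encode_iri(property_iri)
```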

@tobiasschweizer
Contributor Author

tobiasschweizer commented Jul 20, 2018

If we made such a route (for getting a resource with just one of its values, whose property has to be specified and is required to have cardinality 1 or 0-1), we could even support ARK URLs for that route. The property IRI could be Base64-encoded in the ARK URL along with the resource IRI.

That's an excellent idea!

But then we could not guarantee that you will get a value since the property could be optional. Depending on the timestamp, there could be such a value or not. For example, the resource could have been created without the optional property and then it could have been added later.

@benjamingeer

Isn’t that the case with all citations? There's no way to guarantee that a citation refers to something that actually exists, because the citation could always be incorrect.

If you publish a link, it’s always your responsibility to ensure that it actually refers to something, that it isn’t a broken link. Salsah could help by generating the ARK URL only if such a value exists. It could ask you whether you want a citable link to the current version of the resource (i.e. a link with a timestamp) or a link to this and any future versions (i.e. without a timestamp).

@benjamingeer

benjamingeer commented Apr 12, 2019

I've been thinking about this some more, and I have another idea.

The problem with my suggestion above is that sometimes it really makes sense for a property to have several text values. For example, there's an Incunabula book with three titles:

  • Reise ins Heilige Land
  • Reysen und wanderschafften durch das Gelobte Land
  • Itinerarius

What if you want to cite just one of these titles?

So here's my suggestion: we give every value a UUID. When a new version of the value is made, it keeps the same UUID as the previous version. Then we can make an ARK URL for each value: it would be the ARK URL for the resource, plus the value UUID, like this:

http://ark.dasch.swiss/ark:/72163/1/[projectID]/[resourceUUID]/[valueUUID]

This would redirect to a route like this:

GET /v2/values/[resourceIRI]/[valueUUID]

This would return the resource metadata and the value. This way:

  • You would always get information about the resource along with the value.
  • You would not need to know the value IRI.
  • By default, you would get the latest version of the value.
  • You could add a timestamp to get a past version of the value.

What do you think?
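
For illustration, the two URL shapes above could be built as in the Python sketch below. The helper names are hypothetical, and so is the way the timestamp is passed: a `timestamp` query parameter is assumed here only to have something concrete, since the proposal does not say how it would be supplied.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def value_ark_url(project_id: str, resource_uuid: str, value_uuid: str) -> str:
    """ARK URL of a value: the resource's ARK URL plus the value UUID."""
    return (
        "http://ark.dasch.swiss/ark:/72163/1/"
        f"{project_id}/{resource_uuid}/{value_uuid}"
    )


def value_route(resource_iri: str, value_uuid: str,
                timestamp: Optional[str] = None) -> str:
    """Path for GET /v2/values/[resourceIRI]/[valueUUID].

    The resource IRI is percent-encoded so that it fits into one path segment.
    """
    path = f"/v2/values/{quote(resource_iri, safe='')}/{quote(value_uuid, safe='')}"
    if timestamp is not None:
        # Assumed parameter name; without it the latest version is requested.
        path += "?" + urlencode({"timestamp": timestamp})
    return path


latest = value_route("http://example.org/r/1", "abc")
past = value_route("http://example.org/r/1", "abc", "2019-04-12T00:00:00Z")
```

Because the value UUID stays the same across versions, the same path keeps working after the value is edited.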

@tobiasschweizer
Contributor Author

tobiasschweizer commented Apr 12, 2019

Actually, after having seen your PR #1301 last night about encoding UUIDs, I also thought about using UUIDs for identifying values instead of value IRIs. This would work for any value, not just text values, right?

I like the idea. The previous approach was too restrictive because it only worked if a value property had a cardinality of max. 1. Also it would have meant that the ontology design (as in the case of the incunabula example) could have prevented values from being cited.

With the timestamp, we can guarantee that the target can be found in its originally cited state.

I think we should implement this :-)

@benjamingeer

This would work for any value, not just text values, right?

Yes.

I think we should implement this :-)

OK, I'll do it.

2 participants