Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api-v2): Erase a resource #1429

Merged
merged 8 commits into from
Sep 11, 2019
Merged

Conversation

benjamingeer
Copy link

@benjamingeer benjamingeer commented Sep 4, 2019

This PR introduces a route /v2/resources/erase, for erasing a resource from the triplestore.

  • Add route.
  • Add implementation in ResourcesResponderV2.
  • Add tests in ResourcesResponderV2Spec.
  • Add test in ResourcesRouteV2E2ESpec.
  • Add docs.

Resolves #1421.

@benjamingeer benjamingeer self-assigned this Sep 4, 2019
@benjamingeer benjamingeer mentioned this pull request Sep 4, 2019
@benjamingeer benjamingeer added this to the 2019-09 milestone Sep 9, 2019
@benjamingeer benjamingeer added API/V2 enhancement improve existing code or new feature labels Sep 9, 2019
@tobiasschweizer
Copy link
Contributor

Have you included https://en.wikipedia.org/wiki/Eraser_(film) in the docs? :-)

@benjamingeer
Copy link
Author

@tobiasschweizer Sorry, I only include references to films I've seen. :)

@tobiasschweizer
Copy link
Contributor

What happens if the resource (e..g, a StillImageRepresentation) has an attached file that is stored in Sipi?

@benjamingeer
Copy link
Author

What happens if the resource (e..g, a StillImageRepresentation) has an attached file that is stored in Sipi?

Currently Knora never deletes files from Sipi, and I think this is still the safest policy, because of use cases like #1078.

@benjamingeer
Copy link
Author

(from Sipi's permanent storage I mean)

@@ -956,6 +956,38 @@ class ResourcesRouteV2E2ESpec extends E2ESpec(ResourcesRouteV2E2ESpec.config) {
)
}
}

"erase a resource" in {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test attempting to erase a resource that is referred to to make sure the request is rejected?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Sorry, I missed this one.

* @param ignoreKnoraConstraints if true, knora-base:subjectClassConstraint and knora-base:objectClassConstraint will be ignored.
* This is necessary when modifying the cardinalities of a class.
* @param ignoreRdfSubjectAndObject if true, rdf:subject and rdf:object will be ignored. This is necessary when checking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this ignores reifications (link values) referring to the IRI of the resource to be deleted if set to true?

Wouldn't this mean that a resource is referenced and hence cannot be deleted?

Copy link
Contributor

@tobiasschweizer tobiasschweizer Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that the IRI can be used as an rdf:subject (outgoing link) but not as an rdf:object (incoming link) (https://docs.knora.org/paradox/02-knora-ontologies/knora-base.html#linkvalue).

Copy link
Author

@benjamingeer benjamingeer Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we want to erase resource A, and resource B has a link value that points to A, and the link value is marked as deleted. The link value will still contain an rdf:object pointing to A. It's OK to ignore it, because B no longer refers to A.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the link value would have an rdf:object with the IRI of A that does not occur anymore as a subject in the triplestore, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tobiasschweizer

But the link value would have an rdf:object with the IRI of A that does not occur anymore as a subject in the triplestore, right?

That's correct. This wouldn't be a problem for Knora or for the GraphDB consistency checking. @lrosenth are you saying that these (marked-as-deleted) references would need to be erased for legal reasons?

@tobiasschweizer
Copy link
Contributor

Since this is a killer feature in the proper sense, I would like to discuss this together with @benjamingeer and @lrosenth.

@subotic
Copy link
Collaborator

subotic commented Sep 11, 2019

We could always create placeholder resources for deleted resources with incoming links.

@benjamingeer
Copy link
Author

We could always create placeholder resources for deleted resources with incoming links.

The use case that @lrosenth had in mind is that he's creating a new project by doing an import, and he wants to remove some incorrect resources and import them again.

@subotic
Copy link
Collaborator

subotic commented Sep 11, 2019

I know, but what if project A creates a resource and project B links to it. So now project A wants to delete the resource. Are we not going to allow this? The data belongs to project A and they should be able to do what they want IMHO.

@benjamingeer
Copy link
Author

I know, but what if project A creates a resource and project B links to it. So now project A wants to delete the resource. Are we not going to allow this? The data belongs to project A and they should be able to do what they want IMHO.

Currently project A can mark the resource as deleted, but they can't erase it. I guess in this case, if they want to erase it, it would make sense to replace it with a dummy resource. @lrosenth ?

@benjamingeer
Copy link
Author

Although the problem would be that the placeholder resource would have to be of the correct class, which could have required cardinalities for links to other resources. So we could end up having to create a whole graph of dummy resources just to erase one resource. This is the same problem as the bulk import problem.

@benjamingeer
Copy link
Author

Since erasing resources in production is supposed to be something very rare, and can only be done by a sysadmin or project admin, maybe it would be best to leave it up to the admin to decide how to deal with this scenario. If they want to create a graph of dummy resources and change existing links first before erasing, they could do that via the API.

@lrosenth
Copy link
Contributor

OK – I would suggest the following rules:

  1. if a resource is not referenced by another resource: erase it completely (including associated SIPI-Files!)

  2. if the resource is referenced by another resource: You cannot erase it at all! The user has first to remove the links to it (that is, the links from other resources to the one to be deleted have to be marked as deleted).

  3. if a resource has only "incoming" links marked as deleted to it, these (already marked as deleted) links have to be removed/erased also completely. If it's simpler to "downgrad" the resource to a "dummy resource" that sais "I?ve been deleted" that would be a solution too – whatever is easier to implement!

Case (3) is not as urgent as (1) and (2)

Erasing a resource is always a "last resort action" that needs admin privileges

@benjamingeer
Copy link
Author

@lrosenth

  1. if a resource is not referenced by another resource: erase it completely (including associated SIPI-Files!)

We have a use case, #1078, where the same Sipi files could be reused in different resources: "Use case: bulk import has been done, and needs to be redone because data has changed, but images have not changed."

That seems similar to your use case here. Then you would want to keep the Sipi files, right?

Also, when you mark a resource or value as deleted, Knora doesn't remove its Sipi files. Should it?

  1. if a resource has only "incoming" links marked as deleted to it, these (already marked as deleted) links have to be removed/erased also completely.

That seems feasible.

@tobiasschweizer
Copy link
Contributor

Currently Knora never deletes files from Sipi, and I think this is still the safest policy, because of use cases like #1078.

Once a resource with an attached file value gets erased/deleted, will the file in Sipi then be unaccessible because no permissions will be granted for an erased/deleted resource?

In other words, the file would still be in Sipi, but it would never be served.

@benjamingeer
Copy link
Author

In other words, the file would still be in Sipi, but it would never be served.

It would never be served, but it could be used again to create another resource or value, which is the use case in #1078:

Use case: bulk import has been done, and needs to be redone because data has changed, but images have not changed.

@tobiasschweizer
Copy link
Contributor

tobiasschweizer commented Sep 11, 2019

It would never be served, but it could be used again to create another resource or value, which is the use case in #1078:

Which permissions would be required to use an existing file? If the original resource was erased, the original permissions cannot be determined. I am thinking of a copyright protected information that required the erasure of a resource.

@benjamingeer
Copy link
Author

Which permissions would be required to use an existing file? If the original resource was erased, the original permissions cannot be determined. I am thinking of a copyright protected information that required the erasure of a resource.

To reuse an existing file, you'd have to move it to Sipi's tmp directory first. Then you can create a new resource as if you had just uploaded the file to Sipi. No permissions will be checked.

If you don't want to reuse the file, you can delete it from Sipi's storage using rm.

@tobiasschweizer
Copy link
Contributor

To reuse an existing file, you'd have to move it to Sipi's tmp directory first. Then you can create a new resource as if you had just uploaded the file to Sipi. No permissions will be checked.

If you don't want to reuse the file, you can delete it from Sipi's storage using rm.

So this would require write access to Sipi's file system. I think that's fine since you would need admin privileges to use the functionality added with this PR.

@tobiasschweizer
Copy link
Contributor

This PR looks fine to me. I leave the final approval to @lrosenth

@benjamingeer
Copy link
Author

This PR looks fine to me. I leave the final approval to @lrosenth

Thanks. I'm currently writing the code to erase marked-as-deleted link values.

@tobiasschweizer
Copy link
Contributor

You are the Eraser ...

@benjamingeer
Copy link
Author

@lrosenth Is this OK now? It erases deleted link values pointing to the resource that's being erased. For the reasons explained above, it doesn't touch files in Sipi. Does that seem reasonable to you?

@lrosenth
Copy link
Contributor

Seems very reasonable to me! Maybe at some point we need a "Purge" algorithm whoiich detect unreferenced SIPI files and allows the to be deleted... But that's not urgent!

Copy link
Contributor

@lrosenth lrosenth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@benjamingeer
Copy link
Author

Thanks for the review!

@benjamingeer benjamingeer merged commit fb7adba into develop Sep 11, 2019
@benjamingeer benjamingeer deleted the wip/1421-erase-resource branch September 11, 2019 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API/V2 enhancement improve existing code or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Physically deleting a resource
4 participants