Problem: fatal error when SIP is a BagIt bag #10

Open
djjuhasz opened this issue May 30, 2024 · 6 comments
Labels: bug (Something isn't working)

Comments

@djjuhasz
Contributor

djjuhasz commented May 30, 2024

If the SIP delivered to Enduro is a BagIt bag, the preprocessing workflow fails with a fatal error at the "CreateBagActivity":

CreateBagActivity: create bag: mkdir /home/preprocessing/shared/enduro3594443426/extract3168784964/ZippedBag/data: file exists

To Reproduce

Steps to reproduce the behavior:

  1. Run Enduro with preprocessing-base enabled for preprocessing
  2. Upload a zipped bag to the Enduro MinIO "sips" bucket
  3. The above error occurs

Expected behavior

In real-world implementations of preprocessing, the SIP delivered by Enduro will be modified before being bagged and sent back to Enduro. In that case the bag will need to be updated, or its contents "unbagged", to prevent errors when validating the bag payload against its manifest.

If the "CreateBagActivity" receives a BagIt bag as input, it should return the path of the bag, without altering the bag. Preprocessing will then deliver the unaltered bag to Enduro for further processing.

Additional context

See artefactual-sdps/enduro#805 for more information about the exchange of bags between preprocessing and Enduro.

@djjuhasz djjuhasz added this to Enduro May 30, 2024
@djjuhasz djjuhasz moved this to 👍 Ready in Enduro May 30, 2024
@jraddaoui
Contributor

I guess making the CreateBagActivity a no-op when it receives a bag won't hurt, but I was thinking of addressing this issue with documentation alone. I see this template repository as an example that is never meant to be run as part of a real workflow.

@djjuhasz
Contributor Author

I tested https://github.com/LibraryOfCongress/bagit-python to see what it does when asked to bag a bag, and it just "double bags" the contents: everything in the original bag (including manifests, metadata files, and the data directory) is put in a "data" directory, and then it generates new manifests for everything. I don't know that I want to implement the same behaviour for the CreateBagActivity, but I think we should do better than the current error.

Another option is just to return a better error message, like "create bag: /path/to/dir is already a bag" or something similar.

@sallain sallain added the bug Something isn't working label Jun 6, 2024
@mcantelon mcantelon moved this from 👍 Ready to ⏳ In Progress in Enduro Jan 8, 2025
@mcantelon mcantelon self-assigned this Jan 8, 2025
@mcantelon
Contributor

I took the route of doing a rudimentary check to see whether a directory already appears to be a bag and, if so, returning its source path as the bag path.

PR for CR: artefactual-sdps/temporal-activities#43

@mcantelon
Contributor

mcantelon commented Jan 19, 2025

New PR to update the preprocessing-base repo to use the updated activity:

#20

@mcantelon
Contributor

PR merged.

@mcantelon mcantelon moved this from ⏳ In Progress to 🧐 QA in Enduro Jan 20, 2025
@mcantelon
Contributor

I've merged a PR that updates the preprocessing-demo repo with this update as well (so when the demo test site is updated QA can be done).

Status: 🧐 QA