Make object ID accessible from a small file without having to parse inventory #579

Closed
ptsefton opened this issue Mar 11, 2022 · 10 comments
Labels: Extensions (Tickets that we believe should be extensions)

Comments

@ptsefton

ptsefton commented Mar 11, 2022

I thought this had been discussed before but I can’t find an issue.

It would be good to be able to find an OCFL Object’s ID without having to load the inventory, which could be an expensive operation. For example, when writing a library you might want to return a list of Object IDs so they can be consumed, e.g. by an indexer. Right now, though, to get the ID you have to parse the inventory and pass the ID to another process, which then also has to parse the inventory to use the object.

Could we just have an id file with the string in it (or maybe other useful metadata, without the potentially large parts)?

@pwinckles

I agree that it would be nice to have a less expensive way to get an object's id. I currently use a regex to get this in rocfl so I don't have to parse the entire inventory when I just want the id.
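To illustrate the regex approach mentioned above, here is a minimal Python sketch. It is not the code rocfl uses; the pattern and the function name `quick_object_id` are assumptions made for illustration. It looks for the first `"id"` key in the inventory text and decodes only that string value, so the manifest and version blocks are never turned into objects.

```python
import json
import re

# Matches the first `"id": "<value>"` pair and captures the quoted JSON string,
# including any backslash escapes inside it.
ID_PATTERN = re.compile(r'"id"\s*:\s*("(?:[^"\\]|\\.)*")')


def quick_object_id(inventory_text):
    """Return the object id found in the inventory text, or None if absent.

    Assumption: the first "id" key in the text is the top-level object id.
    """
    match = ID_PATTERN.search(inventory_text)
    if match is None:
        return None
    # Decode the captured JSON string literal so escapes are handled properly.
    return json.loads(match.group(1))
```

This still requires reading the inventory text, and JSON does not guarantee key order, so the id is not necessarily near the start of the file. A caller that needs certainty should fall back to a full parse.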

@zimeon
Contributor

zimeon commented Mar 11, 2022

If we were to follow on from the use of NAMASTE to specify the type of an object (Conformance Declaration, 0=ocfl_object_1.0) then we could use that again to allow the identifier in a 4=identifier_here NAMASTE file. See https://confluence.ucop.edu/download/attachments/14254149/NamasteSpec.pdf . I think I'd lean toward it being optional because, depending on the storage approach, an extra file for easy access to the id may or may not be considered a worthwhile optimization.

@ptsefton
Author

@zimeon if you use a NAMASTE file then the identifier-here part would be problematic as it would need to be encoded, and might run into filename limits etc. I think it would be more practical to break out the fixed-size metadata in the inventory from the manifest and version stuff which is potentially quite large. Something like metadata.json.

We have discussed a short-term solution to this, storing an id.json or metadata.json file in the logs directory pending a decision for the OCFL object spec itself.

@pwinckles

Yes, given that object ids should be URIs, they would need to be encoded if you wanted to use namaste.

@ptsefton It might be more fitting to use an extension in the short-term.
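The encoding and filename-length concerns raised above can be sketched in Python. This is only an illustration: percent-encoding with `urllib.parse.quote` is an assumed encoding, not something the NAMASTE or OCFL specs prescribe, and the 255-byte limit is a common filesystem limit rather than a universal one. The function names are hypothetical.

```python
from urllib.parse import quote

# Assumption: many filesystems cap a single filename at 255 bytes.
MAX_FILENAME_BYTES = 255


def namaste_id_filename(object_id, tag="4"):
    """Build a NAMASTE-style filename carrying a percent-encoded object id."""
    # safe="" forces "/" and ":" to be encoded so the result is one path segment.
    return f"{tag}={quote(object_id, safe='')}"


def fits_filename_limit(filename, limit=MAX_FILENAME_BYTES):
    """Check whether the filename stays within the assumed byte limit."""
    return len(filename.encode("utf-8")) <= limit
```

Because each reserved character expands to three bytes, a URI with many slashes can exceed the limit well before the raw id does.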

@ptsefton
Author

@pwinckles Can an extension do things like add an extra file to the object root?

Spec says "The OCFL Object Root must not contain files or directories other than those specified in the following sections."

@pwinckles

No, you'd just do like you described with the logs dir. So, you'd write the file to extensions/NNNN-object-meta/metadata.json, or whatever you want to call it. The advantage is that it would be formalized and generally usable in 1.0. Whereas the logs dir solution can't really be used by anyone else.

@ptsefton
Author

Just to clarify, in an object the path to the new metadata file would be ./logs/extensions/NNNN-object-meta/metadata.json relative to the object root, and we would document the extension in the repository extensions directory?

@pwinckles

No, I was suggesting putting it in the object's extensions directory, so it would be something like ./extensions/NNNN-object-meta/metadata.json.
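A rough Python sketch of what such an extension could look like, under stated assumptions: `NNNN-object-meta` is the placeholder name from this thread (no extension number has been assigned), the `metadata.json` content of just `{"id": ...}` is a guess at a minimal payload, and both function names are hypothetical.

```python
import json
from pathlib import Path

# Placeholder directory name taken from the discussion; not a registered extension.
EXTENSION_DIR = "NNNN-object-meta"


def write_object_meta(object_root, extension_dir=EXTENSION_DIR):
    """Copy the id from inventory.json into a small sidecar file."""
    root = Path(object_root)
    with open(root / "inventory.json", encoding="utf-8") as f:
        inventory = json.load(f)
    meta_path = root / "extensions" / extension_dir / "metadata.json"
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    meta_path.write_text(json.dumps({"id": inventory["id"]}), encoding="utf-8")
    return meta_path


def read_object_id(object_root, extension_dir=EXTENSION_DIR):
    """Read the id from the sidecar if present, else fall back to the inventory."""
    root = Path(object_root)
    meta_path = root / "extensions" / extension_dir / "metadata.json"
    if meta_path.is_file():
        return json.loads(meta_path.read_text(encoding="utf-8"))["id"]
    with open(root / "inventory.json", encoding="utf-8") as f:
        return json.load(f)["id"]
```

The fallback keeps the sidecar optional: objects written without it remain readable, at the cost of a full inventory parse.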

@neilsjefferies added the Extensions label (Tickets that we believe should be extensions) on Feb 2, 2023
@neilsjefferies
Member

neilsjefferies commented Feb 2, 2023

There are a variety of ways of approaching this problem. A lot involve some form of caching with no intrinsic OCFL spec changes. Therefore we think this is best kept as an optional extension.

@zimeon
Contributor

zimeon commented Sep 22, 2023

Editors' discussion 2023-09-22: Per #579 (comment) we think this is best addressed by either application caching or an extension.

@zimeon closed this as completed on Sep 22, 2023