Adding a file size key to the inventory #629
This could be considered an additional kind of fixity check (the file should be x bytes in size), but I suspect I'm pushing the definition of the word 'fixity' here.
2023-06-01 Editors' discussion -- This could be done within the current specification by creating an extension that defines a new fixity type (as mentioned in #629 (comment)), perhaps called …
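As a minimal sketch of what such a check could look like, the Python below assumes a `size` entry in the inventory's `fixity` block, using the usual OCFL shape of algorithm, then digest value, then a list of content paths relative to the object root. It also assumes the value is the file's length in bytes written as a decimal string. The names `size_digest` and `size_mismatches` are hypothetical, and `size_of` is injectable so the check can be tried without real files.

```python
import os
from typing import Callable, Dict, List

# OCFL-style fixity block: {algorithm: {digest value: [content paths]}}
Fixity = Dict[str, Dict[str, List[str]]]


def size_digest(n_bytes: int) -> str:
    """Hypothetical 'size' digest: the byte count as a plain decimal string."""
    return str(n_bytes)


def size_mismatches(
    fixity: Fixity,
    object_root: str,
    size_of: Callable[[str], int] = os.path.getsize,
) -> List[str]:
    """Return content paths whose current size differs from the recorded 'size' value.

    Content paths are assumed to be relative to the object root. size_of
    defaults to reading the file size from disk but can be swapped out.
    """
    mismatches = []
    for expected, content_paths in fixity.get("size", {}).items():
        for content_path in content_paths:
            actual = size_digest(size_of(os.path.join(object_root, content_path)))
            if actual != expected:
                mismatches.append(content_path)
    return mismatches
```

Two files of the same length legitimately share one `size` value, as `v1/content/a.txt` and `v1/content/b.txt` would in a block like `{"size": {"5": ["v1/content/a.txt", "v1/content/b.txt"]}}`.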
@zimeon, should I make a pull request against https://github.com/OCFL/extensions/blob/main/docs/0001-digest-algorithms.md for this?
Yes, the process is outlined in https://github.com/OCFL/extensions/blob/main/docs/0001-digest-algorithms.md#maintenance -- because we are not versioning extensions, the PR should create a new digest algorithms extension that obsoletes 0001.
Spun out to OCFL/extensions#64
Create a new digest algorithms extension to add 'size' to the list of allowed algorithms; obsolete the previous digest algorithms extension. As described in OCFL#64 and following discussion in OCFL/spec#629.
The implication of …
Interesting question, @srerickson. My feeling is that it doesn't represent a major change in how fixity should be used, but I'd love to hear other thoughts. I just created a new fixture suggestion: an object that has two different files with the same MD5 value (OCFL/fixtures#107). Implementations already have to deal with this possibility, even without extension digests that might be weaker than the currently specified ones.
@zimeon that fixture is really helpful, thanks! This issue has helped me identify a problem in my own implementation, where fixity collisions are treated as an error condition instead of being handled gracefully. I don't mean to belabor the point, but I wonder if the implementation notes could address collisions a bit better. From this discussion, a key difference between fixity and manifest digests is that manifest digests are assumed to be collision-free, whereas collisions in fixity digests should be expected and handled gracefully. This point doesn't come across very clearly in the current fixity section, which instead focuses on content addressability and tampering.
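A minimal sketch of collision-tolerant handling, assuming the same per-algorithm `{digest: [content paths]}` shape for fixity and a decimal byte count for `size`. Both functions are hypothetical and not taken from any OCFL library. Paths that share a fixity value are grouped under it instead of raising an error, and verification looks up the value recorded for a given content path rather than assuming a digest identifies a single file.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def build_fixity_block(contents: Dict[str, bytes], algorithm: str) -> Dict[str, List[str]]:
    """Group content paths by fixity digest for one algorithm.

    Several content paths sharing one digest is expected for weak algorithms
    (any two files of equal length share a 'size' value), so it is recorded
    rather than treated as an error.
    """
    block: Dict[str, List[str]] = defaultdict(list)
    for content_path, data in sorted(contents.items()):
        if algorithm == "size":
            digest = str(len(data))  # assumed: decimal byte count
        else:
            digest = hashlib.new(algorithm, data).hexdigest()
        block[digest].append(content_path)
    return dict(block)


def fixity_digest_for(block: Dict[str, List[str]], content_path: str) -> str:
    """Return the fixity value recorded for one content path.

    Looking up by content path avoids assuming that a digest maps to
    exactly one file.
    """
    for digest, paths in block.items():
        if content_path in paths:
            return digest
    raise KeyError(content_path)
```

With `size`, any two files of equal length collide; the suggested fixture in OCFL/fixtures#107 shows the same situation arising with MD5.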
2023-07-06 Editors' discussion -- We agree that it would be helpful to add a note to the fixity section of the Implementation Notes pointing out that fixity algorithms may generate the same value for different file content.
… algorithm extension has a PR that has been submitted and is being reviewed.
* Add 'size' to list of allowed digest algorithms
  Create a new digest algorithms extension to add 'size' to the list of allowed algorithms; obsolete the previous digest algorithms extension. As described in #64 and following discussion in OCFL/spec#629.
* Make integer explicitly decimal
* Update size string expression definition
Following on from #474, but not necessarily looking to revive it.
It would be very useful for a repository manager to know how big an OCFL object and its component binary files are on disk. This affects many decisions we're likely to make about how to handle the object and its component files.
Given the processing work already required to generate the checksum, this seems like an opportunity to also record the size of the binary file that each checksum represents. A key akin to the 'fixity' key, containing an array of key-value pairs, might allow this, e.g.
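Since computing a checksum already reads every byte, the size can be counted in the same pass. Below is a minimal sketch assuming SHA-512 (the digest algorithm OCFL recommends) and a hypothetical block mapping each digest to its byte count; the block's shape and the function names are illustrative only, not the format proposed in this issue.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable, Tuple


def digest_and_size(stream: BinaryIO, algorithm: str = "sha512",
                    chunk_size: int = 1 << 20) -> Tuple[str, int]:
    """Read a stream once, returning (hex digest, size in bytes)."""
    hasher = hashlib.new(algorithm)
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        size += len(chunk)  # count bytes in the same loop that hashes them
    return hasher.hexdigest(), size


def build_size_map(streams: Iterable[BinaryIO], algorithm: str = "sha512") -> Dict[str, int]:
    """Hypothetical size block: digest -> byte count.

    Identical content shares a digest and therefore a single entry.
    """
    sizes: Dict[str, int] = {}
    for stream in streams:
        digest, size = digest_and_size(stream, algorithm)
        sizes[digest] = size
    return sizes


if __name__ == "__main__":
    demo = [io.BytesIO(b"hello"), io.BytesIO(b"hello"), io.BytesIO(b"abc")]
    sizes = build_size_map(demo)
    print(len(sizes), sum(sizes.values()))  # prints: 2 8
```

Because identical content gets one entry, summing the values gives the size of the distinct content rather than of every stored copy.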