May inventories contain properties that aren't defined in the spec? #474
Comments
It was certainly intended that no other keys can be present in the inventory other than those defined in the spec. |
I'm pretty sure it was to enable subsequent versions of OCFL to make additions without breaking backwards compatibility. |
If new keys are added to the inventory then that would necessitate a new minor version of the spec, but it wouldn't break backwards compatibility. If the current keys defined in the spec were to be renamed or removed, it would break backwards compatibility but it would also require a major version change. But in both cases I don't believe it was ever intended that keys not defined in the spec are permitted in the inventory. I think the MUSTs in the inventory section are missing an explicit 'and MUST NOT contain any other keys'. |
If there is a MUST NOT clause then adding new keys in new versions breaks backwards compatibility. A V1.1 Inventory would fail V1.0 validation. This was something I thought we wanted to avoid as much as possible. However, now we have the concept of an extensions mechanism, which did not exist when we worked on this bit, we have the additional possibility of having an "extension" key - which can contain all the extensions relevant to the object with their parameters. We can then require non-OCFL keys to be encapsulated in an extension. Thoughts...? |
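The trade-off discussed above can be illustrated with a minimal Python sketch. It is not taken from the spec or from any existing validator: the function name `unknown_keys`, the `mediaTypes` key, and the `extensions` container key are all hypothetical. Only the set of top-level keys reflects what OCFL 1.0 defines for inventories.

```python
# Top-level inventory keys defined by OCFL 1.0.
OCFL_1_0_KEYS = {
    "id", "type", "digestAlgorithm", "head",
    "contentDirectory", "fixity", "manifest", "versions",
}


def unknown_keys(inventory, allowed=OCFL_1_0_KEYS, container=None):
    """Return the sorted top-level keys of `inventory` that are not allowed.

    `container` is an optional, hypothetical key (e.g. "extensions") under
    which non-OCFL data would be encapsulated and therefore tolerated.
    """
    permitted = set(allowed)
    if container is not None:
        permitted.add(container)
    return sorted(key for key in inventory if key not in permitted)


# A minimal inventory skeleton (values are placeholders, not a valid object).
base = {
    "id": "urn:example:object-1",
    "type": "https://ocfl.io/1.0/spec/#inventory",
    "digestAlgorithm": "sha512",
    "head": "v1",
    "manifest": {},
    "versions": {},
}

# Hypothetical: a later spec version (or a local tool) adds a new key.
with_new_key = dict(base, mediaTypes={})

# Hypothetical: the same data encapsulated under a single container key.
with_container = dict(base, extensions={"example-media-type": {}})

print(unknown_keys(with_new_key))                           # strict check flags the key
print(unknown_keys(with_container))                         # strict check flags the container
print(unknown_keys(with_container, container="extensions")) # container explicitly tolerated
```

Under a MUST NOT rule, a validator written this way against the 1.0 key set rejects any inventory carrying a key it does not know, which is the forward-compatibility concern raised here. Allowing one well-known container key keeps the strict check while leaving a single place for non-OCFL data.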
I had assumed that inventories were to be validated against the OCFL version specified in their There are some ambiguities here that should probably be clarified in the spec. Here are some more questions:
In a scenario where multiple OCFL spec versions exist that have substantive differences in inventory serialization, then it does become problematic to deserialize inventory files to anything other than a basic map structure if it is not knowable without reading the inventory contents what format it's in. That aside, if you were to allow inventories to include keys that are not defined in the spec, I would certainly feel better about it if they were at the very least encapsulated in some way, as @neilsjefferies was suggesting. |
A mediaType extension like @pwinckles's example would be interesting to me. mediaType/mimetype is one of the system properties handled by Fedora 3. I can move the mimetype to another file in ocfl, but if it were an option, I might put the mimetype in inventory.json. And I'd be fine with it being encapsulated in an "extensions" key. But I don't have to have it in inventory.json at all - I can handle it either way. |
My recollection is that it was intentional only to specify the necessary structure of the inventory so that there could be evolution or extension anywhere else. Boxing within |
Is the |
Validating a 1.1 inventory with a 1.0 validator is a break in forward compatibility (the object is newer than the code that is verifying it), but I think that's expected. (You can't build a validator that will predict the future...) If the new keys added in the subsequent version were I think we should be clear about the purpose of the inventory file. In my mind, it contains only the data required to effectively track the changes to the files in the object. What I don't think it should be is a sort of config file for capturing different options and behaviours of the object. That would be my concern if we started putting keys, such as |
Yep, @ahankinson, I meant forwards. This does potentially matter though: an object version (and thus its inventory) should always be valid within all future versions of OCFL (since it is immutable). This actually places quite a few restrictions on what we can do with inventory entries. For example, it is not possible to require new keys without version-specific language. @pwinckles I think this answers your question on conformance and upgrading too. A V2.0 object can contain V<2.0 versions and they should be valid. Inventory versions can only ratchet upwards with new versions, obviously! Being overly proscriptive about keys doesn't prevent any failure modes or add new capabilities as far as I can see; all it does is add an additional compatibility issue for no obvious benefit. @zimeon Hence I said "non-OCFL" keys should be in extensions, while future OCFL versions should be able to specify additional keys. This needs careful wording though. In the case of digests and fixity outside the OCFL standards, it does make sense that some reference can be made in the inventory to the relevant extension. |
Uhhh... I don't think I agree with @neilsjefferies . We haven't really made any declarations about whether OCFL Objects can have mixed versions, and the presence of the |
To my mind the handling of mixed version objects is an issue to |
Editor's meeting: Decision was to disallow attributes not specified in the spec, under the principle that it would be better to restrict behaviours and then gradually open them up, than to do the opposite if it becomes a problem. This will be open for community feedback and more use-case gathering post-1.0. |
Crud, it looks like I missed the ticket. At IU we're starting to look at OCFL as a potential storage format, and since we're using tapes as our storage, being able to hold some technical metadata (or other storage information) about the files in an object as part of the inventory would be a big help to reduce tape access if someone just wants to know the (rough) size of the object or duration information. Until the 2.0 release cycle, is there an option to add a use-at-your-own-peril key that could be used to store that information and still validate? I'm thinking in the same vein as IANA's X-* mime types. |
Hi @bdwheele, the inventory wasn’t really designed to hold metadata about the files, it was primarily designed to make the versioning system in OCFL work. Since it’s trying to keep a level of compatibility across time and across clients, and because we didn’t feel like we had gathered enough use cases for this, we felt it was best to follow Postel’s law and “be conservative in what we validate”. The equivalent to the “x-*” mimetype would probably be an extension that gathered the relevant metadata you need. |
That's fair enough and I understand the rationale, but if I may offer a bit of background of where I'm coming from to provide some context.
Here at Indiana University we have several decades of scanned documents (photos & books) as well as a sizeable collection of A/V material (~14PB), all of which are stored on a proprietary tape system (HPSS). As a further fly in the ointment, we share the tape system with the rest of the university so we have to be good citizens and not inadvertently create denial-of-service to the other units.
We're currently looking at what our preservation situation is going to look like for the future since what we have is a mix of several different systems. OCFL has come up multiple times during our investigation and it looks interesting. We have not decided if we want to use someone else's management software (with modifications to use our storage) or if we want to write our own, or even a mix of the two.
Using a tape system creates a lot of headaches when managing the content due to latency issues, so we try to collect as much information about the objects as possible before they're committed to storage. The downside currently is that the information is kept in two separate places: a copy in our database(s) and one on the tape storage (in several cases). We would like to make sure that the metadata we've collected in both places is consistent, or could be rebuilt without reading the files since that's incredibly time consuming.
It seems to me that having the ability to store arbitrary (and explicitly separated) data in the inventory would be a good thing. With that (and RFC 760's "Liberal in its receiving" text) in mind, would it be possible to create a 'private' toplevel that would be used by management tools to store arbitrary data about the object which would not be validated by the OCFL validator beyond being syntactically correct?
Within the private node it would probably be wise to suggest an application ID (such as edu_indiana_dlib_archivemanager or something) to allow multiple applications to store metadata without interfering with either the OCFL content or content stored by other applications. For IU we'd probably want to store technical metadata (size, mime type, stream information, etc), but a tar-like application that generates OCFL would likely include ownership and permissions. One assumes that "immutable" descriptive metadata would also be an option (ownership, alternate IDs, title, etc).
A solution of that nature would be forward compatible because it is up to the application to manage that content. Data stored in the private space would be ignored by parsers reading the 1.0(?) spec. If a future specification included fields for commonly used metadata, it would be up to the application to upgrade the package (because there are other files involved beyond the inventory) and it would be able to deal with backward compatibility by looking at both the future spec's location as well as its own private data.
An additional benefit to adding this space is that it would provide real usage data for future directions for OCFL: if all of the management applications are storing file size, for example, that would lend credence to adding a size field for future versions of OCFL.
Thank you for your time. |
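As a rough illustration of the 'private' top-level node proposed above, here is a short Python sketch. Nothing in it comes from the OCFL spec: the `private` key, the helper names `check_private` and `private_data`, and the shape of the per-file metadata are assumptions made for the example. The application ID is the one suggested in the comment.

```python
def check_private(inventory, key="private"):
    """Return a list of problems with the hypothetical private node.

    The check is deliberately shallow: the node must be a JSON object keyed
    by application ID, and each application's entry must itself be a JSON
    object. The content of each entry is not inspected.
    """
    node = inventory.get(key)
    if node is None:
        return []  # the node is optional
    if not isinstance(node, dict):
        return [f"'{key}' must be a JSON object keyed by application ID"]
    return [
        f"'{key}.{app_id}' must be a JSON object"
        for app_id, data in node.items()
        if not isinstance(data, dict)
    ]


def private_data(inventory, app_id, key="private"):
    """Return the data stored by one application, or {} if absent or malformed."""
    node = inventory.get(key)
    if not isinstance(node, dict):
        return {}
    data = node.get(app_id)
    return data if isinstance(data, dict) else {}


# Hypothetical inventory fragment; only the private node is shown.
inventory = {
    "private": {
        "edu_indiana_dlib_archivemanager": {
            # Assumed layout: technical metadata keyed by content digest.
            "files": {"<digest>": {"size": 1024, "mimeType": "image/tiff"}},
        },
    },
}

print(check_private(inventory))
print(private_data(inventory, "edu_indiana_dlib_archivemanager")["files"])
print(private_data(inventory, "org_example_other_tool"))
```

Each application reads only its own namespace, so a tool that does not recognise an application ID can skip that entry without touching the OCFL-defined keys.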
@pwinckles, no there isn't anything absolutely preventing us from going that route and it could work. So it's not a blocker, but embedding the immutable metadata in with the inventory could offer advantages in some situations:
|
* Fixed: disallow arbitrary keys
  Fixes #474
* Fixed: Addressing review comments
  Moved MUST NOT constraint to section introduction
I was sure that the spec forbade this, but @neilsjefferies just pointed out to me that it in fact does not. Is this intentional?
If so, this would allow for something like, for example, a media type extension that augments inventory files like the following: