File should have technical metadata #16
Not sure if this is helpful here, but baseline technical metadata properties across file formats/types from the Hydra Technical Metadata Subgroup are available: https://docs.google.com/document/d/1SZCpSIdlGfXgoYrAnW2eRKlIt6O-1ADIDDhmLrvxeLc/edit#heading=h.a8hurtypz8qi
In a block like so:
Reiterating @jlhardes's comment about the technical metadata profile: this is now available on the DuraSpace wiki at https://wiki.duraspace.org/display/hydra/Technical+Metadata+Application+Profile
@jcoyne Just to make sure I understand what I need to add here: is the goal of this ticket to have something like this at the end (assuming I get the correct Ruby classes for each of those predicates)?
👍 Pronom's impossible, but besides that.
Should hydra-pcdm be concerned about what kind of tech metadata it is, or just that it has any kind of tech metadata?
For this sprint, I think it makes sense to go with the required properties (File Name and File Size) at a minimum. If the recommended properties can also be included (Label, Date Created, File Hash, File Format Type, and Has Mime Type), that would make it more complete. The optional fields can probably be safely ignored for the sprint; they aren't completely workable anyway (pronom:puid, for example).
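A minimal plain-Ruby sketch of checking a file's properties against that required/recommended split. The symbol keys (`file_name`, `file_size`, `label`, and so on) are hypothetical stand-ins for the profile's properties, not its actual predicates, and the check only tests that a value is present.

```ruby
# Hypothetical keys mirroring the profile's property names (not its predicates).
REQUIRED_PROPERTIES = %i[file_name file_size].freeze
RECOMMENDED_PROPERTIES = %i[label date_created file_hash file_format_type has_mime_type].freeze

# Returns the required and recommended properties that have no value in +props+.
# A value counts as missing if it is nil or blank once converted to a string.
def missing_properties(props)
  blank = ->(key) { props[key].nil? || props[key].to_s.strip.empty? }
  {
    required: REQUIRED_PROPERTIES.select(&blank),
    recommended: RECOMMENDED_PROPERTIES.select(&blank)
  }
end

props = { file_name: "page-001.tif", file_size: 1_048_576, has_mime_type: "image/tiff" }
report = missing_properties(props)
puts report[:required].inspect     # => []
puts report[:recommended].inspect  # => [:label, :date_created, :file_hash, :file_format_type]
```

A file missing only recommended properties would still pass a sprint-level "required" check, which matches the minimum proposed above.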
@jlhardes I agree. My question was really more about the schema. Are we enforcing a tech metadata schema at this level? I guess it doesn't matter: since it's RDF, if an implementer wants to use a different one, they can just add it in. I think we can flesh out those details at the hydra-works level, with integration tests that serve as an example for someone who wants to build their own PCDM-approved object and add additional technical metadata.
@jlhardes This is good to know, since I've got File Size working (Fedora does that automatically, using premis:hasSize as the predicate).
@jlhardes Question: is it OK if we use the equivalent predicates indicated in the document that you posted (e.g. use "premis:hasSize" instead of "ebucore:fileSize"), or should we use the one indicated at the top of each one (e.g. "ebucore:fileSize")?
If we can stick with the properties at the top (Property name: ebucore:fileSize), that might make things easier for the sprint (not so much to implement). Those property names are the main ones we'd like to see implemented for technical metadata anyway. The equivalent properties are listed to help explain each property and to provide options if the listed property can't be used for some reason.
@hectorcorrea the logic behind using ebucore was that it is a comprehensive vocabulary for technical metadata. So rather than splitting the technical metadata properties across lots of different vocabularies (nfo, exif, dc, premis, etc.), it would be much saner to start with a single, well-supported vocabulary.
@jlhardes @acoburn Fedora automatically calculates and stores (as read-only) the following properties: premis:fileSize, fedora:digest, and fedora:mimetype. I could add three separate properties with ebucore predicates as the document recommends, but they would have to be set manually and would run the risk of having different values than what Fedora already stores. Do we really want to do that, or should we stay with the Fedora-provided properties?
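To make the drift risk concrete, here is a rough sketch of comparing values computed from a binary's bytes against the values Fedora reports for it. It assumes Fedora exposes the checksum as a `urn:sha1:<hex>` string and the size as an integer; `divergent_fields` is a hypothetical helper, not part of hydra-pcdm or ActiveFedora.

```ruby
require "digest/sha1"

# Assumption: Fedora's digest is a SHA-1 URN such as "urn:sha1:<hex>";
# adjust the prefix and algorithm if your repository reports something else.
FEDORA_DIGEST_PREFIX = "urn:sha1:".freeze

# Compares values computed locally (as an external tool would) with the
# read-only values Fedora reports, returning the names of fields that differ.
def divergent_fields(bytes, fedora_size:, fedora_digest:)
  computed_size = bytes.bytesize
  computed_digest = FEDORA_DIGEST_PREFIX + Digest::SHA1.hexdigest(bytes)
  mismatches = []
  mismatches << :size unless computed_size == fedora_size
  mismatches << :digest unless computed_digest == fedora_digest
  mismatches
end

content = "hello world"
fedora_digest = FEDORA_DIGEST_PREFIX + Digest::SHA1.hexdigest(content)
p divergent_fields(content, fedora_size: 11, fedora_digest: fedora_digest)        # => []
p divergent_fields(content + "!", fedora_size: 11, fedora_digest: fedora_digest)  # => [:size, :digest]
```

A non-empty result is exactly the case where manually set ebucore values and Fedora's own values would disagree.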
@hectorcorrea part of the thinking here was that if an external tool (e.g. FITS) calculates these values, they can be put into the ebucore properties (since the existing properties are managed by the server and hence read-only). The advantages of using the additional properties include:
The disadvantage of using the additional properties is:
That said, I don't actually have a strong opinion one way or the other. @jlhardes thoughts?
Yes, 👍 to that. But I don't think hydra-pcdm should have any opinions about what tech metadata you're using or what you're using to create it. It should just allow you to use whichever tool and schema you prefer.
@awead I disagree with that somewhat. I think it should provide an opinionated default. You should be allowed to do something else, though.
We had some discussion about these properties in relation to properties that are already in Fedora, and I wasn't quite sure which of these mapped, so you've helped clear that up, @hectorcorrea - thanks! I don't quite understand how it would work NOT to use what we are implementing in this sprint. It seems like we want to see a baseline of technical metadata across all Hydra implementations using PCDM, to make things easier across systems and when sharing externally. I understand that we don't want to limit people's implementations by making these properties and predicates a requirement, but it seems like we do want to encourage their use. For that reason, and for the longer term, I think it's better to go with a more externally useful standard, so I'd stick with premis:hasMessageDigest, ebucore:hasMimeType, and ebucore:fileSize to express those properties, even though it is a bit of duplication. Additionally, I don't think premis:fileSize actually exists (http://id.loc.gov/ontologies/premis.html), at least not in RDF PREMIS. I think the PREMIS property might be hasSize, so if Fedora is using premis:fileSize, I'm not sure what ontology is actually being used.
@jcoyne agreed. If there's any additional tech metadata you want beyond what Fedora is already giving you, then it should be as easy as including a module with the additional properties. Any implementation could then override that module or, more realistically, just include its own. The side effect is that you may have extra triples with different predicates but duplicate object content. So, if we use @jlhardes's recommendations, you'd have two triples with the checksum (fedora:digest and premis:hasMessageDigest) and two for mime type (fedora:mimeType and ebucore:hasMimeType), assuming their object values can be the same. I think @hectorcorrea meant premis:hasSize. That's what comes back from Fedora if you do a GET request on the binary's fcr:metadata node.
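A plain-Ruby sketch of the module approach, with no ActiveFedora involved. `DefaultTechMetadata`, `FileNode`, `LocalTechMetadata`, and `LocalFileNode` are hypothetical names, and the namespace URIs are assumptions to verify against what Fedora actually returns. It shows the duplicated checksum, mime type, and size triples described above, and how a locally included module can override the opinionated default.

```ruby
# Assumed namespace URIs (Fedora 4-era vocabularies); verify before relying on them.
PREMIS  = "http://www.loc.gov/premis/rdf/v1#".freeze
EBUCORE = "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#".freeze
FEDORA  = "http://fedora.info/definitions/v4/repository#".freeze

# Hypothetical opinionated default: client-managed predicates added on top of
# the ones Fedora maintains itself.
module DefaultTechMetadata
  def extra_predicates
    {
      checksum:  "#{PREMIS}hasMessageDigest",
      mime_type: "#{EBUCORE}hasMimeType",
      size:      "#{EBUCORE}fileSize"
    }
  end
end

# Hypothetical file node; SERVER_PREDICATES stand in for Fedora's read-only properties.
class FileNode
  include DefaultTechMetadata

  SERVER_PREDICATES = {
    checksum:  "#{FEDORA}digest",
    mime_type: "#{FEDORA}mimeType",
    size:      "#{PREMIS}hasSize"
  }.freeze

  def initialize(uri, values)
    @uri = uri
    @values = values
  end

  # Emits one N-Triples-style line per predicate, so each value appears twice:
  # once under the server predicate and once under the module's predicate.
  # Interpolation does no literal escaping; this is for illustration only.
  def triples
    [SERVER_PREDICATES, extra_predicates].flat_map do |predicates|
      predicates.map { |key, pred| "<#{@uri}> <#{pred}> \"#{@values.fetch(key)}\" ." }
    end
  end
end

# An implementation preferring a different checksum predicate includes its own
# module; `super` falls through to DefaultTechMetadata (hypothetical namespace).
module LocalTechMetadata
  def extra_predicates
    super.merge(checksum: "http://example.org/ns#checksum")
  end
end

class LocalFileNode < FileNode
  include LocalTechMetadata
end

node = FileNode.new("http://localhost:8080/rest/file1",
                    { checksum: "urn:sha1:abc123", mime_type: "image/tiff", size: 42 })
puts node.triples
```

Real code would build these statements with an RDF library (for example the `rdf` gem) rather than string interpolation; the point here is only the include-and-override pattern and the resulting duplicate values.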
So I went ahead and implemented the additional properties. The only caveats with the current implementation are:
See https://github.com/projecthydra/active_fedora/blob/master/lib/active_fedora/file.rb#L119-L121
https://github.com/projecthydra/active_fedora/blob/master/lib/active_fedora/with_metadata/metadata_node.rb