Proposal: mime types for dc:description? #1650
Comments
I like the idea.
We really need a method to include HTML inside description. The mime-type
attribute can help.
--
Ori Idan CEO Helicon Books
http://www.heliconbooks.com
On Tue, Apr 27, 2021 at 6:57 PM Alex Cabal ***@***.***> wrote:
I searched through the issue archive and didn't find anything about this,
so forgive me if this has already been discussed.
In an epub's metadata, it's often desirable to include rich formatting,
like HTML, in elements that have a more prose-like format like the epub's
description.
Since <dc:description> can only contain plain text, at Standard Ebooks we
make <dc:description> a short plain text sentence, and then we add an
additional element, <meta property="se:long-description"
refines="#description">, which includes a much longer description. This
longer description is an HTML fragment, which is escaped since <meta>
cannot have children.
Here's an example:
<dc:description id="description">An amateur sleuth visiting a country house solves the mystery of who shot the man in the locked room.</dc:description>
<meta id="long-description" property="se:long-description" refines="#description">
&lt;p&gt;&lt;i&gt;The Red House Mystery&lt;/i&gt; is a detective novel by &lt;a href="https://standardebooks.org/ebooks/a-a-milne"&gt;&lt;abbr&gt;A. A.&lt;/abbr&gt; Milne&lt;/a&gt;, better known for his children’s writing...&lt;/p&gt;
</meta>
Since we're using a custom property within our own se namespace, it's
reasonable to assume that reading systems that know to extract it will also
know to expect escaped HTML. No problem.
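To illustrate the extraction step described above, here is a minimal sketch of how a reading system might pull the escaped fragment out of the package document. The sample OPF is shortened and the function name `long_description` is made up for this example; only the `se:long-description` property comes from the post. Note that a conforming XML parser already decodes `&lt;` and `&gt;`, so the text content comes back as ready-to-parse HTML.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative package document (not a complete, valid EPUB package).
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:description id="description">A short plain text description.</dc:description>
    <meta id="long-description" property="se:long-description" refines="#description">
      &lt;p&gt;&lt;i&gt;The Red House Mystery&lt;/i&gt; is a detective novel.&lt;/p&gt;
    </meta>
  </metadata>
</package>"""

NS = {"opf": "http://www.idpf.org/2007/opf"}


def long_description(opf_xml):
    """Return the unescaped HTML fragment of se:long-description, or None."""
    root = ET.fromstring(opf_xml)
    for meta in root.iterfind(".//opf:meta", NS):
        if meta.get("property") == "se:long-description":
            # The XML parser has already turned &lt;p&gt; back into <p>.
            return (meta.text or "").strip()
    return None


html_fragment = long_description(OPF)
print(html_fragment)
# <p><i>The Red House Mystery</i> is a detective novel.</p>
```

A system that does not know the property simply never looks for this `<meta>` element, which is why the custom-namespace approach is low risk.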
But, would it be valuable to epub in general to be able to specify the
mime type of the description, so that publishers could include HTML
descriptions that reading systems would know to parse/render as HTML?
For example:
<dc:description>A plain text description.</dc:description>
<dc:description mime-type="text/html"><p>An HTML description.</p></dc:description>
Now the dc namespace doesn't include a mime-type attribute, but as far as
I can tell (and I may be looking in the wrong place!) neither does the opf
namespace include attributes like property or refines.
A possible implementation in the spec might default to assuming element
contents to be plain text in the absence of a mime-type attribute, to
fall back to plain text rendering if the reading system can't render the
mime type, and to only allow some subset of mime types, like (text/plain,
text/html, text/markdown, application/xhtml+xml). It might allow multiple
<dc:description> elements as long as each has a different mime type.
This is not a fully-formed proposal, just something to spark discussion!
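The selection rules sketched in the proposal could look roughly like this on the reading system side. This is only one reading of the idea: the `mime-type` attribute does not exist in any spec, `pick_description` and its `renderable` parameter are invented for the example, and "fall back to plain text" is interpreted here as "use the plain text description".

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
# Subset of media types suggested in the proposal.
ALLOWED = {"text/plain", "text/html", "text/markdown", "application/xhtml+xml"}

# Hypothetical metadata using the proposed (non-existent) mime-type attribute.
METADATA = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:description>A plain text description.</dc:description>
  <dc:description mime-type="text/html">&lt;p&gt;An HTML description.&lt;/p&gt;</dc:description>
</metadata>"""


def pick_description(metadata_xml, renderable):
    """Pick the first description whose type the reading system can render.

    `renderable` is an ordered list of media types, most preferred first.
    Returns (media_type, text), or None if nothing usable is present.
    """
    root = ET.fromstring(metadata_xml)
    by_type = {}
    for el in root.iterfind(f"{{{DC}}}description"):
        # No attribute means plain text, as the proposal suggests.
        mime = el.get("mime-type", "text/plain")
        if mime in ALLOWED:
            # Keep only the first description per media type.
            by_type.setdefault(mime, (el.text or "").strip())
    for mime in renderable:
        if mime in by_type:
            return mime, by_type[mime]
    # Fallback: the plain text description, if there is one.
    if "text/plain" in by_type:
        return "text/plain", by_type["text/plain"]
    return None


print(pick_description(METADATA, ["text/html"]))
print(pick_description(METADATA, ["text/markdown"]))
```

The first call returns the HTML description; the second finds no Markdown description and falls back to the plain text one.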
|
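I am a bit bothered by the fact that the HTML text must be escaped, which makes it very user unfriendly... This is not a problem with markdown, but I am not sure markdown is widely used among epub authors. What about using the <link> element, which can then refer to a separate file and already has a media type attribute? What is missing from the spec is a suitable link relationship; the specification may add a new link relationship for description to cover this. It may also make the job of reading systems easier, because they fall back on an existing mechanism... |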
I don't think it is a problem for reading systems to unescape the
description.
Using link will make it more complicated I think.
On Tue, Apr 27, 2021 at 8:29 PM Ivan Herman ***@***.***> wrote:
I am a bit bothered by the fact that the HTML text must be escaped, which
makes it very user unfriendly... This is not a problem with markdown, but I
am not sure markdown is widely used among epub authors.
What about using the <link> element, that can then refer to a separate
file, and it already has a media type attribute. What is missing from the
spec is a suitable link relationship; the specification may add a new link
relationship for description to cover this. It may make the job of
reading systems also easier, because they fall back on an existing
mechanism...
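As a rough sketch of the `<link>` alternative, a reading system could scan the package metadata for links carrying a description relationship and read their existing `media-type` attribute. The `rel="description"` value is purely hypothetical (the comment above notes that no such relationship exists in the spec), and the sample hrefs and the helper `description_links` are invented for illustration.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DESCRIPTION_REL = "description"  # hypothetical relationship, not defined by EPUB

# Illustrative metadata: one hypothetical description link, one linked record.
METADATA = """<metadata xmlns="http://www.idpf.org/2007/opf">
  <link rel="description" href="description.xhtml" media-type="application/xhtml+xml"/>
  <link rel="record" href="meta/record.xml" media-type="application/marc"/>
</metadata>"""


def description_links(metadata_xml):
    """Return (href, media_type) for every link with the description rel."""
    root = ET.fromstring(metadata_xml)
    found = []
    for link in root.iterfind(f"{{{OPF_NS}}}link"):
        # rel is a space-separated list of relationship values.
        rels = (link.get("rel") or "").split()
        if DESCRIPTION_REL in rels:
            found.append((link.get("href"), link.get("media-type")))
    return found


print(description_links(METADATA))
# [('description.xhtml', 'application/xhtml+xml')]
```

With this approach the description file needs no escaping, at the cost of one more resource to fetch and validate.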
|
It could also be included as CDATA. I think there's something to be said for having all the ebook's metadata in one location, instead of in various linked files somewhere in the epub. And linking a file would invite publishers to create huge/complex HTML pages for descriptions which I think we probably want to avoid. |
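A small check of the CDATA idea, assuming a standard XML parser (Python's `xml.etree.ElementTree` here): entity-escaped content and a CDATA section produce the same text, so the choice only affects how pleasant the source is to author. The sample fragment is made up.

```python
import xml.etree.ElementTree as ET

# The same fragment, once entity-escaped and once wrapped in CDATA.
ESCAPED = (
    '<meta property="se:long-description">'
    "&lt;p&gt;A &lt;i&gt;rich&lt;/i&gt; description.&lt;/p&gt;"
    "</meta>"
)
CDATA = (
    '<meta property="se:long-description">'
    "<![CDATA[<p>A <i>rich</i> description.</p>]]>"
    "</meta>"
)

escaped_text = ET.fromstring(ESCAPED).text
cdata_text = ET.fromstring(CDATA).text

# Both spellings give the parser's caller the identical string.
print(escaped_text == cdata_text)
print(cdata_text)
```

One caveat: a CDATA section cannot itself contain the sequence `]]>`, so authoring tools would still need to handle that case.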
True. My worry was on the EPUB author's side, though, mainly when the description becomes longer. I realize that the idea is to have shorter descriptions also as HTML fragments, but these things may have the tendency of getting longer and longer over time if the facility is there...
True, and that is probably better.
Formally, that is already the case. If you include more complex metadata (say, ONIX data) then the idea is to use
Probably... but what constitutes huge is unclear and I am not sure where I would draw the line... Anyway, to be clear, I am not against your proposal, just exploring alternatives. See what others have to say... |
One of the big hurdles is going to be that reading systems don't generally want author formatting. Just look at the table of contents where we perpetually argue over whether reading systems should be forced to use the formatting that authors put within the nav doc links. If we can't get support there, I'm less optimistic we'll get support within package metadata.
@iherman has pointed out the preferred method for including metadata in an alternative form. That's exactly why we introduced linked records. We could make the linking more granular to specific fields, but if there's a precedent to be learned from the original exercise, it's again that reading systems aren't too interested in metadata expressed in alternative formats.
There's also a danger that making alternative forms look like standard issue metadata will have unwanted effects. If a reading system picks up a description, it may very well render all the escaped markup. It all depends on which
But I freely admit I've become a bit jaded when it comes to epub metadata! 😄
This might be another issue for the CG to explore. Even if they can't introduce new attributes, exploring the current possibilities and/or rallying some reading system support for a new attribute is going to be necessary before we look at adding this into the core. |
Is there implementer interest in reading more metadata from EPUBs themselves? Very few reading systems will display more than author and title. This also makes it difficult to author such metadata, when you can't see how it will display. Anecdotal evidence suggests that even text-only metadata embedded in EPUBs is often incorrect. I can only imagine how much worse it would be with escaped HTML. Oh, and what Matt said. |
It is not only reading systems that need this; cataloging systems also read
the EPUB metadata and need the description.
At Helicon Books we have both a reading system that uses the description
and an online cataloging system that reads the description, so
we would be happy to have a rich description.
On Wed, Apr 28, 2021 at 3:49 PM Dave Cramer ***@***.***> wrote:
Is there implementer interest in reading more metadata from EPUBs
themselves? Very few reading systems will display more than author and
title.
This also makes it difficult to author such metadata, when you can't see
how it will display. Anecdotal evidence suggests that even text-only
metadata embedded in EPUBs is often incorrect. I can only imagine how much
worse it would be with escaped HTML.
Oh, and what Matt said.
|
I wasn't around for that discussion, but the nice thing about having a properly structured ToC (i.e.
My concern there is that if HTML descriptions are allowed as external files, then the long-term conclusion of that is huge JS-enabled beasts with big CSS stylesheets, complex unsemantic structure, and maintenance burden as HTML/CSS change over time--just like what the modern web has "evolved" in to. I think this touches on the ToC issue above. In epub, ToCs are external files that have the full power of HTML/CSS, and as such there's argument on how to render them because there's so much author freedom. Limiting a rich description to being the contents of a metadata element, and explicitly calling it a "fragment" of HTML with no
Perhaps, but that's what The spec could also require that exactly one plain text |
What this amounts to is defining some sort of HTML "profile", which is more complex than one would expect (a precise specification would then have to define the detailed processing in terms of what the HTML spec defines and, in view of the extreme complexity of the latter, that is not an easy task). I know that we did consider such a route in another Working Group and we quickly shied away from it. (Note that security considerations would force us to do such profiling: otherwise it would allow adding JavaScript and even event handlers to the description content, which is a security risk.)
If we went down the line of enriching the
But the reservations of @dauwhe and @mattgarrish are certainly compelling to me. |
I see so many complications. You're requiring reading systems to add a markdown parser. You're requiring content authors to learn an entirely new vocabulary, meant to be an authoring aid for, well, HTML. This is adding a lot of complexity to the ecosystem. EPUBCheck would have to somehow write a markdown validator. |
This is why we allow linked records in formats that such systems would expect - marc, mods, onix, etc. This proposal, as I understand it, is to add formatting for displaying the description, not for improving the cataloguing of it.
Except that means introducing overrides to metadata handling for whitespace. Reading systems are supposed to trim and compact whitespace in metadata elements, but now in the presence of this attribute they'd have to preserve. We can always make more and more rules to make anything work, but in the process the more brittle processing becomes.
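To make the whitespace point concrete, here is a sketch of the trim-and-compact step applied to a Markdown description. The helper name is invented, and `str.split()` treats more characters as whitespace than XML does, so this is an approximation of what a reading system would do rather than the spec's exact rule.

```python
def normalize_metadata_whitespace(value):
    """Trim leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(value.split())


# In Markdown, line breaks carry structure: a paragraph followed by a list.
markdown = "A first paragraph.\n\n- item one\n- item two\n"

flattened = normalize_metadata_whitespace(markdown)
print(flattened)
# A first paragraph. - item one - item two
```

After normalization the paragraph break and list items are gone, which is why a whitespace-sensitive format would need an explicit exception to the existing processing rule.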
This is what the community group would need to look at. I believe there are some reading systems that process descriptions (or that's my memory of the epub 3.1 reading system survey), but never assume how. We've also had problems with the wrong authors being processed and you'd assume that would be pretty straightforward, too.
That wouldn't help legacy systems, though, as it's the change due to the presence of the attribute that they won't recognize. A lesson learned the hard way in IDPF is that adding new features doesn't translate into quick reading system support (or any reading system support). For W3C, we need to prove uptake of all new features. I'm not unsympathetic to the limitations of epub metadata, but this would be a pretty radical change. That's why it needs more incubation. |
Ivan already mentioned my main concern with this: security. Yes, we'd require the creation of a profile of HTML to do this, but I would say that, as an RS, this still presents the risk of opening up a security concern within the internal metadata. The profile that could counteract this would likely be highly restricted (i.e. style tags only), and I imagine we'd end up back where this issue stems from if we did that. |
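For a sense of what a highly restricted profile might mean in practice, here is a deliberately naive allowlist filter built on Python's standard `html.parser`. The tag list and the class and function names are assumptions made up for this sketch; no such profile has been defined, and a production reading system would want a vetted sanitizer rather than this.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical "profile": a handful of inline/text tags, no attributes at all.
ALLOWED_TAGS = {"p", "i", "b", "em", "strong", "abbr", "br"}
DROP_WITH_CONTENT = {"script", "style"}


class DescriptionSanitizer(HTMLParser):
    """Keep allowlisted tags (attributes stripped) and escaped text only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes such as onclick are dropped

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment):
    parser = DescriptionSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


dirty = (
    '<p onclick="steal()">Hello <i>world</i>'
    '<script>alert(1)</script> <a href="javascript:x()">link</a></p>'
)
print(sanitize(dirty))
# <p>Hello <i>world</i> link</p>
```

Even this toy version drops links entirely, which shows the tension raised above: a profile strict enough to be safe may no longer offer much beyond plain text. It also does not check that tags are balanced.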
The issue was discussed in a meeting on 2021-05-07.
3. mimetypes for dc:description
See GitHub issue #1650.
Dave Cramer: the use-case here is to provide rich formatting for epub metadata in the package file
Laurent Le Meur: Atom provides the ability to do this
Dave Cramer: book descriptions in our ONIX feed use escaped markup
Bill Kasdorf: on the trade book side of things, it's not common to embed things in metadata
Ivan Herman: for those areas of publishing, would linking to metadata be acceptable?
Bill Kasdorf: if that publisher's ecosystem supports that, then yes
Ivan Herman: the only minor thing that came up is that if we say we prefer the linked element solution, then we may want to add the ability to use link to more elements
Dave Cramer: i'm also not aware of a RS that supports linked metadata elements
George Kerscher: what about where the book is ingested and that link is used to provide information about that book
Dave Cramer: I think so, but I'm not aware of this happening
Laurent Le Meur: from what i heard, epub metadata is not used in ingestion, they use mostly ONIX
Wendy Reid: some epub metadata is used on ingestion, e.g. epub 3 vs 2, FXL vs reflow
George Kerscher: VitalSource uses epub metadata, RedShelf uses it, CG is working on ways to expose epub metadata, we're working with libraries (EBSCO, Proquest) to expose it
Tzviya Siegman: our research showed that the field that was used most was the identifier, and that was used to correlate to ONIX, for example
Bill Kasdorf: all GCA certified publishers are putting accessibility metadata in their epubs
Dave Cramer: agreed that metadata in epubs is useful, but not sure that we should complicate that metadata
George Kerscher: Micah Bowers is working on a way to create citations and bibliographic references, and that work depends on the metadata in epubs
Ivan Herman: so it seems the WG is not in favour of complicating the way we do metadata in epubs, with the exception of maybe adding a new relationship
Wendy Reid: the new relationship piece can be a new issue or PR, i think
Tzviya Siegman: can we clarify what we are adding?
Dave Cramer: little afraid that this will open a can of worms with other people wanting their own vocab added
Ivan Herman: should I come up with separate PR about the new relationship?
Dave Cramer: a new issue, i think, where we can further discuss it |
See also issue raised in #1666 |
Thank you! |