kerning.plist is not sufficient to do mixed RTL and LTR kerning #16
Comments
I don't agree that this needs to be in the API. See: opentypejs/opentype.js#95 (comment) cc @khaledhosny |
@behdad from your linked comment
The point with this issue is that you can't know if you are changing the left side or the right side by looking at a UFO kerning.plist entry. I agree that once you know which side you want to kern, there's no need for further arguments than one value. That's why it was easy to make a KernFeatureWriter for exclusively RTL kerning from ufo2fdk. |
I agree that pairs should have writing direction. However, I don't see a very clean way of doing so, without really breaking the current formatting of the kerning plist. My suggestion would be that the kerning value becomes a list, with the first member of the list the value, and the second the direction. Perhaps this is a UFO4 item for consideration, however. |
.@graphicore one thing I didn't think of this morning, assuming that your glyphs are all unicode encoded, it's very possible to pull direction from the python unicodedata.bidirectional property. Any kern pair with at least one member having a RTL property can be filtered for your kerning feature. Having realized that, I'm less sure that anything needs to be added to the UFO spec for writing direction, and if there were to be anything added, it would certainly be better in UFO4, not as part of UFO3. |
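A minimal sketch of that filter, assuming every glyph name can be mapped to a single code point. The `cmap` dict, the glyph names, and the helper names below are made up for illustration; unencoded glyphs simply come back as unknown.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def glyph_bidi(glyph_name, cmap):
    """Return the Unicode bidi class of a glyph, or None if it is unencoded."""
    codepoint = cmap.get(glyph_name)
    if codepoint is None:
        return None
    return unicodedata.bidirectional(chr(codepoint))


def is_rtl_pair(pair, cmap):
    """True if at least one member of the pair has a strong RTL class."""
    return any(glyph_bidi(name, cmap) in RTL_CLASSES for name in pair)


# Hypothetical glyph-name -> code point mapping and flat kerning data.
cmap = {"A": 0x0041, "V": 0x0056, "alef-ar": 0x0627, "reh-ar": 0x0631, "period": 0x002E}
kerning = {("A", "V"): -40, ("reh-ar", "alef-ar"): -20, ("reh-ar", "period"): -10}

rtl_pairs = {pair: value for pair, value in kerning.items() if is_rtl_pair(pair, cmap)}
```

Pairs made up only of neutral or unencoded glyphs are not caught by this filter, so it is a starting point rather than a complete answer.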
First of all, I did not expect the UFO v3 spec to be open to changes in kerning.plist. I am just reporting an issue that I have with it. I agree that UFO v4 could be a target and I'm also open to any solution. I think that, without fixing kerning.plist, we have to use Unicode and/or glyph names and class names. Unicode can definitely be helpful here, but the Unicode value of a glyph alone is not sufficient, because in Arabic you have at least four glyphs for one Unicode character. Three of these can be encoded (in Arabic Presentation Forms-A) but, afaik, for a modern environment you would rather not do so because they are accessed via GSUB. The same is also true in Latin for stylistic sets and so on. A naming scheme could help: one that lets you determine the Unicode value also for glyphs that are not encoded (like Adobe's AGLFN), plus maybe an optional "RTL" or "LTR" at the end of the glyph name, which would be consulted first, if present. But, to be honest, that's not ideal. It's very indirect and it overloads the glyph names with a further use: writing features, PDF text extraction, referencing components and kerning directions. Not to forget finding glyphs in your editor and finding glyphs on the command line when using git. |
Also, there are still cases then where you don't have full control. That may be corner cases—different writing directions are not really used within the same word, where kerning matters—but it's not a model that solves the full problem; so there might be bigger issues that we don't see yet. |
Yeah, I see the problem and it needs to be solved. To date, there have been RTL fonts produced with UFO, but the GPOS generation from the data in kerning.plist has had some additional smartness to organize the pairs by writing direction. ufo2fdk doesn't deal with this at all because the pair organization is very font specific. One thing that I'd like to do is preserve the file structure of kerning.plist. The data structure in there is replicated all over the place and changing it could be problematic in the real world. I've thought of a couple of other ways to approach solving the problem:
So, that's where the thinking on this is on my side of things. |
Lasse, in the short term, could you use 2 UFOs, and process one with all
the Arabic glyphs as entirely RTL, and the other with Latin as entirely LTR,
and then use pyftmerge to combine the binary results?
|
Thanks Dave, I think that's what I'll do :-) But at the moment we have the Latin UFO that will have the Latin kerning, and a UFO that Kourosh produces with glyphs (from the TTF) where he kerns the Arabic. There are some cases like the parentheses from the Latin in Arabic where I am eager to see how it works out :-) |
Are there use cases in which a single pair <firstglyph><secondglyph> would need different values for different directions? |
Not that I know, for a simple kerning pair. We need to know whether to produce:
or RTL:
But it's just one value. @typesupply I think from your three proposals, the third proposal is the easiest to implement and the most robust of those three. The second one is worse than restructuring kerning data in UFO, especially because information that belongs together is stored at different places, so we spend our time syncing data then, it could make a good workaround however. The first one is also a possible workaround for the current situation. |
The possibility of having different values when LTR and RTL seems illogical, but we know that crazy stuff exists (example: that very popular font that contains 2 glyphs with the same name; another one: those fonts that have two glyphs mapped to the same Unicode value). This could be tested by anyone with access to fonts containing both LTR and RTL kerning with a pretty simple fontTools script. It wouldn't be definitive, but it would be useful. I'll keep thinking about it. I agree that redundant data is bad, but the nice thing about option 2 is that the data would be easily transferable between fonts. The data wouldn't have to be a 1:1 match to the contents of kerning.plist, it would be a general guide for separating the pairs. Still, it's a layer of complexity. UFO 4 is supposed to be only a structural change, so I'm hesitant to introduce an entirely new file. But, it may be necessary. I did have an idea on the existing implementation issue, specifically the RoboFab API. (kerning.plist grew out of the RoboFab API, which is a flat dict or pairs.) RoboFab's |
Let me express my opinion again: the proposal here is wrong. Kerning is an inherently visual procedure and is independent of the writing direction of the text. The key point here is that kerning.plist specifically and rightly encodes "left glyph" and "right glyph". So, for example, if someone is kerning the following sequence of two characters:
it absolutely doesn't matter if it's a double-quote followed by period, in a left-to-right script, or if it's period followed by double-quote in a right-to-left script. Now. The reason Lasse and possibly others are confused is how the GPOS table works. Unfortunately individual lookups in OpenType do not specify what direction they expect their input in. That's something we (OpenType stakeholders) are fully aware of and planning to fix, but in the mean time, the reality is that lookups in the GPOS table are processed in the logical text order, whatever that means. For the most common cases, that's not a problem and "logical text order" is unambiguous. For example, Latin is left-to-right, and Arabic is right-to-left. That logic is encoded here: https://github.com/behdad/harfbuzz/blob/master/src/hb-common.cc#L446 The implication for this bug then is this: when generating a kern lookup, if it is to be referenced from a right-to-left script system ('arab' for example), it has to be generated differently from a lookup that is to be referenced from a left-to-right script system ('latn' for example). You cannot use the same lookup. Back to the I'll go one step further with that: even if the desired kerning of the Now, there's one case that is hugely ambiguous in OpenType: the case of kerning digits in Arabic and other right-to-left scripts, because digits in some of those scripts are written left-to-right. But I don't want to get into that discussion here (and how inconsistent implementations are). I hope I've made my argument clear. Would be happy to clarify. |
@behdad thanks for your input. If I read you right, you still don't tell us how the current way the data is organized in UFO makes it possible for us to determine the right output for our GPOS or AFDKO writing code.
Separating via script-language instead of via script-direction sounds totally sane to me; however, we still need that information to be present somewhere in UFO. Following the proposals we heard, the best bet is probably different kerning.plist files, but separated by language:
Thanks for pointing this out. But we need a solution for this problem here, we can't just not discuss it. You say we should use script language to gather the information needed to write the correct lookups, fine. But then you say this is broken when it comes to digits. |
There's a lot of external data (from Unicode) that is needed to convert UFO to fonts, and this isn't any exception. Here's the simplest heuristic that works:
That's the crudest heuristic, but it works. From there on, you can add more complexity to achieve smaller kern tables. But none of these really belongs into the UFO per se. For example, a huge improvement:
In short, HarfBuzz and Uniscribe handle those differently. Easiest is to ignore them for now. Or if you wish, remove them from the kern pairs. I can't recommend one way or another as the "right" way at this time. |
Great, thanks! I will try. |
I might add that there is also vertical kerning, in both directions. I agree that the most common understanding of "kerning" should be "one-dimensional positive or negative single-value adjustment in a given direction between two entities, each of them being a glyph or a group". Contextual or 2D adjustments should be done in FEA directly, but for "kerning pairs", it would be nice to have an ability to declare the directionality, which can be:
|
which font is that? |
This implements what was discussed here unified-font-object/ufo-spec#16 (comment) The results have been used but are not yet approved by anyone :-) This is the output file: https://github.com/Tarobish/Jomhuria/blob/master/sources/kerning.fea
@behdad wrote:
> * Generate both an LTR kern lookup and an RTL one; including all kerning pairs in both.
> * Reference the RTL lookup from script systems that are RTL, and the LTR one for others.
>
> That's the crudest heuristic, but it works. From there on, you can add more complexity to achieve smaller kern tables. But none of these really belongs into the UFO per se. For example, a huge improvement:
>
> * Associate each glyph to a Unicode character,
> * Exclude from RTL kern table all glyphs associated with Unicode characters that have Bidi_Type=L,
> * Exclude from LTR kern table all glyphs associated with Unicode characters that have Bidi_Type=R or Bidi_Type=AL.
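The quoted heuristic can be sketched roughly as follows. This is not the code from the linked repository; `split_kerning`, the `cmap` dict and the glyph names are assumptions for illustration, and glyphs without a code point are simply kept in both tables unless their partner excludes the pair.

```python
import unicodedata


def split_kerning(kerning, cmap):
    """Split flat kerning pairs into (ltr, rtl) tables by exclusion.

    A pair is dropped from the RTL table if any glyph has bidi class L,
    and dropped from the LTR table if any glyph has bidi class R or AL.
    """
    def bidi(name):
        codepoint = cmap.get(name)
        if codepoint is None:
            return None  # unencoded glyph: no information
        return unicodedata.bidirectional(chr(codepoint))

    ltr, rtl = {}, {}
    for pair, value in kerning.items():
        classes = {bidi(name) for name in pair}
        if "L" not in classes:
            rtl[pair] = value
        if not classes & {"R", "AL"}:
            ltr[pair] = value
    return ltr, rtl


# Hypothetical data: Latin, Arabic, neutral punctuation, one unencoded glyph.
cmap = {"A": 0x0041, "V": 0x0056, "alef-ar": 0x0627, "reh-ar": 0x0631,
        "period": 0x002E, "parenright": 0x0029}
kerning = {
    ("A", "V"): -40,
    ("reh-ar", "alef-ar.fina"): -20,
    ("period", "parenright"): 40,
}
ltr, rtl = split_kerning(kerning, cmap)
```

Note that the neutral punctuation pair lands in both tables, which matches the crude version of the heuristic: both lookups carry it, and the script system decides which one is applied.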
Coming back to this, as we're thinking a bit more about UFO4, and it's something that's been on my mind. I understand that @behdad is correct at a high level with this, but looking through the implementation of the feature writer, there is a lot to be argued for adding a direction tag (if not there, defaulting to ltr) to the kerning values. This would allow the designer to explicitly set pairs that they want written out in a direction, and avoid having to have complicated heuristics for unencoded glyphs just to write a kerning feature. It would also help kerning tools in displaying direction of a pair. |
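To make the idea concrete, here is a purely hypothetical sketch of how a reader could treat such a tag, combining the default described here with the `[value, direction]` list encoding suggested earlier in this thread. None of this is part of any UFO spec; `read_pair` and the sample data are invented for illustration.

```python
def read_pair(entry, default_direction="ltr"):
    """Normalize a kerning entry to a (value, direction) tuple.

    Accepts either a plain number (today's kerning.plist) or a
    hypothetical [value, direction] list.
    """
    if isinstance(entry, (int, float)):
        return entry, default_direction
    value, direction = entry
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    return value, direction


# Nested dict shaped like kerning.plist, with one hypothetical tagged value.
kerning = {
    "A": {"V": -40},
    "reh-ar": {"alef-ar": [-20, "rtl"]},
}
normalized = {
    (first, second): read_pair(entry)
    for first, seconds in kerning.items()
    for second, entry in seconds.items()
}
```

Existing files with plain numeric values would keep working under this reading, since they fall back to the default direction.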
@benkiel What's the direction of the following pair: ")/"? |
Also, there are plans to fix this in OpenType. So, I suggest we think this through more before changing the spec to accommodate current font format shortcomings. |
@behdad I know, it's a mess. I'm glad to hear that it's being fixed in OpenType, but we may still need to think about this, as support for a new version of OpenType in existing engines will be slow. If we add a tag, perhaps we define the following behavior for writing a kerning feature: This isn't as smart as your feature writing proposal, but it does have the advantage of capturing the designer's explicit intent. |
I agree.
That's not true, and that's probably the source of the confusion/misunderstandings about RTL kerning. The "Writing Direction" paragraph from the UFO3 kerning.plist spec says:
so far so good, it seems to agree with Behdad's comment above, i.e. the left is the left, the right is the right, no matter the script's writing direction. You have two shapes placed next to each other horizontally and you want to either reduce or increase the spacing between the two (by either changing the advance width of the left-hand-side glyph to move its right-sidebearing, or by changing both the advance width and the x-placement of the right-hand-side glyph to move its left-sidebearing).
Now, if kerning were neutral and visual, the key in the top-level dictionary would not change depending on the writing direction, i.e. it would always be the left glyph or group. The paragraph ends with an example which can be confusing at first, because the RTL case is not a real-world one (using for example two Arabic letters)
So... we need to revive this discussion and agree on a fix. We either add some pair direction database to mark each pair as LTR only, RTL only or both, or we change the semantics of kerning.plist to truly be visual left and right. |
cc @behdad |
Yes. |
The direction has to be determinable first from the text, based on the BiDi algorithm, and the punctuation rolled into glyph runs of appropriate directionality with adjacent glyphs, before we can even begin to address the question of how to make the lookup directionality clear. Do you think that's the case, Behdad: can we reliably determine what the direction of punctuation characters for OTL processing should be from BiDi (taking into account the complexities of embedding levels)? |
Yes. But also means that the kern pair for punctuations should be encoded in all script systems in the font. |
That's the problem. If kerning.plist contains such a pair between punctuation, e.g. |
Agreed: that's standard practice here. When making multi-script fonts, I've tended to create separate kern sources for each script, so our FL-to-VOLT workflow can easily generate separate sets of lookups for each script, including punctuation kerning, numeral kerning, etc. Of course, there are some punctuation characters that are never used with some scripts, so I also tend to remove kerning pairs for those characters from the sources for those scripts. I don't know a robust way to automate splitting of kerning lookups by script, which is why I favour separate sources that I manage myself. One wants a kerning tool to be able to write the OTL lookups from whatever native format the tool uses, and to do so efficiently in terms of both data and layout processing, and to my mind that means controlling the separation of GPOS lookups that receive the kerning for different scripts. Since those lookups also need to include common punctuation glyphs (for kerning to the script and to each other, possibly in multiple directions), keeping the sources separate has so far been the safest approach. |
Does that mean Arabic numbers kerning which is also written in the RTL lookup should have the |
Right. UFO2 was fine technically, by using "left" and "right" words. UFO3 departed from this in the name of being direction-agnostic, by calling the glyphs in the pair "first" and "second", in practice making it inherently ambiguous. |
no, that's the exception to the rule. They have "AN" bidirectional category which is weak LTR. |
@behdad I recall a conversation at the OTWG meeting in Mountain View in which we discussed the impact for kerning of Arab/Persian numerals in Harfbuzz's handling of Arabic layout vs Uniscribe's. My recollection is that if a font contains LTR kerning for numerals, presuming Uniscribe's treatment of numbers as LTR runs, this could be misinterpreted in Harfbuzz because of the way you apply RTL directionality across Arabic layout? Is that right? |
to clarify once again, Arabic number kerning should be written in the kern lookup for the RTL scripts, but using a LTR pos rule (only modifying advance of first glyph). |
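As a small illustration of the two rule shapes, the sketch below formats a pair in AFDKO feature-file syntax. The helper name `pos_rule` and the glyph names are invented for this example; it assumes the usual convention that an RTL rule adjusts both x-placement and x-advance of the first glyph, while an LTR rule adjusts only its advance.

```python
def pos_rule(first, second, value, rtl=False):
    """Format one pair-positioning rule in feature-file syntax.

    LTR: a single number, which changes only the advance of the first glyph.
    RTL: a full value record <xPlacement yPlacement xAdvance yAdvance>,
    shifting the first glyph and changing its advance by the same amount.
    """
    if rtl:
        return f"pos {first} {second} <{value} 0 {value} 0>;"
    return f"pos {first} {second} {value};"


# Arabic letters get the RTL form; Arabic numerals, per the comment above,
# go into the same lookup but use the LTR form.
letter_rule = pos_rule("reh-ar", "alef-ar", -20, rtl=True)
numeral_rule = pos_rule("zero-ar", "one-ar", -15)
```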
Yes, that was a mistake, and I think our challenge is to come up with a clean way to correct this. Adding a new |
Correct. HarfBuzz applies kerning in RTL to all script=Arabic text. Uniscribe on the other hand, uses a hack, by classifying Arabic numerals as a different ArabicNumeral script code, which has direction LTR.
That's what Uniscribe does. I wouldn't call it a rule though. It's a hack we have to live with. |
If glyphs are laid out left-to-right, it makes sense to me that kerning is applied left-to-right too. Applying kerning right-to-left for Arab/Persian numerals is really confusing, speaking as a font developer. |
(Why did you have to open that can of worms here...) Right. But speaking as a shaper, when I see script=Arab direction=LTR, I don't know if it's numbers running their natural LTR, or letters forced to LTR via an LRO character. Short of looking at the character codepoints I have no way to distinguish. |
@behdad I'm a bit puzzled. Could you please elaborate why direction agnostic kerning pairs in the new UFO format is ambiguous if at the end all the pairs should be written in a logical order? |
I'm not Behdad, but this is how I understand it: take a "neutral" kern pair, such as period/parenright. If the order in the kerning data is visual left-to-right, it is clear what is meant if you kern that pair, and you can compile it to LTR kerning and RTL kerning. If it's supposed to follow the writing direction (as in the UFO3 spec), it's impossible to determine what the intended direction of the pair was, as the glyphs in question have no inherent direction. So you simply can't tell whether "period parenright 40" was meant as visual "period parenright 40" or visual "parenright period 40". If the intention is not clear, how can you even compile it? This seems to be confirmed by this quote from Cosimo's explanation:
|
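The argument about visual order can be sketched in a few lines. Assuming the pair is stored as visual left/right, both logical orders follow mechanically; `compile_visual_pair` is a made-up name, and this ignores everything else a real feature writer has to handle (groups, mirrored glyphs, lookup splitting).

```python
def compile_visual_pair(left, right, value):
    """Derive logical (first, second, value) tuples from a visual pair.

    In LTR text the left glyph comes first in logical order; in RTL text
    the right glyph comes first, so the pair is simply swapped.
    """
    return {
        "ltr": (left, right, value),
        "rtl": (right, left, value),
    }


rules = compile_visual_pair("period", "parenright", 40)
```

Going the other way is what fails: given only a logical "period parenright 40" and two direction-neutral glyphs, there is nothing to tell which of the two visual arrangements was kerned.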
correct. the same thing I wrote earlier in this thread: |
so hypothetically if the value in the pair was written in four int array but still in a logical order it would be fine? |
You need to separate what is (or can be/should be) stored in the UFO and what has to be compiled to GPOS. Those are two different things. The fact that RTL kerning is "difficult" in GPOS should have no influence on the UFO spec. |
I'm only trying to figure out how a kerning tool should store the data and how should it compile it in UFO. I'm very sorry if I get things slow, but since I've only understood the direction agnostic way of storing data, the idea of storing pairs in visual order is throwing me out of balance. Maybe examples will make it easy to implement: UFO3 (kerning pair direction agnostic):
UFO2 (kerning pair from left to right):
Maybe the above example is wrongly written in the GPOS record. But if I'm correct, how is UFO2 a better approach? |
Let's take this off line. Please email me. |
UFO3 kerning is not writing direction agnostic, it uses logical (first/second) order like in OpenType, thus depends on the script writing direction what is deemed "first" or "second", hence not writing direction agnostic. |
if you believe that storing the left-most glyph in a pair on the left-hand side, and the right-most glyph on the right-hand side is ethnocentrism... well, I'm ok to swap the order if that makes it more comfortable for you, I don't really care :) |
I'm going to close this as we have a discussion about how to fix this in #96. |
Without a hint to the kern feature generating code it is not possible to determine whether a kerning pair should be treated as LTR or RTL kerning. The kerning pair should bear that information. External heuristics are probably bound to fail, but maybe a solution that works in most cases is possible.
This was discovered in this discussion about Jomhuria, an Arabic script font that I work on:
https://groups.google.com/d/msg/googlefonts-discuss/kLnaUCegGjA/rjri18RMOvwJ
I quote here the relevant part:
My workaround at the moment is a subclass of ufo2fdk.kernFeatureWriter that writes only RTL kerning for all data of kerning.plist: https://github.com/Tarobish/Jomhuria/blob/master/tools/getKernFeatureFromUFO.py