kerning.plist is not sufficient to do mixed RTL and LTR kerning #16

Closed
graphicore opened this issue Jul 7, 2015 · 75 comments

Comments

@graphicore

Without a hint to the kern feature generating code it is not possible to determine whether a kerning pair should be treated as LTR or RTL kerning. The kerning pair should bear that information. External heuristics are probably bound to fail, but maybe a solution that works in most cases is possible.

This was discovered in this discussion about Jomhuria, an Arabic script font that I work on:

https://groups.google.com/d/msg/googlefonts-discuss/kLnaUCegGjA/rjri18RMOvwJ

I quote here the relevant part:

On 07/01/2015 01:01 PM, Khaled Hosny wrote:

On Wed, Jul 01, 2015 at 05:35:50AM +0200, Lasse Fister wrote:

On 07/01/2015 02:06 AM, Khaled Hosny wrote:

On Tue, Jun 30, 2015 at 11:52:47PM +0200, Lasse Fister wrote:

On 06/30/2015 11:19 PM, Khaled Hosny wrote:
Also, where can I find the fea file to give it a look?

https://github.com/Tarobish/Jomhuria/blob/master/sources/kern-arabic.fea
That was created using ufo2fdk from a ufo created with GlyphsApp.

Hmm, I think the kerning is wrong here, I see things like:

pos @kern1.MMK_L_BehInti uni0676.fina -30;

But this will just subtract 30 units from the advance width of the rightmost
glyph, which is not enough as you need to move it 30 units to the left
as well, so it needs to be:

pos @kern1.MMK_L_BehInti uni0676.fina <-30 0 -30 0>;

Ok, so probably the ufo2fdk script should generate this for right to
left kerning, but it can't know from the way ufo stores its kerning
data. This is probably a problem with UFO as a format(?). UFO has only
very simple support for kerning:

http://unifiedfontobject.org/versions/ufo2/kerning.html

and also:

http://unifiedfontobject.org/versions/ufo3/kerning.html

The latter has a passage about writing direction:

The kerning data is writing direction neutral. For text written
left-to-right, the left-most glyph is the key in the top level
dictionary. For text written right-to-left, the right-most glyph is
the key in the top level dictionary. For example, given the pair /LG/,
written left-to-right, the /L/ is the key in the top dictionary and
the /G/ is the sub-dictionary. Given the pair /GL/, written
right-to-left, the /G/ is the key in the top dictionary and the /L/ is
the key in the sub-dictionary.

But given that you say that we need to know the writing direction to
make such a "simple" kern pair for RTL script, just stating that "data
is writing direction neutral" is not enough. Or am I missing something here?

I think so, there needs to be a way to signal what direction the kerning
will be used for (and it needs to be explicit, any heuristic is bound
to give wrong results for some cases).

My workaround at the moment is a subclass of ufo2fdk.kernFeatureWriter that writes only RTL kerning for all data of kerning.plist:

https://github.com/Tarobish/Jomhuria/blob/master/tools/getKernFeatureFromUFO.py

@behdad

behdad commented Jul 8, 2015

I don't agree that this needs to be in the API. See: opentypejs/opentype.js#95 (comment) cc @khaledhosny

@graphicore
Author

@behdad from your linked comment

Now, what this means for GPOS kerning is straightforward: if you are making changes to the left-side glyph, you update advance only. If you are changing the right-side glyph, you update both.

The point with this issue is that you can't know if you are changing the left side or the right side by looking at a UFO kerning.plist entry.

I agree that once you know which side you want to kern, there's no need for further arguments than one value. That's why it was easy to make a KernFeatureWriter for exclusively RTL kerning from ufo2fdk.

@benkiel
Contributor

benkiel commented Jul 8, 2015

I agree that pairs should have writing direction. However, I don't see a very clean way of doing so, without really breaking the current formatting of the kerning plist. My suggestion would be that the kerning value becomes a list, with the first member of the list the value, and the second the direction.

Perhaps this is a UFO4 item for consideration, however.

@benkiel
Contributor

benkiel commented Jul 8, 2015

@graphicore one thing I didn't think of this morning: assuming that your glyphs are all Unicode encoded, it's very possible to pull direction from Python's unicodedata.bidirectional function. Any kern pair with at least one member having an RTL property can be filtered for your kerning feature.

Having realized that, I'm less sure that anything needs to be added to the UFO spec for writing direction, and if there were to be anything added, it would certainly be better in UFO4, not as part of UFO3.
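
A minimal sketch of the lookup described above, assuming every glyph in the pair carries a Unicode value. The helper name `pair_has_rtl_member` and the sample code points are illustrative only and not part of any UFO tool; `unicodedata.bidirectional` is the standard-library function that returns a character's Bidi class.

```python
import unicodedata

# Bidi classes treated as right-to-left: R (e.g. Hebrew) and AL (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def pair_has_rtl_member(first_codepoint, second_codepoint):
    """Return True if at least one character of the pair has an RTL Bidi class."""
    return any(
        unicodedata.bidirectional(chr(cp)) in RTL_CLASSES
        for cp in (first_codepoint, second_codepoint)
    )


# U+0628 ARABIC LETTER BEH paired with U+002E FULL STOP
print(pair_has_rtl_member(0x0628, 0x002E))
# U+0041 LATIN CAPITAL LETTER A paired with U+0056 LATIN CAPITAL LETTER V
print(pair_has_rtl_member(0x0041, 0x0056))
```

Unencoded glyphs have no code point to look up, so a check like this only covers encoded pairs.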

@graphicore
Author

First of all, I did not expect the UFO v3 spec to be open to changes in kerning.plist. I am just reporting an issue that I have with it. I agree that UFO v4 could be a target, and I'm open to any solution.

I think, without fixing kerning.plist we have to use unicode and/or glyph names and class names.

Unicode can definitely be helpful here, but the unicode value of a glyph alone is not sufficient, because in Arabic you have at least four glyphs for one unicode-character. Three of these can be encoded (in Arabic Presentation Forms-A) but, afaik, for a modern environment you rather wouldn't do so because they are accessed via GSUB. The same is also true in Latin for stylistic sets and so on.

A naming scheme could help: one that lets you determine the Unicode value also for glyphs that are not encoded (like Adobe's AGLFN), plus maybe an optional "RTL" or "LTR" at the end of the glyph name, that would be consulted first, if present.

But, to be honest, that's not ideal. It's very indirect and it overloads the glyph names with a further use: writing features, pdf text extraction, referencing components and kerning directions. Not to forget finding glyphs in your editor and finding glyphs on the command-line when using git.

@graphicore
Author

But, to be honest, that's not ideal. …

Also, there are still cases then where you don't have full control. That may be corner cases—different writing directions are not really used within the same word, where kerning matters—but it's not a model that solves the full problem; so there might be bigger issues that we don't see yet.

@typesupply
Contributor

Yeah, I see the problem and it needs to be solved. To date, there have been RTL fonts produced with UFO, but the GPOS generation from the data in kerning.plist has had some additional smartness to organize the pairs by writing direction. ufo2fdk doesn't deal with this at all because the pair organization is very font specific.

One thing that I'd like to do is preserve the file structure of kerning.plist. The data structure in there is replicated all over the place and changing it could be problematic in the real world. I've thought of a couple of other ways to approach solving the problem:

  1. Define some sort of pair organization algorithm or RTL glyph marking method. This will probably run into some nasty edge cases.
  2. Add a new pair direction database somewhere in the UFO. For backwards compatibility, any pair defined in kerning.plist would be LTR unless defined as otherwise in the pair direction database. The potential problem with this is that a pair could only exist in one writing direction. I tried to think this through pretty thoroughly when I wrote the UFO 3 spec (I based it largely on the GPOS model) but I'm not a RTL expert by any means.
  3. Add a new rtlkerning.plist file. This would keep them separate, wouldn't break anything that works now, but it would be a new file. I considered adding a vkerning.plist file in UFO 3 for a similar purpose, but I didn't get any feedback on the proposal so I scrapped it.

So, that's where the thinking on this is on my side of things.

@davelab6

davelab6 commented Jul 9, 2015 via email

@graphicore
Author

Thanks Dave, I think that's what I'll do :-) But at the moment we have the Latin UFO that will have the Latin kerning, and a UFO that Kourosh produces with glyphs (from the TTF) where he kerns the Arabic. There are some cases like the parentheses from the Latin in Arabic where I am eager to see how it works out :-)

@LettError
Contributor

Are there use cases in which a single pair <firstglyph><secondglyph> would need different values for different directions?

@graphicore
Author

Not that I know, for a simple kerning pair. We need to know whether to produce:
LTR:

pos firstglyph secondglyph {value};

or RTL:

pos firstglyph secondglyph <{value} 0 {value} 0>;

But it's just one value.
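
The two rule shapes above can be sketched as a small formatter. This is only an illustration under the assumption that a feature writer already knows which direction a pair belongs to; the function name `kern_rule` and its `rtl` flag are hypothetical and do not come from ufo2fdk.

```python
def kern_rule(first, second, value, rtl=False):
    """Format one FEA pair-positioning rule from a single kerning value."""
    if rtl:
        # RTL: change xPlacement and xAdvance of the first glyph by the same amount.
        return f"pos {first} {second} <{value} 0 {value} 0>;"
    # LTR: only the advance of the first glyph changes.
    return f"pos {first} {second} {value};"


print(kern_rule("firstglyph", "secondglyph", -30))
print(kern_rule("firstglyph", "secondglyph", -30, rtl=True))
```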

@typesupply I think that of your three proposals, the third is the easiest to implement and the most robust. The second one is worse than restructuring the kerning data in UFO, especially because information that belongs together is stored in different places, so we would spend our time syncing data; it could make a good workaround, however. The first one is also a possible workaround for the current situation.

@typesupply
Contributor

The possibility of having different values when LTR and RTL seems illogical, but we know that crazy stuff exists (example: that very popular font that contains 2 glyphs with the same name; another one: those fonts that have two glyphs mapped to the same Unicode value). This could be tested by anyone with access to fonts containing both LTR and RTL kerning with a pretty simple fontTools script. It wouldn't be definitive, but it would be useful. I'll keep thinking about it.

I agree that redundant data is bad, but the nice thing about option 2 is that the data would be easily transferable between fonts. The data wouldn't have to be a 1:1 match to the contents of kerning.plist, it would be a general guide for separating the pairs. Still, it's a layer of complexity. UFO 4 is supposed to be only a structural change, so I'm hesitant to introduce an entirely new file. But, it may be necessary.

I did have an idea on the existing implementation issue, specifically the RoboFab API. (kerning.plist grew out of the RoboFab API, which is a flat dict of pairs.) RoboFab's __getitem__ and get methods could gain a new, optional, direction argument that defaults to "ltr". Existing scripts would continue to work without any change.

@behdad

behdad commented Jul 9, 2015

Let me express my opinion again: the proposal here is wrong.

Kerning is an inherently visual procedure and is independent of the writing direction of the text. The key point here is that kerning.plist specifically and rightly encodes "left glyph" and "right glyph". So, for example, if someone is kerning the following sequence of two characters:

".

it absolutely doesn't matter if it's a double-quote followed by period, in a left-to-right script, or if it's period followed by double-quote in a right-to-left script.

Now. The reason Lasse and possibly others are confused is how the GPOS table works. Unfortunately individual lookups in OpenType do not specify what direction they expect their input in. That's something we (OpenType stakeholders) are fully aware of and planning to fix, but in the mean time, the reality is that lookups in the GPOS table are processed in the logical text order, whatever that means.

For the most common cases, that's not a problem and "logical text order" is unambiguous. For example, Latin is left-to-right, and Arabic is right-to-left. That logic is encoded here: https://github.com/behdad/harfbuzz/blob/master/src/hb-common.cc#L446

The implication for this bug then is this: when generating a kern lookup, if it is to be referenced from a right-to-left script system ('arab' for example), it has to be generated differently from a lookup that is to be referenced from a left-to-right script system ('latn' for example). You cannot use the same lookup.

Back to the ". example, since both of those punctuation marks can happen in Arabic as well as Latin, you need to encode them in kern lookups referenced from both 'arab' and 'latn', and the encoding is different. But as Erik very astutely asked, the value itself is not necessarily different and hence does not need to be reflected in the UFO file.

I'll go one step further with that: even if the desired kerning of the ". sequence was different between Arabic and Latin, that's more of a script-specific or language-specific distinction, not direction-specific. The same way that one might kern ". differently in French vs English, or in Cyrillic vs Latin. Again, nothing direction-specific.

Now, there's one case that is hugely ambiguous in OpenType: the case of kerning digits in Arabic and other right-to-left scripts, because digits in some of those scripts are written left-to-right. But I don't want to get into that discussion here (and how inconsistent implementations are).

I hope I've made my argument clear. Would be happy to clarify.

@graphicore
Author

@behdad thanks for your input. If I read you right, you still don't tell us how the current way the data is organized in UFO makes it possible for us to determine the right output for our GPOS or AFDKO writing code.

The implication for this bug then is this: when generating a kern lookup, if it is to be referenced from a right-to-left script system ('arab' for example), it has to be generated differently from a lookup that is to be referenced from a left-to-right script system ('latn' for example). You cannot use the same lookup.

Separating via script-language instead of via script-direction sounds totally sane to me; however, we still need that information to be present somewhere in UFO.

Following the proposals we heard, the best bet is probably different kerning.plist files. But separated by language: kerning-arab.plist, kerning-latn.plist etc. That would serve me well. Also we can then set the language tag for the features and until OpenType is fixed produce RTL-specific positioning lookups for Arabic script.

Now, there's one case that is hugely ambiguous in OpenType: the case of kerning digits in Arabic and other right-to-left scripts, because digits in some of those scripts are written left-to-right. But I don't want to get into that discussion here (and how inconsistent implementations are).

Thanks for pointing this out. But we need a solution for this problem here, we can't just not discuss it. You say we should use script language to gather the information needed to write the correct lookups, fine. But then you say this is broken when it comes to digits.
Should we use no language tag, i.e. kerning.plist, and there do the kerning of Arabic numbers in a LTR fashion? Or is DFLT a better choice? Or do you imply we should implement the wisdom about those special conditions into the Arab feature writer? We could also just not kern these if it is too difficult to do in a consistent manner anyway, i.e. FUBAR?

@behdad

behdad commented Jul 9, 2015

@behdad thanks for your input. If I read you right, you still don't tell us how the current way the data is organized in UFO makes it possible for us to determine the right output for our GPOS or AFDKO writing code.

There's a lot of external data (from Unicode) that is needed to convert UFO to fonts, and this isn't any exception. Here's the simplest heuristic that works:

  • Generate both an LTR kern lookup and an RTL one; including all kerning pairs in both.
  • Reference the RTL lookup from script systems that are RTL, and the LTR one for others.

That's the crudest heuristic, but it works. From there on, you can add more complexity to achieve smaller kern tables. But none of these really belongs into the UFO per se. For example, a huge improvement:

  • Associate each glyph to a Unicode character,
  • Exclude from RTL kern table all glyphs associated with Unicode characters that have Bidi_Type=L,
  • Exclude from LTR kern table all glyphs associated with Unicode characters that have Bidi_Type=R or Bidi_Type=AL.
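
The refined heuristic above can be sketched roughly as follows. This assumes a flat kerning dict keyed by glyph-name pairs and a glyph-to-code-point mapping; both structures, the function name `split_pairs_by_direction`, and the sample data are illustrative assumptions. Glyphs without a code point are simply left in both tables, and a pair is dropped from a table as soon as one of its glyphs is excluded from that table.

```python
import unicodedata


def split_pairs_by_direction(kerning, glyph_to_codepoint):
    """Return (ltr_pairs, rtl_pairs), filtered by the Bidi class of each glyph."""
    ltr_pairs, rtl_pairs = {}, {}
    for (first, second), value in kerning.items():
        classes = set()
        for name in (first, second):
            codepoint = glyph_to_codepoint.get(name)
            if codepoint is not None:
                classes.add(unicodedata.bidirectional(chr(codepoint)))
        # Strong RTL glyphs (R, AL) are excluded from the LTR table.
        if not classes & {"R", "AL"}:
            ltr_pairs[(first, second)] = value
        # Strong LTR glyphs (L) are excluded from the RTL table.
        if "L" not in classes:
            rtl_pairs[(first, second)] = value
    return ltr_pairs, rtl_pairs


kerning = {
    ("A", "V"): -40,
    ("uni0628", "period"): -30,
    ("period", "quotedbl"): -20,
}
codepoints = {"A": 0x41, "V": 0x56, "uni0628": 0x0628, "period": 0x2E, "quotedbl": 0x22}
ltr_pairs, rtl_pairs = split_pairs_by_direction(kerning, codepoints)
print(sorted(ltr_pairs))
print(sorted(rtl_pairs))
```

In this sample the neutral pair (period, quotedbl) lands in both tables, which is the case the rest of the thread keeps returning to.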

Now, there's one case that is hugely ambiguous in OpenType: the case of kerning digits in Arabic and other right-to-left scripts, because digits in some of those scripts are written left-to-right. But I don't want to get into that discussion here (and how inconsistent implementations are).
Thanks for pointing this out. But we need a solution for this problem here, we can't just not discuss it. You say we should use script language to gather the information needed to write the correct lookups, fine. But then you say this is broken when it comes to digits.
Should we use no language tag, i.e. kerning.plist, and there do the kerning of Arabic numbers in a LTR fashion? Or is DFLT a better choice? Or do you imply we should implement the wisdom about those special conditions into the Arab feature writer? We could also just not kern these if it is too difficult to do in a consistent manner anyway, i.e. FUBAR?

In short, HarfBuzz and Uniscribe handle those differently. Easiest is to ignore them for now. Or if you wish, remove them from the kern pairs. I can't recommend one way or another as the "right" way at this time.

@graphicore
Author

Great, thanks! I will try.

@twardoch

I might add that there is also vertical kerning, in both directions. I agree that the most common understanding of "kerning" should be "one-dimensional positive or negative single-value adjustment in a given direction between two entities, each of them being a glyph or a group". Contextual or 2D adjustments should be done in FEA directly, but for "kerning pairs", it would be nice to have an ability to declare the directionality, which can be:

  • horizontal in any direction
  • horizontal LTR
  • horizontal RTL
  • vertical in any direction
  • vertical TTB
  • vertical BTT

@davelab6

davelab6 commented Sep 2, 2015

that very popular font that contains 2 glyphs with the same name

which font is that?

graphicore added a commit to Tarobish/Jomhuria that referenced this issue Sep 11, 2015
This implements what was discussed here unified-font-object/ufo-spec#16 (comment)

The results have been used but are not yet approved by anyone :-)
This is the output file: https://github.com/Tarobish/Jomhuria/blob/master/sources/kerning.fea

@behdad wrote:

> * Generate both an LTR kern lookup and an RTL one; including all kerning pairs in both.
> * Reference the RTL lookup from script systems that are RTL, and the LTR one for others.
>
> That's the crudest heuristic, but it works. From there on, you can add more complexity to achieve smaller kern tables. But none of these really belongs into the UFO per se. For example, a huge improvement:
>
> * Associate each glyph to a Unicode character,
> * Exclude from RTL kern table all glyphs associated with Unicode characters that have Bidi_Type=L,
> * Exclude from LTR kern table all glyphs associated with Unicode characters that have Bidi_Type=R or Bidi_Type=AL.

@benkiel
Contributor

benkiel commented May 5, 2016

Coming back to this, as we're thinking a bit more about UFO4, and it's something that's been on my mind.

I understand that @behdad is correct at a high level with this, but looking through the implementation of the feature writer, there is a lot to be argued for adding a direction tag (if not there, defaulting to ltr) to the kerning values. This would allow the designer to explicitly set pairs that they want written out in a direction, and avoid having to have complicated heuristics for unencoded glyphs just to write a kerning feature. It would also help kerning tools in displaying direction of a pair.

@behdad

behdad commented May 5, 2016

@benkiel What's the direction of the following pair: ")/"?

@behdad

behdad commented May 5, 2016

Also, there are plans to fix this in OpenType. So, I suggest we think this through more before changing the spec to accommodate current font format shortcomings.

@benkiel
Contributor

benkiel commented May 5, 2016

@behdad I know, it's a mess. I'm glad to hear that it's being fixed in OpenType, but we may still need to think about this, as support for a new version of OpenType in existing engines will be slow.

If we add a tag, perhaps we define the following behavior for writing a kerning feature:
If the kerning has rtl, anything not tagged is written with both right to left and left to right lookups. Anything that is tagged ltr or rtl takes precedence, and is only written in the appropriate lookup.

This isn't as smart as your feature writing proposal, but it does have the advantage of capturing the designer's explicit intent.
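
A rough sketch of that tagging rule, under two stated assumptions: "the kerning has rtl" is read as "at least one pair is tagged rtl", and each kerning value is stored as a hypothetical (value, tag) tuple where the tag is "ltr", "rtl", or None. Neither the data layout nor the function name `assign_lookups` exists in the UFO spec.

```python
def assign_lookups(tagged_kerning):
    """Split {(first, second): (value, tag)} into LTR and RTL lookup contents."""
    has_rtl = any(tag == "rtl" for _, tag in tagged_kerning.values())
    ltr_lookup, rtl_lookup = {}, {}
    for pair, (value, tag) in tagged_kerning.items():
        # Untagged pairs default to LTR; explicit "ltr" pairs go only here.
        if tag in ("ltr", None):
            ltr_lookup[pair] = value
        # Explicit "rtl" pairs go only here; untagged pairs are duplicated
        # into the RTL lookup when the font has any RTL kerning at all.
        if tag == "rtl" or (tag is None and has_rtl):
            rtl_lookup[pair] = value
    return ltr_lookup, rtl_lookup


tagged = {
    ("A", "V"): (-40, "ltr"),
    ("uni0628", "uni0631"): (-30, "rtl"),
    ("period", "quotedbl"): (-20, None),
}
ltr_lookup, rtl_lookup = assign_lookups(tagged)
print(sorted(ltr_lookup))
print(sorted(rtl_lookup))
```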

@anthrotype
Member

anthrotype commented Jan 30, 2018

Kerning is an inherently visual procedure and is independent of the writing direction of the text.

I agree.

The key point here is that kerning.plist specifically and rightly encodes "left glyph" and "right glyph".

That's not true, and that's probably the source of the confusion/misunderstandings about RTL kerning.

The "Writing Direction" paragraph from the UFO3 kerning.plist spec says:

The kerning data is writing direction neutral.

so far so good, it seems to agree with Behdad's comment above, i.e. the left is the left, the right is the right, no matter the script's writing direction. You have two shapes placed next to each other horizontally and you want to either reduce or increase the spacing between the two (by either changing the advance width of the left-hand-side glyph to move its right-sidebearing, or by changing both the advance width and the x-placement of the right-hand-side glyph to move its left-sidebearing).
However, the spec then continues on and (in my view) completely contradicts the previous point about writing direction neutrality:

For text written left-to-right, the left-most glyph is the key in the top level dictionary. For text written right-to-left, the right-most glyph is the key in the top level dictionary.

Now, if kerning were neutral and visual, the key in the top-level dictionary would not change depending on the writing direction, i.e. it would always be the left glyph or group.
Whereas according to the cited passage from the UFO3 spec, the key glyph/group in the kerning dictionary depends on the direction of the text (although this direction is not specified anywhere and that is the problem Lasse brought up). The terminology used for the kerning groups (first and second, instead of left and right as in the old MMK_ classes) also reinforces this point. Basically, it seems that the UFO kerning.plist follows the same "logical" text order in which OpenType lookups are processed.

The paragraph ends with an example which can be confusing at first, because the RTL case is not a real-world one (using for example two Arabic letters)

For example, given the pair LG, written left-to-right, the L is the key in the top dictionary and the G is the sub-dictionary. Given the pair GL, written right-to-left, the G is the key in the top dictionary and the L is the key in the sub-dictionary.

So... we need to revive this discussion and agree on a fix. We either add some pair direction database to mark each pair as LTR only, RTL only or both, or we change the semantics of kerning.plist to truly be visual left and right.

@anthrotype
Member

cc @behdad

@behdad

behdad commented May 3, 2019

@benkiel @behdad I would like to see an example.

Example of what? A kern pair that doesn't have a "clear" direction? period/parenright

@schriftgestalt

Yes.

@tiroj

tiroj commented May 3, 2019

A kern pair that doesn't have a "clear" direction? period/parenright

The direction has to be determinable first from the text, based on the BiDi algorithm, and the punctuation rolled into glyph runs of appropriate directionality with adjacent glyphs, before we can even begin to address the question of how to make the lookup directionality clear. Do you think that's the case, Behdad: can we reliably determine what the direction of punctuation characters for OTL processing should be from BiDi (taking into account the complexities of embedding levels)?

@behdad

behdad commented May 3, 2019

A kern pair that doesn't have a "clear" direction? period/parenright

The direction has to be determinable first from the text, based on the BiDi algorithm, and the punctuation rolled into glyph runs of appropriate directionality with adjacent glyphs, before we can even begin to address the question of how to make the lookup directionality clear. Do you think that's the case, Behdad: can we reliably determine what the direction of punctuation characters for OTL processing should be from BiDi (taking into account the complexities of embedding levels)?

Yes. But also means that the kern pair for punctuations should be encoded in all script systems in the font.

@anthrotype
Member

anthrotype commented May 3, 2019

That's the problem. If kerning.plist contains such a pair between punctuation, e.g. (period, parenleft): -30, if one would like to encode this pair in both the kern lookup for LTR scripts and in the one for the RTL scripts, one needs to know (or make assumptions as to) what the original intention of the font designer was when kerning these two direction-neutral glyphs. If the UFO contained a new hkerning.plist (and vkerning.plist) which stored the kerning pairs not as first/second logical order (which is script-writing-direction dependent), but as left and right glyph (or top and bottom, no matter what the text direction), then the compiler could write the same kerning pairs in both LTR and RTL.
E.g. let's pretend (period, parenleft): -30 in the example is meant to be kerned with period glyph on the left and parenleft on its right. Then for the LTR lookup, the generated rule would be pos period parenleft -30 (i.e. reduce period's advance width by 30 units); whereas in the RTL lookup it should be pos parenleft period <-30 0 -30 0> (move the left sidebearing of parenleft by reducing both its xAdvance and xPlacement by 30 units).
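
The example above can be sketched as a tiny compiler step, assuming the pair is stored in visual order (left glyph, right glyph) as the proposed hkerning.plist would do. The function name `rules_for_visual_pair` is hypothetical.

```python
def rules_for_visual_pair(left, right, value):
    """Compile one visual (left, right) kerning pair into LTR and RTL FEA rules."""
    # LTR: logical order equals visual order, so only the left glyph's
    # advance is adjusted.
    ltr_rule = f"pos {left} {right} {value};"
    # RTL: logical order is reversed, so the right glyph comes first in the
    # rule and gets both its xPlacement and xAdvance adjusted.
    rtl_rule = f"pos {right} {left} <{value} 0 {value} 0>;"
    return ltr_rule, rtl_rule


ltr_rule, rtl_rule = rules_for_visual_pair("period", "parenleft", -30)
print(ltr_rule)
print(rtl_rule)
```

With visual storage, the same source pair yields both rules without guessing the designer's intent.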

@tiroj

tiroj commented May 3, 2019

Yes. But also means that the kern pair for punctuations should be encoded in all script systems in the font.

Agreed: that's standard practice here. When making multi-script fonts, I've tended to create separate kern sources for each script, so our FL-to-VOLT workflow can easily generate separate sets of lookups for each script, including punctuation kerning, numeral kerning, etc. Of course, there are some punctuation characters that are never used with some scripts, so I also tend to remove kerning pairs for those characters from the sources for those scripts.

I don't know a robust way to automate splitting of kerning lookups by script, which is why I favour separate sources that I manage myself. One wants a kerning tool to be able to write the OTL lookups from whatever native format the tool uses, and to do so efficiently in terms of both data and layout processing, and to my mind that means controlling the separation of GPOS lookups that receive the kerning for different scripts. Since those lookups also need to include common punctuation glyphs — for kerning to the script and to each other, possibly in multiple directions —, keeping the sources separate has so far been the safest approach.

@typoman

typoman commented May 3, 2019

Then for the LTR lookup, the generated rule would be pos period parenleft -30 (i.e. reduce period's advance width by 30 units); whereas in the RTL lookup it should be pos parenleft period <-30 0 -30 0> (move the left sidebearing of parenleft by reducing both its xAdvance and xPlacement by 30 units).

Does that mean Arabic numbers kerning which is also written in the RTL lookup should have the <-30 0 -30 0> format?

@behdad

behdad commented May 3, 2019

That's the problem. If kerning.plist contains such a pair between punctuation, e.g. (period, parenleft): -30, if one would like to encode this pair in both the kern lookup for LTR scripts and in the one for the RTL scripts, one needs to know (or make assumptions as to) what the original intention of the font designer was when kerning these two direction-neutral glyphs.

Right.

UFO2 was fine technically, by using "left" and "right" words. UFO3 departed from this in the name of being direction-agnostic, by calling the glyphs in the pair "first" and "second", in practice making it inherently ambiguous.

@anthrotype
Member

anthrotype commented May 3, 2019

Does that mean Arabic numbers kerning which is also written in the RTL lookup should have the <-30 0 -30 0> format?

no, that's the exception to the rule. They have "AN" bidirectional category which is weak LTR.

@tiroj

tiroj commented May 3, 2019

@behdad I recall a conversation at the OTWG meeting in Mountain View in which we discussed the impact for kerning of Arab/Persian numerals in Harfbuzz's handling of Arabic layout vs Uniscribe's. My recollection is that if a font contains LTR kerning for numerals, presuming Uniscribe's treatment of numbers as LTR runs, this could be misinterpreted in Harfbuzz because of the way you apply RTL directionality across Arabic layout? Is that right?

@anthrotype
Member

to clarify once again, Arabic number kerning should be written in the kern lookup for the RTL scripts, but using a LTR pos rule (only modifying advance of first glyph).
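
A hedged sketch of that exception, assuming both glyphs of the pair are encoded and that digits can be recognised by their Bidi class (AN for Arabic-Indic digits, EN for European and Extended Arabic-Indic digits). The function name `rtl_lookup_rule` and the code-point mapping are illustrative, and shapers differ in how they treat such pairs, so this shows one convention rather than a guaranteed result.

```python
import unicodedata

# Bidi classes of digits: AN (Arabic Number) and EN (European Number).
NUMBER_CLASSES = {"AN", "EN"}


def rtl_lookup_rule(first, second, value, codepoints):
    """Format a rule for the RTL kern lookup, with the digit exception."""
    both_numbers = all(
        unicodedata.bidirectional(chr(codepoints[name])) in NUMBER_CLASSES
        for name in (first, second)
    )
    if both_numbers:
        # Digits run left-to-right: adjust only the advance of the first glyph.
        return f"pos {first} {second} {value};"
    # Everything else in this lookup gets the RTL value record.
    return f"pos {first} {second} <{value} 0 {value} 0>;"


codepoints = {"uni0661": 0x0661, "uni0662": 0x0662, "uni0628": 0x0628, "uni0631": 0x0631}
print(rtl_lookup_rule("uni0661", "uni0662", -20, codepoints))
print(rtl_lookup_rule("uni0628", "uni0631", -30, codepoints))
```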

@justvanrossum
Contributor

UFO2 was fine technically, by using "left" and "right" words. UFO3 departed from this in the name of being direction-agnostic, by calling the glyphs in the pair "first" and "second", in practice making it inherently ambiguous.

Yes, that was a mistake, and I think our challenge is to come up with a clean way to correct this. Adding a new hkerning.plist file would be a good solution. We're exploring some alternatives, though.

@behdad

behdad commented May 3, 2019

@behdad I recall a conversation at the OTWG meeting in Mountain View in which we discussed the impact for kerning of Arab/Persian numerals in Harfbuzz's handling of Arabic layout vs Uniscribe's. My recollection is that if a font contains LTR kerning for numerals, presuming Uniscribe's treatment of numbers as LTR runs, this could be misinterpreted in Harfbuzz because of the way you apply RTL directionality across Arabic layout? Is that right?

Correct. HarfBuzz applies kerning in RTL to all script=Arabic text. Uniscribe on the other hand, uses a hack, by classifying Arabic numerals as a different ArabicNumeral script code, which has direction LTR.

to clarify once again, Arabic number kerning should be written in the kern lookup for the RTL scripts, but using a LTR pos rule (only modifying advance of first glyph).

That's what Uniscribe does. I wouldn't call it a rule though. It's a hack we have to live with.

@tiroj

tiroj commented May 3, 2019

If glyphs are laid out left-to-right, it makes sense to me that kerning is applied left-to-right too. Applying kerning right-to-left for Arab/Persian numerals is really confusing, speaking as a font developer.

@behdad

behdad commented May 3, 2019

If glyphs are laid out left-to-right, it makes sense to me that kerning is applied left-to-right too. Applying kerning right-to-left for Arab/Persian numerals is really confusing, speaking as a font developer.

(Why did you have to open that can of worms here...)

Right. But speaking as a shaper, when I see script=Arab direction=LTR, I don't know if it's numbers running their natural LTR, or letters forced to LTR via an LRO character. Short of looking at the character codepoints I have no way to distinguish.

@typoman

typoman commented May 16, 2019

@behdad I'm a bit puzzled. Could you please elaborate why direction agnostic kerning pairs in the new UFO format is ambiguous if at the end all the pairs should be written in a logical order?

@justvanrossum
Contributor

I'm a bit puzzled. Could you please elaborate why direction agnostic kerning pairs in the new UFO format is ambiguous if at the end all the pairs should be written in a logical order?

I'm not Behdad, but this is how I understand it: take a "neutral" kern pair, such as period/parenright. If the order in the kerning data is visual left-to-right, it is clear what is meant if you kern that pair, and you can compile it to LTR kerning and RTL kerning.

If it's supposed to follow the writing direction (as in the UFO3 spec), it's impossible to determine what the intended direction of the pair was, as the glyphs in question have no inherent direction. So you simply can't tell whether "period parenright 40" was meant as visual "period parenright 40" or visual "parenright period 40". If the intention is not clear, how can you even compile it?

This seems to be confirmed by this quote from Cosimo's explanation:

There is no logical 'first' and 'second' when glyphs have no script (i.e. may occur in either LTR or RTL script) and have neutral bidi type. So we assume that by 'first' the designer means 'left' (it sucks...). This could be fixed by having that new hkerning.plist where glyphs are kerned in visual order, left and right, not logical first and second.
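A minimal sketch of the point above, assuming the value-record shapes used elsewhere in this thread (a single advance adjustment for LTR, and an x-placement plus x-advance adjustment on the logically first glyph for RTL). The helper names are hypothetical and not part of any UFO or fontTools API.

```python
def compile_visual_pair(left, right, value):
    """Compile one kerning pair stored in visual (left, right) order.

    Returns (ltr_rule, rtl_rule) as feature-file style strings.
    Hypothetical helper for illustration only.
    """
    # LTR: logical order equals visual order; adjust the first glyph's advance.
    ltr_rule = f"pos {left} {right} {value};"
    # RTL: the logically first glyph is the visually right one, and the
    # value record adjusts both x placement and x advance.
    rtl_rule = f"pos {right} {left} <{value} 0 {value} 0>;"
    return ltr_rule, rtl_rule


def visual_readings(first, second):
    """List the visual (left, right) readings of a logical (first, second)
    pair whose glyphs carry no direction of their own."""
    return [(first, second), (second, first)]


ltr_rule, rtl_rule = compile_visual_pair("period", "parenright", 40)
print(ltr_rule)
print(rtl_rule)

# A logical pair of neutral glyphs admits two visual readings, and nothing
# in the stored data says which one the designer meant.
print(visual_readings("period", "parenright"))
```

With visual storage, one stored pair yields both rules deterministically. With logical storage of neutral glyphs, the compiler first has to guess which of the two readings applies.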

@anthrotype
Member

correct. the same thing I wrote earlier in this thread:
#16 (comment)

@typoman

typoman commented May 16, 2019

so hypothetically if the value in the pair was written in four int array but still in a logical order it would be fine?

@justvanrossum
Contributor

so hypothetically if the value in the pair was written in four int array but still in a logical order it would be fine?

You need to separate what is (or can be/should be) stored in the UFO and what has to be compiled to GPOS. Those are two different things. The fact that RTL kerning is "difficult" in GPOS should have no influence on the UFO spec.

@typoman

typoman commented May 16, 2019

I'm only trying to figure out how a kerning tool should store the data in the UFO and how it should compile it. I'm very sorry if I'm slow to get this, but since I've only understood the direction-agnostic way of storing data, the idea of storing pairs in visual order is throwing me off balance. Maybe examples will make it easier to implement:

UFO3 (kerning pair direction agnostic):

  • UFO pair period parenright 40
  • GPOS value record:
    RTL scripts: period parenright <40 0 40 0>
    LTR scripts: period parenright 40

UFO2 (kerning pair from left to right):

  • UFO pair period parenright 40
  • GPOS value record:
    RTL scripts: parenright period <40 0 40 0>
    LTR scripts: period parenright 40

Maybe the above example is wrongly written in the GPOS record. But if I'm correct, how is UFO2 a better approach?
Edit: made a correction on UFO3 example

@justvanrossum
Contributor

Let's take this off line. Please email me.

@anthrotype
Member

UFO3 kerning is not writing direction agnostic, it uses logical (first/second) order like in OpenType, thus depends on the script writing direction what is deemed "first" or "second", hence not writing direction agnostic.
UFO2 kerning was writing direction agnostic in that left and right are always left and right no matter from which perspective you look at it; it's not implying left is first, or that second is right -- to the contrary. I called it "visual" order but that may be the source of the confusion. Call it maybe "natural" if you wish, as opposed to "logical". It simply means that two glyphs are placed next to each other and will be spaced more tightly or loosely when they occur next to each other, no matter whether one comes first and the other second or vice versa.

@anthrotype
Member

if you believe that storing the left-most glyph in a pair on the left-hand side, and the right-most glyph on the right-hand side is ethnocentrism... well, I'm ok to swap the order if that makes it more comfortable for you, I don't really care :)

@typesupply
Contributor

I'm going to close this as we have a discussion about how to fix this in #96.
