
re-subroutinizing SourceSansPro-Regular.otf yields slightly bigger file #9

Open

anthrotype opened this issue Jun 1, 2020 · 13 comments

@anthrotype
Member

I tried downloading SourceSansPro-Regular.otf and ran python -m cffsubr on it.
Comparing the resulting CFF tables, I see that the original table is smaller than the one produced by cffsubr.

I was wondering why this is the case?
Is SourceSansPro-Regular using some different library to do the subroutinization than the one used by tx tool? Or is it passing different options that I am not aware of?

How do you explain the diff?
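For reference, the comparison can be reproduced with a short fontTools script. This is a minimal sketch: the local file name is a placeholder, and cffsubr.subroutinize is assumed to be the package's public in-place API.

```python
def pct_change(before: int, after: int) -> float:
    """Size change of the subroutinized table, as a percentage of the original."""
    return (after - before) / before * 100.0


def main() -> None:
    # fontTools and cffsubr are imported here so the helper above stays dependency-free.
    from fontTools.ttLib import TTFont
    import cffsubr

    font = TTFont("SourceSansPro-Regular.otf")  # hypothetical local copy
    before = len(font.getTableData("CFF "))
    cffsubr.subroutinize(font)  # assumed public API; modifies the font in place
    after = len(font["CFF "].compile(font))
    print(f"CFF before: {before} bytes, after: {after} bytes "
          f"({pct_change(before, after):+.2f}%)")


if __name__ == "__main__":
    main()
```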

@anthrotype
Member Author

anthrotype commented Jun 1, 2020

/cc @khaledhosny who is our only known user so far

@khaledhosny
Contributor

Source Sans Pro uses makeotf, so the difference might come from makeotf's subroutinizer.

@josh-hadley
Contributor

I'm looking into this so I can explain in more detail, but basically what @khaledhosny said is the root of it: Source Sans Pro was built with makeotf, and although the core subroutinization code is more or less identical between makeotfexe and tx, the result of subroutinization can be affected by the order in which the input glyphs are processed (not necessarily the font's glyph order -- it's the order in which the glyphs are analyzed for subroutinization).

On a related note: I built the latest SSP (from master branch) with the current AFDKO/makeotf, then ran the result of that through cffsubr. In that case, the original table is slightly larger than the one produced by cffsubr.

@cjchapman

cjchapman commented Jun 2, 2020

Historical note: I ported tx's faster subroutinizer code to makeotf in AFDKO PR #882, which went out in AFDKO 3.0.0. Prior to that, makeotf used different subroutinizer code.

@cjchapman

cjchapman commented Jun 2, 2020

As Josh mentioned, the tx and makeotf subroutinizers are nearly identical (since AFDKO 3.0.0). If anyone wants to compare them, they are in these two files:

@anthrotype
Member Author

Thank you for the insights. In cffsubr I am calling tx with the option to keep the glyph order; I thought it was required to ensure that I can reinsert the modified CFF table back into the sfnt container. But maybe that's not the case, and I can let tx find the optimal order?

@cjchapman

You definitely need the +b "preserve glyph order" option for tx, otherwise you'll have problems like cmap mapping Unicodes to the wrong glyph indices.

@josh-hadley
Contributor

Yeah, what @cjchapman said. Don't remove +b unless you want a whole new set of problems to deal with (and likely still have size differences from makeotf).

To reiterate: the difference in the subroutinization result seems to be caused by the order in which glyphs are analyzed for subroutinization -- which is not necessarily the font's glyph order (I'm working up a test case to demonstrate/prove/describe this in detail).

@josh-hadley
Contributor

Some additional information and data for this:

  1. tx appears to perform the analysis for subroutinization based on the Adobe Standard Encoding order (that is: it looks at those glyphs, if present, first), regardless of the font's glyph order.
  2. makeotfexe probably performs the analysis based on font glyph order.
  3. Source Sans Pro is not in Adobe Standard Encoding order.
  4. When the analysis order is the same, you get the same subroutinization results. Therefore, we can say conclusively that the core makeotfexe subroutinization is the same as tx's. It's the analysis order that contributes to the differences you see.

I've attached some test files that support the above (I did not bother trying to prove makeotfexe's behavior exhaustively as I think it can be inferred from the other findings). The file SSP-limited.otf is SourceSansPro built with makeotfexe using a modified GOADB, containing mostly glyphs not in Adobe Standard Encoding. SSP-limited-tx.otf is the result of running tx -cff +S +b on SSP-limited.otf, then stuffing the resulting CFF table into the file. There are still some slight differences in the CFF table, but the subrs and glyph charstrings are identical.
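For anyone wanting to reproduce that round trip, here is a sketch of running tx -cff +S +b and stuffing the resulting CFF table back into the sfnt container with fontTools. The file names are placeholders, and it assumes tx is on PATH and writes the converted table to stdout when no destination file is given.

```python
import subprocess


def tx_args(input_path: str, preserve_order: bool = True) -> list:
    """Build the tx command line: -cff writes a bare CFF table, +S subroutinizes,
    and +b preserves glyph order (see the discussion above)."""
    args = ["tx", "-cff", "+S"]
    if preserve_order:
        args.append("+b")
    return args + [input_path]


def replace_cff(otf_path: str, cff_data: bytes, out_path: str) -> None:
    """Decompile the raw CFF blob and swap it into the sfnt container."""
    from fontTools.ttLib import TTFont, newTable

    font = TTFont(otf_path)
    table = newTable("CFF ")
    table.decompile(cff_data, font)
    font["CFF "] = table
    font.save(out_path)


if __name__ == "__main__":
    proc = subprocess.run(tx_args("SSP-limited.otf"),
                          check=True, capture_output=True)
    replace_cff("SSP-limited.otf", proc.stdout, "SSP-limited-tx.otf")
```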

To sum up: the difference in this particular case is because the font's glyph set is not in Adobe Standard Encoding order. makeotfexe (apparently) analyzes for subroutinization based on font glyph order, whereas tx analyzes based on Adobe Standard Encoding order. If you want the same subroutinization results between makeotfexe and tx, you need to have the font's glyphs in Adobe Standard Encoding order.

SSP-limited-test.zip
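To make the ordering difference concrete, here is a toy, pure-Python illustration of the two analysis orders described above. The glyph names and the truncated Standard Encoding list are illustrative only; the full 256-entry list is available as fontTools.encodings.StandardEncoding.

```python
# A small slice of Adobe Standard Encoding order, for illustration only.
STANDARD_ENCODING = ["space", "exclam", "quotedbl", "A", "B", "a", "b"]


def analysis_order_tx(font_glyph_order):
    """Visit Standard Encoding glyphs first (if present), then the rest in font order."""
    present = set(font_glyph_order)
    in_std = [g for g in STANDARD_ENCODING if g in present]
    rest = [g for g in font_glyph_order if g not in set(STANDARD_ENCODING)]
    return in_std + rest


def analysis_order_makeotfexe(font_glyph_order):
    """Visit glyphs in the font's own glyph order."""
    return list(font_glyph_order)


glyphs = ["b", "A", "uni0416", "space"]
print(analysis_order_tx(glyphs))          # Standard Encoding glyphs come first
print(analysis_order_makeotfexe(glyphs))  # font glyph order
```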

@anthrotype
Member Author

Thanks for the analysis.

> tx appears to perform the analysis for subroutinization based on the Adobe Standard Encoding order...
> makeotfexe probably performs the analysis based on glyph order

Any particular reasons why tx and makeotf should differ in this regard?

@anthrotype
Member Author

> There are still some slight differences in the CFF table

I noticed those too. In particular, tx seems to use "ExpertEncoding", whereas makeotf prefers "StandardEncoding". I don't know what that means, but I wonder if it has any relationship with the difference you noticed in subroutinization order.

Another difference is that tx drops FamilyBlues and FamilyOtherBlues; apparently in this font these have the same values as BlueValues and OtherBlues respectively. Maybe tx considers them redundant and discards them? Is that good/safe?

@miguelsousa
Member

If FamilyBlues and FamilyOtherBlues have the same values as BlueValues and OtherBlues, it's correct to not include the Family set in the font.

As for the usage of ExpertEncoding, I find that strange. Not sure what consequences it may have.

@josh-hadley
Contributor

> Any particular reasons why tx and makeotf should differ in this regard?

I don't know for sure; this all happened way before my time at Adobe and I think the people who made those decisions are not around anymore. My guess would be because Adobe Standard Encoding provided a consistent starting point for compressing the character/glyph sets that were popular at the time this scheme was developed (in tx, anyway).

> As for the usage of ExpertEncoding, I find that strange. Not sure what consequences it may have.

I suspect that's an anomaly from the somewhat unusual set that I chose for this experiment. Probably tx uses some heuristic like % of charset present to set "StandardEncoding" and this font has a very small percentage of that. I would not expect to see this change in more normal cases. And I'm not sure it has any real consequences in an OpenType font anyway: the font's 'cmap' subtables will ultimately determine the character set(s) and encoding(s).

Bringing all of this back around to cffsubr: to really solve this well, I think we need to build only the subroutinizer (from the code that @cjchapman mentions here) into either a standalone executable or a C extension, so we truly isolate subroutinization from the other parts of the CFF/CFF2 tables.

A shorter-term workaround might be to see if we can extract only the relevant data from the tx-subroutinized CFF table (local & global subrs, charstrings, maybe some other bits) and stuff those back into the pre-subroutinized CFF, rather than take the entire converted table which might have other undesired diffs.
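A rough sketch of that workaround with fontTools' CFF object model might look like the following. It assumes a non-CID CFF font (a single Private dict), and which fields are actually safe to transplant is exactly the open question, so treat this as a sketch rather than a recipe.

```python
# The CFF pieces the workaround would copy from the tx output (assumption).
TRANSPLANT_FIELDS = ["GlobalSubrs", "Private.Subrs", "CharStrings"]


def transplant(subroutinized_path: str, original_path: str, out_path: str) -> None:
    """Copy only the subroutinization-related pieces from a tx-subroutinized font
    into the pre-subroutinized font, leaving the rest of the CFF table alone."""
    from fontTools.ttLib import TTFont

    src = TTFont(subroutinized_path)["CFF "].cff  # tx output
    dst_font = TTFont(original_path)              # original, pre-subroutinization
    dst = dst_font["CFF "].cff

    src_top, dst_top = src.topDictIndex[0], dst.topDictIndex[0]
    dst.GlobalSubrs = src.GlobalSubrs               # global subrs
    dst_top.Private.Subrs = src_top.Private.Subrs   # local subrs (non-CID only)
    dst_top.CharStrings = src_top.CharStrings       # subroutinized charstrings
    dst_font.save(out_path)
```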

5 participants