FSE - Can we internationalize .html files without requiring any special markup? #7

Open
5 tasks done
bobbingwide opened this issue Nov 26, 2020 · 7 comments

bobbingwide commented Nov 26, 2020

In Gutenberg Full Site Editing (FSE) the current proposal is to deliver a theme's template and template parts as .html files.
But there's a problem.

Question. How does one go about internationalizing and localizing HTML?

This leads to further questions.

  • How easy is it to spot translatable text, and to replace it with a translated version?
  • Should this be done at run time?
  • Or can translations be delivered automatically as language files?
  • What are the exceptions and how do we overcome them?

I've not looked at all of the Gutenberg issues related to this.
I just want to experiment with a (simple) proposal that involves:

  • parsing the HTML
  • identifying the text strings
  • identifying the translations
  • applying them to the parsed HTML
  • saving the HTML as a language version

Proposed solution

Stage 1. Extract translatable strings from HTML

Using PHP's DOMDocument and associated classes and methods it should be possible to identify all translatable strings in any HTML and extract them into a .pot file format.
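A minimal sketch of this extraction step, assuming the HTML is passed in as a fragment. The function name fse_extract_strings() is hypothetical and not part of any existing code; the wrapping div and the XML encoding hint are just one common way to load a UTF-8 fragment into DOMDocument.

```php
<?php
// Hypothetical helper (not from the theme): collect translatable text
// from an HTML fragment using DOMDocument and DOMXPath.
function fse_extract_strings( string $html ): array {
    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // keep libxml quiet about unknown tags
    // The encoding hint makes loadHTML() treat the fragment as UTF-8.
    $doc->loadHTML( '<?xml encoding="utf-8"?><div>' . $html . '</div>' );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $doc );
    // Text nodes outside script/style and outside any translate="no" ancestor.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)'
        . ' and not(ancestor::*[@translate="no"])]';

    $strings = [];
    foreach ( $xpath->query( $query ) as $node ) {
        $text = trim( $node->nodeValue );
        if ( '' !== $text ) {
            $strings[] = $text;
        }
    }
    // Two matching strings are translated the same way, so keep one copy.
    return array_values( array_unique( $strings ) );
}
```

For example, calling fse_extract_strings() on a paragraph that contains a link yields the text before the link and the link text as two separate strings, because each text node is treated as its own translatable unit.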

Stage 2. Extract strings from theme's templates and template parts to a .pot file

Using Gutenberg's block parser ( class WP_Block_Parser ) we can parse the blocks, pass each block's innerHTML to the HTML string extraction routine, and extract translatable attributes from attrs.

Process all templates and template parts to produce a single theme .pot file: theme-FSE.pot
Note: This will be a different file from the .pot file created by parsing the PHP files.
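As a rough illustration of the output side of this stage, the sketch below renders a list of extracted strings in .pot format. Block parsing is left out; the function names fse_pot_escape() and fse_build_pot() are hypothetical, and the header is cut down to the bare minimum that gettext tools normally expect.

```php
<?php
// Hypothetical helper: escape a string for use inside a .pot msgid.
function fse_pot_escape( string $text ): string {
    return strtr( $text, [
        '\\' => '\\\\',
        '"'  => '\\"',
        "\n" => '\\n',
        "\t" => '\\t',
    ] );
}

// Hypothetical helper: build the contents of a minimal .pot file.
function fse_build_pot( array $strings, string $domain ): string {
    // Header entry: an empty msgid followed by metadata lines.
    $pot  = "msgid \"\"\nmsgstr \"\"\n";
    $pot .= "\"Project-Id-Version: {$domain}\\n\"\n";
    $pot .= "\"Content-Type: text/plain; charset=UTF-8\\n\"\n\n";

    // One entry per distinct string, with an empty translation.
    foreach ( array_unique( $strings ) as $string ) {
        $pot .= 'msgid "' . fse_pot_escape( $string ) . "\"\n";
        $pot .= "msgstr \"\"\n\n";
    }
    return $pot;
}
```

The result could then be written with file_put_contents() to a file such as theme-FSE.pot.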

Stage 3. Translate into local language

Use a solution similar to l10n to generate en_GB and bb_BB .po and .mo files.

Stage 4. Apply the local language

Using the .mo files generated after translation, apply the translations to the templates and template parts,
saving the new files in a language-specific directory.

Load the target text domain for the theme.
For each template or template part
   For each block with innerHTML
      Parse the HTML to find the strings
      For each string
         Look up the string in the table
         Apply the translation
      Return the new HTML
      Rebuild the block
Rebuild the template or template part
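The inner part of that loop, applying a lookup table to the text nodes of one block's innerHTML, might look something like the sketch below. It assumes the translations have already been loaded into a plain PHP array keyed by the original string; fse_translate_html() and the fse-root wrapper id are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: translate the text nodes of an HTML fragment
// using a simple original => translation lookup table.
function fse_translate_html( string $html, array $translations ): string {
    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    $doc->loadHTML( '<?xml encoding="utf-8"?><div id="fse-root">' . $html . '</div>' );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $doc );
    foreach ( $xpath->query( '//text()[not(ancestor::*[@translate="no"])]' ) as $node ) {
        $original = $node->nodeValue;
        $key      = trim( $original );
        if ( '' === $key || ! isset( $translations[ $key ] ) ) {
            continue; // nothing to translate, or no translation available
        }
        // Re-apply the leading and trailing white space around the translation.
        $lead  = substr( $original, 0, strlen( $original ) - strlen( ltrim( $original ) ) );
        $trail = substr( $original, strlen( rtrim( $original ) ) );
        $node->nodeValue = $lead . $translations[ $key ] . $trail;
    }

    // Serialize only the children of the wrapper, not the html/body scaffolding.
    $root = $xpath->query( '//div[@id="fse-root"]' )->item( 0 );
    $out  = '';
    foreach ( $root->childNodes as $child ) {
        $out .= $doc->saveHTML( $child );
    }
    return $out;
}
```

Note that the serializer may normalise the markup slightly (for example by adding line breaks between block-level elements), so the output is not guaranteed to be byte-for-byte identical to the input outside the translated strings.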

Stage 5. Load the templates and template parts for the user's locale

This is where we'll have to change Gutenberg's template and template part loading logic.

If there are language files for the theme and the user's locale use these when loading .html files.
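One possible shape for that lookup is sketched below. The directory layout (a per-locale folder under the theme's languages folder) and the function name fse_locate_part() are assumptions made for the example, not something Gutenberg provides.

```php
<?php
// Hypothetical helper: prefer a translated copy of a template part for the
// given locale, falling back to the untranslated file.
// Assumed layout: {theme}/languages/{locale}/block-template-parts/{part}.html
function fse_locate_part( string $theme_dir, string $part, string $locale ): string {
    $localized = "{$theme_dir}/languages/{$locale}/block-template-parts/{$part}.html";
    if ( '' !== $locale && is_readable( $localized ) ) {
        return $localized;
    }
    return "{$theme_dir}/block-template-parts/{$part}.html";
}
```

In WordPress the locale argument would typically come from get_user_locale() or get_locale().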

Assumptions

  • Translatable strings do not need to be explicitly wrapped.
  • Non-translatable strings can be wrapped with translate="no".
  • Some tags' attributes are translatable.
  • Gutenberg can be used to parse the blocks.
  • Another parser can be used to parse the HTML.

Scope and Exclusions

  • Two matching strings will be translated in the same way.
  • If translator's context is required this can be identified by HTML attributes.
  • Certain tags, such as script and style, are not expected so will not be processed.
  • Only process template and template part files.
  • Does not handle generalization of URLs.
@bobbingwide bobbingwide added enhancement help wanted Gutenberg Required for the WordPress block editor - Gutenberg labels Nov 26, 2020
@bobbingwide bobbingwide self-assigned this Nov 26, 2020

bobbingwide commented Nov 26, 2020

I'm going to try using the simple HTML dom parser from https://sourceforge.net/projects/simplehtmldom/files/latest/download
Perhaps an easier approach would be to write it in JavaScript using https://www.npmjs.com/package/html-dom-parser

Note: I wrote this after I'd failed to get the DOMDocument parser to find certain tags. I've since realised my mistake and have reverted to using DOMDocument. See #7 (comment)


bobbingwide commented Nov 26, 2020

I found some very useful documentation about HTML's translate attribute.

https://www.w3.org/International/questions/qa-translate-flag.en

Basically, any text you don't want translated is wrapped in an element with translate="no",
and anything you want to explicitly identify as translatable is wrapped in an element with translate="yes".
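Because the attribute is inherited, a translate="yes" element nested inside a translate="no" element turns translation back on. A small sketch of that nearest-ancestor rule follows; fse_is_translatable() is a hypothetical name, and the handling of other attribute values reflects my reading of the HTML specification (an empty value means yes, an invalid value is ignored).

```php
<?php
// Hypothetical helper: decide whether a node is translatable by walking up
// the tree to the nearest element that sets the translate attribute.
function fse_is_translatable( DOMNode $node ): bool {
    for ( $current = $node; null !== $current; $current = $current->parentNode ) {
        if ( ! ( $current instanceof DOMElement ) || ! $current->hasAttribute( 'translate' ) ) {
            continue;
        }
        $value = strtolower( trim( $current->getAttribute( 'translate' ) ) );
        if ( 'no' === $value ) {
            return false;
        }
        if ( 'yes' === $value || '' === $value ) {
            return true;
        }
        // Any other value is invalid, so keep looking further up the tree.
    }
    return true; // no ancestor sets the attribute: translatable by default
}
```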

There are quite a few other details in this document and links. Something for another time.

@bobbingwide

Prior to creating this issue I briefly played with PHP's DOMDocument class.
After a while I realised it only worked for valid XML, not HTML;
self-closing tags were being ignored.

I tried the simple_html_dom routine but it couldn't nicely handle:

  • break tags
  • links in paragraphs
  • divs

So I revisited the DOMDocument route, as other plugins were happily using it, e.g. Jetpack ( class.jetpack-post-images.php ).

I realised I wasn't handling the nodes correctly; I was missing else logic for when the $node->nodeValue was empty.
So now I'm going to revert to using DOMDocument.

I'll have to cater for the warnings raised when DOMDocument encounters tags it can't handle.
Here is an extract from Jetpack's code:

// The @ is not enough to suppress errors when dealing with libxml,
// we have to tell it directly how we want to handle errors.
libxml_use_internal_errors( true );
@$dom_doc->loadHTML( $html_info['html'] );
libxml_use_internal_errors( false );
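A variation on the same idea, sketched here as an assumption rather than anything Jetpack does, is to collect the libxml messages instead of discarding them, so that unexpected tags can be reported during extraction. The function name fse_load_html() is hypothetical.

```php
<?php
// Hypothetical helper: load HTML into a DOMDocument and report any libxml
// messages through $problems instead of emitting PHP warnings.
function fse_load_html( string $html, array &$problems = [] ): DOMDocument {
    // Remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors( true );

    $doc = new DOMDocument();
    $doc->loadHTML( $html );

    foreach ( libxml_get_errors() as $error ) {
        $problems[] = sprintf( 'line %d: %s', $error->line, trim( $error->message ) );
    }
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    return $doc;
}
```

Whether a given tag produces a message depends on the libxml version in use, so the contents of $problems should be treated as diagnostic only.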


bobbingwide commented Nov 27, 2020

"I've not looked at all of the Gutenberg issues related to this."

Actually, I have had a cursory glance.

I won't reference the issues directly until I've made enough progress in Stage 4. Here are some of the relevant issue numbers:

20966 - Block Based Themes: Dynamic values in static HTML theme file

This is more to do with values which vary between sites (URLs, post IDs, etc.) than with text strings.

21204 - How will translations be handled in block based themes?

21728 - Discuss: Contextual block behavior

21932 - Inline Dynamic Content Solutions

I think in most cases the solution is being overthought.

In my view there are two distinct challenges:

  1. Text which needs to be translated.
  2. Hardcoded values which need to be generalized.

For this work I'm only concerned with the i18n/l10n part.

My premises are:

  • Templates and template parts will contain text which is easily identifiable and translatable.
  • This can be performed in an online or batch process.
  • It can be done statically, not on the fly.
  • There shouldn't be a need to use complicated markup.
  • Though non-translatable strings can be identified using translate="no" attributes.

bobbingwide added a commit that referenced this issue Nov 28, 2020
bobbingwide added a commit that referenced this issue Nov 28, 2020
bobbingwide added a commit that referenced this issue Nov 28, 2020
…utput file to the theme's languages folder
bobbingwide added a commit that referenced this issue Nov 29, 2020
bobbingwide added a commit that referenced this issue Nov 29, 2020
bobbingwide added a commit that referenced this issue Nov 29, 2020
bobbingwide added a commit that referenced this issue Nov 29, 2020
@bobbingwide

Stage 5. Load the templates and template parts for the user's locale.

We can implement a local solution for the Fizzie theme that doesn't require changes to Gutenberg
with the following assumptions.

  • The block-template files only contain non-translatable content
  • The block-template-parts files contain translatable content which can be translated at the lowest node level.
  • The logic implemented to re-apply blank space will be OK for left-to-right (LTR), single-byte character set (SBCS) target languages.
  • Shortcodes do not need translating
  • Shortcodes will eventually need to be converted to blocks
  • The set of block attributes that need translating can be hardcoded
  • Block attributes that need translating are at the top level.
  • Block template part files will not be customised in the editor.
  • The theme's .PHP files don't need to be translated: .pot file name clash.

The local solution can be implemented in fizzie_load_template_part().
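To illustrate the two assumptions about block attributes (a hardcoded set, at the top level only), here is a minimal sketch that works on the array structure returned by WP_Block_Parser. The function name fse_translate_attrs() is hypothetical, and the attribute names used in the example map are illustrative guesses rather than a verified list.

```php
<?php
// Hypothetical helper: translate a hardcoded set of top-level block attributes.
// $translatable maps block names to the attribute names worth translating.
// $translations is a simple original => translation lookup table.
function fse_translate_attrs( array $block, array $translatable, array $translations ): array {
    $names = $translatable[ $block['blockName'] ?? '' ] ?? [];
    foreach ( $names as $name ) {
        $value = $block['attrs'][ $name ] ?? null;
        // Only plain strings at the top level of attrs are considered.
        if ( is_string( $value ) && isset( $translations[ $value ] ) ) {
            $block['attrs'][ $name ] = $translations[ $value ];
        }
    }
    return $block;
}

// Example map (assumed attribute names, for illustration only).
$fse_translatable_attrs = [
    'core/search' => [ 'label', 'placeholder', 'buttonText' ],
];
```

Nested attributes and inner blocks are deliberately ignored here, which matches the assumptions listed above but would need revisiting for more complex blocks.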

@bobbingwide

While updating Fizzie for WordPress 6.2 and Gutenberg 15.3.1 I briefly considered how much of the internationalization logic still worked. It seems that t10n.bat needs updating to reflect the fact that I've moved the files from block-template-parts to parts and from block-templates to templates.
