Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PropertyResourceBundle for localization #357

Merged
merged 3 commits into from
May 9, 2022

Conversation

timja
Copy link
Member

@timja timja commented May 6, 2022

Fixes #356

Prior art:
#105

This would allow us to switch to utf-8 encoding in the future for localization when Java 11 is the baseline.

I wasted far too much time confused on why my tests were failing, turns out my IDE automatically created the encoding in iso-8859-1, at least in IntelliJ this is configurable per project:
https://www.jetbrains.com/help/idea/properties-files.html#encoding

@timja timja requested review from basil, jglick and NotMyFault May 6, 2022 20:48
@jglick
Copy link
Member

jglick commented May 6, 2022

While I appreciate the sentiment, as in #105 (review) I am not sure this is prudent. Whoever added the ability to load *.properties in UTF-8 encoding neglected to define a way whereby a file could designate itself unambiguously as Unicode, for example with a BOM or a special file extension. Text editors may thus assume ISO-8859-1 encoding despite the project default, as for example NetBeans does, and show a Unicode file as mojibake. At runtime, https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/PropertyResourceBundle.html says

Constructing a PropertyResourceBundle instance from an InputStream requires that the input stream be encoded in UTF-8. By default, if a MalformedInputException or an UnmappableCharacterException occurs on reading the input stream, then the PropertyResourceBundle instance resets to the state before the exception, re-reads the input stream in ISO-8859-1, and continues reading.

This seems like a nice heuristic but since anything is valid ISO-8859-1 you can never be sure that it was meant to be UTF-8: we might load some old *.properties using Latin-1 characters which happen to be valid (but nonsensical) UTF-8, causing mojibake to be displayed on Java 11. Before committing to this technique I would advise scanning the corpus of *.properties in the @jenkinsci org looking for any file which is not ASCII and yet is valid UTF-8 by accident. We would have to switch to Unicode escapes in all such files, to ensure they are displayed properly in both Java 8 and 11; and could not switch them to raw UTF-8 until the plugin picked a core baseline requiring 11.

XML (#171) would be a safer choice, though it is awkward to edit. JSON would be fine, but does not support comments, which we need. YAML would solve that problem and be more familiar to people born after the 1970s, but require us to package a parser in Stapler, conflicting with the library wrapper plugin (unless we shaded it, which has its own problems).

@basil
Copy link
Member

basil commented May 6, 2022

Before committing to this technique I would advise scanning the corpus of *.properties in the https://github.com/jenkinsci org looking for any file which is not ASCII and yet is valid UTF-8 by accident.

I had some leftover spaghetti code from the Guava upgrade project to do searches like this, so I did a corpus search for .properties files that were both valid UTF-8 and valid ISO-8859-1 across all Jenkins and CloudBees plugins. I only found matches in 3 plugins, all maintained by @ikedam (the author of #105) and all in Japanese .properties files:

WEB-INF/classes/hudson/plugins/doclinks/artifacts/ArtifactsDocLinksConfig/config_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/hudson/plugins/doclinks/artifacts/ArtifactsDocLinksPublisher/config_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/hudson/plugins/doclinks/artifacts/Messages_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/groovy_label_assignment/GroovyLabelAssignmentAction/summary_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/groovy_label_assignment/GroovyLabelAssignmentProperty/config_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/groovy_label_assignment/Messages_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/Messages_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/preferences/BuildPreference/config_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/preferences/BuildPreferenceJobProperty/config_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/preferences/BuildPreferenceNodeProperty/config_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/preferences/Messages_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/rules/BuildResultScoringRule/config_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/rules/Messages_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/ScoringLoadBalancer/global_ja.properties is valid UTF-8 and valid ISO-8859-1
WEB-INF/classes/jp/ikedam/jenkins/plugins/scoringloadbalancer/util/Messages_ja.properties is valid UTF-8 and valid ISO-8859-1

Here are the links to the plugins:

These plugins haven't had any substantive commits or releases since 2016, so I think we can deprecate them and proceed with this change.

@basil
Copy link
Member

basil commented May 6, 2022

Even the results reported above appear to be false positives, since they are using Unicode escapes but just happen to contain comments with the unescaped text. In short I do not think compatibility is a legitimate concern.

Copy link
Member

@basil basil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! As mentioned in the preceding comments, I do not think compatibility is a legitimate concern here.

@ikedam
Copy link
Member

ikedam commented May 7, 2022

I only found matches in 3 plugins, all maintained by @ikedam (the author of #105) and all in Japanese .properties files

UTF8 letters are only in comments (for better experience in review) and I believe it won't cause mojibake.
Anyway, we can ignore those plugins for this context.

@timja
Copy link
Member Author

timja commented May 7, 2022

Text editors may thus assume ISO-8859-1 encoding despite the project default, as for example NetBeans does, and show a Unicode file as mojibake

ha well I've found the issue you've commented all over, https://bz.apache.org/netbeans/show_bug.cgi?id=75906
Seems somewhat 'fixed', you can mark individual files as utf-8 or select multiple and do it. Not as nice as the IntelliJ setting although it's more flexible to a degree.


Huge thanks to @basil for doing the scan of the org

Copy link
Member

@NotMyFault NotMyFault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for taking care of it, Tim!

@basil
Copy link
Member

basil commented May 8, 2022

@basil basil changed the title Use PropertyResourceBundle for localization Use PropertyResourceBundle for localization May 8, 2022
@timja
Copy link
Member Author

timja commented May 9, 2022

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If compatibility is not a major concern then OK, may as well do it this way.

My understanding is that when updating a plugin to a baseline requiring Java 11 then you may—but need not—switch *.properties to UTF-8.

It would be possible to add an hpi: mojo to check your bundles vs. your Java version and advise if you should switch, but I am not sure it would be worth the effort (and overhead on every build).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use PropertyResourceBundle to support UTF-8 on Java 9+
5 participants