-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test to check encoding of message properties #6332
Conversation
@solth Looks good. Another idea would be to check the whole file with a regular expression containing only allowed characters for the respective language? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes for the property files look good. I suggest to apply them in a separate commit.
But what is the purpose of the new Java test code? Don't we want to catch any new contribution which uses a wrong encoding (!= UTF-8), no matter whether it is a property or any other non binary file? It's also sufficient to run the encoding test for files which are affected by a push or a pull request.
Would this workflow which checks all files in pull requests or push operations be better? |
You can use
So, you must at least use the parameter
You should even know how many bytes on default are read by file to determinate the encoding and file type. Quoting:
Only the first |
The main intention was to provide an automatic way to detect when the encoding of the property files has been changed to something that results in broken special characters being displayed in the frontend and thus would resolve #6331. The purpose was not to detect encoding changes to any source code files, as in my memory that didn't cause any otherwise undetected problems in the past (if the source files contain non-ASCII special characters - outside of String literals, for example in variable names - the whole compilation of the code will probably fail if the encoding is changed to something incompatible with UTF-8).
This is a good suggestion and would probably be a better alternative to my implementation, since it would cover the whole files and not just sample messages from them. Do you know if there are predefined regular expressions covering all valid characters for specific languages?
Yes, that is probably a better way to detect file encoding changes in any source file, and thus would cover #6324. |
Not really, no. I googled a bit and found this: LocaleData.getExemplarSet of the ICU4J library, which provides all valid characters of a language. But I have not used this library before. For German, English and Spanish it is probably not that difficult to specify regular expressions manually. |
PR #6341 adds a CI action which checks the whole file (all which are affected by a push or pull request). |
Resolves #6331
This PR adds a test that loads the German and Spanish message property files used in Kitodo.Production and checks if they their contents are encoded with
UTF-8
. This is done by loading specific messages known to contain unicode special characters and decode them withUTF-8
. The results are then compared with strings containing those special characters. The tests fail if the decoded message does not match the expected strings.This PR also sets the encoding of
Kitodo/src/main/resources/messages/password_es.properties
andKitodo/src/main/resources/messages/errors_de.properties
toUTF-8
and therefore fixes #6324Note: this PR does not yet replace all the codepoints like
\u00FC
with the actual special characters likeü
in the message files. That should be done in a follow-up pull request.