Issue 639 Check diffs between TUP and IUP #640
Raw data for #639 is at https://docs.google.com/spreadsheets/d/1mFpC03q_dBNXuVRA_qJ_tHSjmVQY8dIDC4PYRkZW1KA
Worked out the kinks and got normalization to use the newer parsed data. I'd like to merge this before continuing.
import org.unicode.text.UCD.Normalizer.NormalizationFormat;
import org.unicode.text.utility.Utility;
public class ShimUnicodePropertyFactory extends UnicodeProperty.Factory {
This could use a comment explaining what it is for. As far as I can tell, it makes an IUP behave like a TUP for testing/diffing purposes, but it shouldn't actually be used beyond that, right?
done
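As a rough illustration of what the shim is for, a minimal sketch of a per-code-point comparison between two property sources. It assumes each source can be viewed as a function from code point to value string; `PropertyDiff`, `diff`, and the toy lambdas are hypothetical and are not part of the `UnicodeProperty` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.IntFunction;

public class PropertyDiff {
    /** One mismatch between the two sources at a single code point. */
    record Difference(int codePoint, String expected, String actual) {
        @Override
        public String toString() {
            return String.format("U+%04X: %s != %s", codePoint, expected, actual);
        }
    }

    /**
     * Compares two property lookups over [start, end] and collects every code point
     * where they disagree. Hypothetical stand-ins: "expected" plays the role of TUP,
     * "actual" the role of IUP wrapped in a shim.
     */
    static List<Difference> diff(IntFunction<String> expected, IntFunction<String> actual,
                                 int start, int end) {
        List<Difference> out = new ArrayList<>();
        for (int cp = start; cp <= end; cp++) {
            String e = expected.apply(cp);
            String a = actual.apply(cp);
            if (!Objects.equals(e, a)) {
                out.add(new Difference(cp, e, a));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy sources that disagree only on U+0041.
        IntFunction<String> tup = cp -> cp < 0x80 ? "ASCII" : "Other";
        IntFunction<String> iup = cp -> cp == 0x41 ? "Latin" : (cp < 0x80 ? "ASCII" : "Other");
        List<Difference> diffs = diff(tup, iup, 0x0, 0xFF);
        diffs.forEach(System.out::println); // prints "U+0041: ASCII != Latin"
    }
}
```

Comparing via `Objects.equals` treats two missing (null) values as equal, so only real disagreements are reported.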
@@ -32,18 +35,53 @@ public final class Normalizer implements Transform<String, String>, UCD_Types {
    public static final String copyright =
            "Copyright (C) 2000, IBM Corp. and others. All Rights Reserved.";
    public enum NormalizationFormat {
Why not just NormalizationForm?
From my experiments with #502, it also looks like IUP has # rather than the code point in the names of CJKV ideographs, etc.
Yes, it uses the same naming convention as the XML. There is a method that resolves that (IndexUnicodeProperties.getName()), and also a method that gets a sequence of names for a string. We could rethink the property to hold the fully resolved name if we want; the only reason for the # is to avoid the storage cost, and that may not be an issue now.
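For illustration only, a minimal sketch of what resolving the `#` placeholder involves, assuming the XML-style convention where `#` stands for the code point written in uppercase hex with at least four digits (e.g. `CJK UNIFIED IDEOGRAPH-#`). `NameResolver`, `resolveName`, and `resolveNames` are hypothetical helpers, not the actual `IndexUnicodeProperties.getName()` implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class NameResolver {
    /**
     * Expands the "#" placeholder in a stored name into the code point's hex value,
     * formatted with at least four uppercase digits. Returns null for a null name.
     */
    static String resolveName(String rawName, int codePoint) {
        if (rawName == null) {
            return null;
        }
        return rawName.replace("#", String.format("%04X", codePoint));
    }

    /** Resolves the name of each code point (not each UTF-16 unit) in a string. */
    static List<String> resolveNames(String text, IntFunction<String> rawNameLookup) {
        List<String> names = new ArrayList<>();
        text.codePoints().forEach(cp -> names.add(resolveName(rawNameLookup.apply(cp), cp)));
        return names;
    }

    public static void main(String[] args) {
        System.out.println(resolveName("CJK UNIFIED IDEOGRAPH-#", 0x4E00));  // CJK UNIFIED IDEOGRAPH-4E00
        System.out.println(resolveName("CJK UNIFIED IDEOGRAPH-#", 0x20000)); // CJK UNIFIED IDEOGRAPH-20000
    }
}
```

Storing the pattern once per range and expanding it on lookup is the storage trade-off mentioned above; storing fully resolved names would skip the expansion at the cost of one string per code point.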
Add a test of the differences as per #639