bump JSP to Unicode 14 #93

Merged: 8 commits, Jul 21, 2021
8 changes: 4 additions & 4 deletions UnicodeJsps/pom.xml
@@ -21,7 +21,7 @@
For ICU versions, see https://github.com/orgs/unicode-org/packages?repo_name=icu
Note that we can't use the general ICU maven packages, because utilities isn't exported (yet).
-->
- <icu.version>69.1-SNAPSHOT-cldr-2021-02-17</icu.version>
+ <icu.version>70.0.1-SNAPSHOT-cldr-2021-06-15</icu.version>

<!--
For CLDR versions, see https://github.com/orgs/unicode-org/packages?repo_name=cldr
@@ -71,19 +71,19 @@
<scope>provided</scope>
<version>${jsp.version}</version>
</dependency>

<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.6</version>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>29.0-jre</version>
</dependency>

<dependency>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
8 changes: 4 additions & 4 deletions UnicodeJsps/src/main/java/org/unicode/jsp/CachedProps.java
@@ -31,7 +31,7 @@
import com.ibm.icu.util.VersionInfo;

public class CachedProps {
- public static final boolean IS_BETA = false;
+ public static final boolean IS_BETA = true;

public static final Splitter HASH_SPLITTER = Splitter.on('#').trimResults();
public static final Splitter SEMI_SPLITTER = Splitter.on(';').trimResults();
@@ -44,7 +44,7 @@ public class CachedProps {
final BiMultimap<String,String> nameToAliases = new BiMultimap<String,String>(null,null);
final Map<String,BiMultimap<String,String>> nameToValueToAliases = new LinkedHashMap();

- static CachedProps CACHED_PROPS = getInstance(VersionInfo.getInstance(12));
Member
I'm surprised at this; if based on master it would be replacing 13 by 14.

Member Author
Is this a bug? I hadn't changed this value. Should it be calculated?

Member
Lemme check.

Member
Yes, that should be the version of the beta props. I think it is built that way so that it doesn't pull in the BIN properties if BETA is off. For now, let's just leave it at 14, but file an issue.

+ static CachedProps CACHED_PROPS = getInstance(VersionInfo.getInstance(14));
Member
Should this be driven by the version string in class Settings?

Member
It's a bit more complicated than that because of the interplay with the beta flag. I suggested that we go with 14 for now, and file an issue. We don't want to wait on the fuller solution.
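
One way to read the suggestion in this thread, as a rough sketch rather than the project's approach: derive the major version from a single version string instead of hard-coding 14. The class and method names below are hypothetical, the "14.0.0" format is an assumption about what a settings string might look like, and the interplay with the beta flag is deliberately left out because the thread does not settle it.

```java
/** Hypothetical helper; not part of UnicodeJsps. */
final class VersionFromSettings {

    /** Returns the leading integer of a dotted version string, for example 14 for "14.0.0". */
    static int majorVersion(String dotted) {
        String trimmed = dotted.trim();
        int dot = trimmed.indexOf('.');
        String major = dot < 0 ? trimmed : trimmed.substring(0, dot);
        // Integer.parseInt throws NumberFormatException on malformed input.
        return Integer.parseInt(major);
    }

    public static void main(String[] args) {
        // "14.0.0" is an assumed example value, not taken from the Settings class.
        System.out.println(majorVersion("14.0.0"));
    }
}
```

ICU's `VersionInfo.getInstance(String)` accepts a dotted version string directly, so in the real code the manual parsing may not be needed at all.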


static UnicodeProperty NAMES = CachedProps.CACHED_PROPS.getProperty("Name");

@@ -144,8 +144,8 @@ class DelayedUnicodeProperty extends UnicodeProperty {
private List<String> nameAliases;
private Multimap<String,String> valueToAliases;

- public DelayedUnicodeProperty(VersionInfo version, String propName,
- Collection<String> nameAliases,
+ public DelayedUnicodeProperty(VersionInfo version, String propName,
+ Collection<String> nameAliases,
BiMultimap<String, String> biMultimap) {
this.version = version;
Collection<String> temp;
112 changes: 86 additions & 26 deletions UnicodeJsps/src/main/java/org/unicode/jsp/ScriptTester.java
@@ -9,6 +9,9 @@
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
+ import java.util.concurrent.ConcurrentHashMap;
+ import java.util.concurrent.atomic.AtomicInteger;
+ import java.util.logging.Logger;
import java.util.regex.Pattern;

import com.ibm.icu.dev.util.CollectionUtilities;
@@ -24,25 +27,70 @@
* @author markdavis
*/
public class ScriptTester {
+ static Logger logger = Logger.getLogger(ScriptTester.class.getName());
private final UnicodeMap<BitSet> character_compatibleScripts;


public enum CompatibilityLevel {Highly_Restrictive, Moderately_Restrictive}
public enum ScriptSpecials {on, off}


+ /**
+ * Space reserved for script codes not in ICU
+ */
+ public static final int EXTRA_COUNT = 16; // should be enough, hard working as UTC is!
+ public static final Map<String,Integer> extraScripts = new ConcurrentHashMap<>(EXTRA_COUNT);
/**
* Extended scripts; note that they do not have stable numbers, and should not be persisted.
*/
- public static final int
+ public static final int
//HANT = UScript.CODE_LIMIT,
//HANS = HANT + 1,
- LIMIT = UScript.CODE_LIMIT; // HANS + 1;

- private static String[][] EXTENDED_NAME = {{"Hant", "Han Traditional"}, {"Hans", "Han Simplified"}};
+ LIMIT = UScript.CODE_LIMIT + EXTRA_COUNT; // HANS + 1;
Member
Please remove the comments about HANT/HANS -- since we have real UScript constants for them.


+ private static String[][] EXTENDED_NAME = {
+ // Scripts without stable numbers
+ {"Hant", "Han Traditional"}, {"Hans", "Han Simplified"},
+ };
Comment on lines +52 to +54
Member
Do we need these? ICU has UScript.SIMPLIFIED_HAN and UScript.TRADITIONAL_HAN.

Member Author
I'm not sure.


+ static AtomicInteger scriptCounter = new AtomicInteger(UScript.CODE_LIMIT);

+ static int getScriptCode(String script) {
+ try {
+ // If ICU has it, great
+ return UCharacter.getPropertyValueEnum(UProperty.SCRIPT, script);
+ } catch (com.ibm.icu.impl.IllegalIcuArgumentException iiae) {
+ // Make something up
+ int newCode = extraScripts.computeIfAbsent(script, script2 -> {
+ int i = scriptCounter.getAndIncrement();
+ logger.warning("Synthesized scriptCode " + i + " for unrecognized script extension '"+script+"'");
+ return i;
+ });
+ // Verify we didn't run over
+ if (newCode >= LIMIT) {
Comment on lines +69 to +70
Member
Do we need a hard limit?

Member Author
if we have more than 'extrascripts' scripts, yes

+ throw new RuntimeException("computed script code of " + newCode + " for '"+script+"' overflows: have " + extraScripts.size() +
+ " scripts but EXTRA_COUNT=" + EXTRA_COUNT);
+ }
+ return newCode;
+ }
+ }
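
The hard-limit question in the thread above can be explored with a small standalone sketch. It is not the code from this PR: it uses only `java.util.concurrent`, the class name is hypothetical, and `baseLimit` is a made-up stand-in for `UScript.CODE_LIMIT`. Unlike the diff, it checks the cap before incrementing, so the counter never moves past the limit and a rejected name is not stored in the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: hands out stable integer codes for unknown script names, up to a hard cap. */
final class SyntheticScriptCodes {
    private final int limit;              // first code that must never be issued
    private final AtomicInteger next;     // next unissued code
    private final Map<String, Integer> codes = new ConcurrentHashMap<>();

    SyntheticScriptCodes(int baseLimit, int extraCount) {
        this.limit = baseLimit + extraCount;
        this.next = new AtomicInteger(baseLimit);
    }

    /** Returns the code already assigned to name, or assigns the next free one. */
    int codeFor(String name) {
        return codes.computeIfAbsent(name, n -> {
            int code;
            do {
                code = next.get();
                if (code >= limit) {
                    // An exception thrown by the mapping function leaves the map unchanged.
                    throw new IllegalStateException(
                        "no synthetic code left for '" + n + "': limit is " + limit);
                }
            } while (!next.compareAndSet(code, code + 1));
            return code;
        });
    }

    public static void main(String[] args) {
        // 200 is an arbitrary base chosen for the demo, not ICU's real value.
        SyntheticScriptCodes codes = new SyntheticScriptCodes(200, 2);
        System.out.println(codes.codeFor("Xaaa"));
        System.out.println(codes.codeFor("Xbbb"));
        System.out.println(codes.codeFor("Xaaa")); // same code as the first call
    }
}
```

Whether failing fast like this is preferable to the post-check in the diff is a judgment call; the sketch only shows that the cap can be enforced without leaving an out-of-range entry behind.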

public static String getScriptName(int extendedScriptCode, int choice) {
if (extendedScriptCode >= UScript.CODE_LIMIT) {
- return EXTENDED_NAME[extendedScriptCode - UScript.CODE_LIMIT][choice];
+ if (extendedScriptCode >= LIMIT) {
+ return EXTENDED_NAME[extendedScriptCode - LIMIT][choice];
+ } else {
+ for (Map.Entry<String, Integer> e : extraScripts.entrySet()) {
+ if(e.getValue() == extendedScriptCode) {
+ if(choice == 0) {
+ return e.getKey();
+ } else {
+ return "New Script '"+ e.getKey() + "'";
+ }
+ }
+ }
+ throw new IllegalArgumentException("Unknown extended script code " + extendedScriptCode);
+ }
}
return UCharacter.getPropertyValueName(UProperty.SCRIPT, extendedScriptCode, choice);
}
@@ -128,12 +176,12 @@ public boolean isOk(CharSequence input) {
// check numbers
return true;
}



// TODO, cache results
private BitSet getActualScripts(int cp) {
- BitSet actualScripts = scriptSpecials.get(cp);
+ BitSet actualScripts = getScriptSpecials().get(cp);
if (actualScripts == null) {
actualScripts = new BitSet(LIMIT);
int script = UCharacter.getIntPropertyValue(cp, UProperty.SCRIPT);
@@ -143,7 +191,7 @@ private BitSet getActualScripts(int cp) {
}

public boolean filterTable(List<Set<String>> table) {

// We make one pass forward and one backward, finding if each characters scripts
// are compatible with the ones before.
// We then make a second pass for the ones after.
@@ -248,7 +296,7 @@ private boolean contains(BitSet set1, BitSet set2) {
}

public static class ScriptExtensions {

public static final Comparator<BitSet> COMPARATOR = new Comparator<BitSet>() {

public int compare(BitSet o1, BitSet o2) {
@@ -260,13 +308,13 @@ public int compare(BitSet o1, BitSet o2) {
return n1.compareToIgnoreCase(n2);
}
};

private UnicodeMap<BitSet> scriptSpecials;

public Collection<BitSet> getAvailableValues() {
return scriptSpecials.getAvailableValues();
}

public UnicodeSet getSet(BitSet value) {
return scriptSpecials.getSet(value);
}
@@ -279,21 +327,21 @@ private static class MyHandler extends FileUtilities.SemiFileReader {
public boolean handleLine(int start, int end, String[] items) {
BitSet bitSet = new BitSet(LIMIT);
for (String script : SPACES.split(items[1])) {
- int scriptCode = UCharacter.getPropertyValueEnum(UProperty.SCRIPT, script);
+ int scriptCode = getScriptCode(script);
bitSet.set(scriptCode);
}
map.putAll(start, end, bitSet);
return true;
}
}

public static ScriptExtensions make(String directory, String filename) {
ScriptExtensions result = new ScriptExtensions();
result.scriptSpecials = ((MyHandler) new MyHandler()
.process(directory, filename)).map.freeze();
return result;
}

public static ScriptExtensions make(Class aClass, String filename) {
ScriptExtensions result = new ScriptExtensions();
result.scriptSpecials = ((MyHandler) new MyHandler()
@@ -312,7 +360,7 @@ public void putAllInto(UnicodeMap<BitSet> char2scripts) {
public static String getNames(BitSet value, int choice, String separator) {
return getNames(value, choice, separator, new TreeSet<String>());
}

public static String getNames(BitSet value, int choice, String separator, Set<String> names) {
names.clear();
for (int i = value.nextSetBit(0); i >= 0; i = value.nextSetBit(i+1)) {
@@ -321,12 +369,24 @@ public static String getNames(BitSet value, int choice, String separator, Set<St
return CollectionUtilities.join(names, separator).toString();
}
}

- static ScriptExtensions scriptSpecials = ScriptExtensions.make(ScriptExtensions.class, "ScriptExtensions.txt");

+ static final class ScriptExtensionsHelper {
+ ScriptExtensions scriptSpecials;

+ ScriptExtensionsHelper() {
+ scriptSpecials = ScriptExtensions.make(ScriptExtensions.class, "ScriptExtensions.txt");
+ }

+ static ScriptExtensionsHelper INSTANCE = new ScriptExtensionsHelper();
+ }

+ static final ScriptExtensions getScriptSpecials() {
+ return ScriptExtensionsHelper.INSTANCE.scriptSpecials;
+ }
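
The helper class above resembles the initialization-on-demand holder idiom, where a nested class defers an expensive load until the first accessor call. The following is a minimal standalone sketch of that idiom in its conventional form, not the PR's code: the class names are hypothetical, `load()` stands in for reading ScriptExtensions.txt, and the counter exists only to make the laziness observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyScriptTable {
    /** Counts how often the expensive load ran. */
    static final AtomicInteger LOADS = new AtomicInteger();

    /** The JVM initializes this nested class on first use, and only once. */
    private static final class Holder {
        static final String[] INSTANCE = load();
    }

    /** Hypothetical stand-in for parsing a data file. */
    private static String[] load() {
        LOADS.incrementAndGet();
        return new String[] {"Hant", "Hans"};
    }

    /** First call triggers Holder's static initializer; later calls reuse the result. */
    static String[] get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println("loads before first get(): " + LOADS.get());
        get();
        get();
        System.out.println("loads after two get() calls: " + LOADS.get());
    }
}
```

Because class initialization is serialized by the JVM, the idiom needs no explicit locking; the sketch marks the holder field `final`, which the diff's `INSTANCE` field is not.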

public static BitSet getScriptSpecials(int codepoint) {
BitSet output = new BitSet(LIMIT);
- BitSet actualScripts = scriptSpecials.get(codepoint);
+ BitSet actualScripts = getScriptSpecials().get(codepoint);
if (actualScripts != null) {
output.or(actualScripts);
} else {
@@ -340,14 +400,14 @@ public static UnicodeMap<String> getScriptSpecialsNames() {
UnicodeMap<String> result = new UnicodeMap<String>();
Set<String> names = new TreeSet<String>(); // to alphabetize

- for (BitSet value : scriptSpecials.getAvailableValues()) {
- result.putAll(scriptSpecials.getSet(value), ScriptExtensions.getNames(value, UProperty.NameChoice.LONG, ",", names));
+ for (BitSet value : getScriptSpecials().getAvailableValues()) {
+ result.putAll(getScriptSpecials().getSet(value), ScriptExtensions.getNames(value, UProperty.NameChoice.LONG, ",", names));
}
return result;
}

public static String[][] getScriptSpecialsAlternates() {
- Collection<BitSet> availableValues = scriptSpecials.getAvailableValues();
+ Collection<BitSet> availableValues = getScriptSpecials().getAvailableValues();
String[][] result = new String[availableValues.size()][];
Set<String> names = new TreeSet<String>(); // to alphabetize

@@ -387,7 +447,7 @@ private Builder(CompatibilityLevel level, ScriptSpecials specials) {
addCompatible(UScript.LATIN, i);
}
// FALL THRU!
- case Highly_Restrictive:
+ case Highly_Restrictive:
addCompatible(UScript.LATIN, UScript.HAN, UScript.HIRAGANA, UScript.KATAKANA);
//addCompatible(UScript.LATIN, HANT, UScript.HIRAGANA, UScript.KATAKANA);
//addCompatible(UScript.LATIN, HANS, UScript.HIRAGANA, UScript.KATAKANA);
@@ -413,7 +473,7 @@ private Builder(CompatibilityLevel level, ScriptSpecials specials) {
// fix the char2scripts mapping

if (specials == ScriptSpecials.on){
- scriptSpecials.putAllInto(char2scripts);
+ getScriptSpecials().putAllInto(char2scripts);
}
}
