The way it is generated now (random output from the FST) means it contains all sorts of noise: overgeneration patterns that are usually harmless, but turn out to be really harmful in this context.
Use the weighted FST (do not convert it to an unweighted one), add heavy weights to the tags of all unwanted strings, and then filter the output by weight (i.e. only strings with a weight below a given threshold should survive).
This requires either that the wordlist is printed with weights, or that we remove such paths from the FST first, whichever is easier to implement.
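Assuming the wordlist has been printed as tab-separated `string<TAB>weight` pairs by the FST tooling, the weight-threshold filter itself is trivial. A minimal sketch (the sample words, tag names and threshold value are all made up for illustration):

```shell
#!/bin/sh
# Keep only strings whose weight is below THRESHOLD.
# Input format assumed: one "string<TAB>weight" pair per line.
THRESHOLD=100

# Hypothetical sample input: normal words get low weights,
# acronyms/numbers get the heavy penalty weight added via their tags.
printf 'sana\t5.0\nSAK\t1000.0\n123\t1000.0\nkirja\t12.5\n' \
  | awk -F'\t' -v t="$THRESHOLD" '$2 + 0 < t + 0 { print $1 }'
```

Only `sana` and `kirja` survive the filter; the heavily weighted acronym and number entries are dropped before the hyphenator lexicon is built.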
Another alternative: add more paths to the set removed from the lexicon. We don't need acronyms and abbreviations in the hyphenator lexicon (they will be covered by the rule component), and the same goes for numbers.
We already do this, so this is definitely the easiest way forward.
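The path removal can be expressed as a tag filter composed with the lexicon before the hyphenator is built. A hedged sketch in xfst-style regular expression notation (the tag names `+ABBR`, `+ACR` and `+Num` are assumptions; the actual tags and composition order depend on the build setup):

```
! Hypothetical filter: reject all paths containing abbreviation,
! acronym or number tags ("does not contain" is written ~$[...]).
define RemoveUnwanted ~$[ "+ABBR" | "+ACR" | "+Num" ] ;

! Compose the filter with the lexicon so the unwanted paths
! never reach the hyphenator lexicon.
regex RemoveUnwanted .o. Lexicon ;
```

Since such filters are already in place, extending the tag list is a small, local change, which is why this is the easiest way forward.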
This issue was created automatically with bugzilla2github
Bugzilla Bug 2537
Date: 2019-01-25T09:31:29+01:00
From: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>
To: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>
CC: borre.gaup, chiara.argese
Last updated: 2019-01-25T09:42:51+01:00