Filter out numerals, acronyms, etc. from word list for pattern hyphenation (Bugzilla Bug 2537) #35

albbas · 2019-01-25T08:31:29Z

This issue was created automatically with bugzilla2github

Bugzilla Bug 2537

Date: 2019-01-25T09:31:29+01:00
From: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>
To: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>
CC: borre.gaup, chiara.argese

Last updated: 2019-01-25T09:42:51+01:00

albbas · 2019-01-25T08:31:29Z

Comment 13127

Date: 2019-01-25 09:31:29 +0100
From: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>

The word list is currently generated as random output from the FST, so it contains all sorts of noise: overgeneration patterns that are usually harmless but turn out to be really harmful in this context.

albbas · 2019-01-25T08:39:27Z

Comment 13128

Date: 2019-01-25 09:39:27 +0100
From: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>

Use the weighted FST (do not convert it to unweighted), add heavy weights to the tags for all unwanted strings, then filter the output by weight (i.e. only output with a weight below a threshold should survive).

This requires that the word list is printed with weights, or that we remove such paths from the FST first, whichever is more easily implemented.
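
A minimal sketch of the weight-based filtering step, in case it helps to make the idea concrete. It assumes the word list has already been printed with one `word<TAB>weight` pair per line; that line format, the function name `filter_by_weight` and the threshold value are illustrative assumptions, not something the current build produces.

```python
from typing import Iterable, Iterator


def filter_by_weight(lines: Iterable[str], threshold: float) -> Iterator[str]:
    """Yield only the words whose weight is strictly below `threshold`.

    Assumed (hypothetical) input format: "word<TAB>weight", one pair per line.
    Lines without a weight column or with an unparsable weight are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the word itself may contain other characters.
        word, sep, weight_text = line.rpartition("\t")
        if not sep:
            continue  # no weight column on this line
        try:
            weight = float(weight_text)
        except ValueError:
            continue  # weight column is not a number
        if weight < threshold:
            yield word


if __name__ == "__main__":
    sample = ["viessu\t0.0", "NRK\t1000.0", "123\t1000.0"]
    for word in filter_by_weight(sample, threshold=100.0):
        print(word)
```

The threshold only has to sit between the normal path weights and the heavy weight put on the unwanted tags, so the exact number is not important as long as the penalty is clearly larger than anything a regular word can accumulate.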

albbas · 2019-01-25T08:42:51Z

Comment 13129

Date: 2019-01-25 09:42:51 +0100
From: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>

Another alternative: add more paths to be removed from the lexicon. We don't need acronyms and abbreviations in the hyphenator lexicon (they will be covered by the rule component). The same goes for numbers.

We already remove paths this way, so this is definitely the easiest way forward.
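
For comparison, the same kind of noise could also be caught by a post-hoc check on the generated word list. The sketch below is not the lexicon-level removal proposed above; it is a plain string heuristic, and the rules in it (any digit, any period, or two or more letters that are all upper case) as well as the names `looks_like_noise` and `clean_wordlist` are assumptions chosen for illustration.

```python
import re
from typing import Iterable, List

HAS_DIGIT = re.compile(r"\d")


def looks_like_noise(word: str) -> bool:
    """Heuristically flag numerals, abbreviations and acronyms (assumed rules)."""
    if HAS_DIGIT.search(word):
        return True  # numerals and mixed digit strings
    if "." in word:
        return True  # abbreviations written with a period
    letters = [c for c in word if c.isalpha()]
    # Two or more letters, all upper case: treat as an acronym.
    return len(letters) >= 2 and all(c.isupper() for c in letters)


def clean_wordlist(words: Iterable[str]) -> List[str]:
    """Return the non-empty words that do not look like noise."""
    return [w for w in words if w and not looks_like_noise(w)]


if __name__ == "__main__":
    print(clean_wordlist(["viessu", "NRK", "2019", "jna.", "Sápmi"]))
```

A check like this is mostly useful as a sanity test on the output, since removing the paths in the lexicon keeps the unwanted strings from being generated in the first place.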

albbas transferred this issue from giellalt/bugzilla-dummy Sep 10, 2024

albbas assigned snomos and flammie Sep 10, 2024
