tag_test and escaped xfst tags #71

trondtynnol · 2025-02-01T16:15:29Z

tag_test is failing for me with the following message:

FAIL: tag_test.sh
=================

grep: (standard input): binary file matches
FAIL: Have a look at these:
+Use%/GC
FAIL tag_test.sh (exit status: 1)

I found the offending tag +Use%/GC in shared-smi:

$ rg "\+Use%/GC" shared-*
shared-smi/src/fst/stems/arabic_roman_digits.lexc
205:< [1|2|3|4|5|6|7|8|9|%0] %+Use%/GC:0 >        MEASUREMENTS     ; ! gc needs measurements after arabic loops

It appears there in an embedded xfst regular expression (the < ... > part of the lexc entry), where % escapes the special characters + and /, so the actual tag is +Use/GC. Could tag_test be adapted to handle this?
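A minimal sketch of the adaptation being asked for, assuming tags are extracted as raw text from the lexc sources: strip the % escapes before comparing, so the escaped and unescaped spellings collapse into one tag. The function name unescape_xfst is hypothetical and is not taken from tag_test.sh.

```sh
#!/bin/sh
# Hypothetical helper (not part of tag_test.sh): undo lexc/xfst "%" escapes
# so that "%+Use%/GC" and "+Use%/GC" are both reported as "+Use/GC".
# "%x" becomes "x" for any character x, so "%%" becomes a literal "%".
unescape_xfst() {
    sed 's/%\(.\)/\1/g'
}

# Example: normalise an extracted tag before comparing it with the declarations.
printf '%s\n' '+Use%/GC' | unescape_xfst    # prints +Use/GC
```

Applying the same filter to both the declared and the used tag lists keeps the comparison consistent even if a declaration is itself written with escapes.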


snomos · 2025-02-01T17:18:22Z

The tag test is too fragile in two different ways:

1. It uses the tags declared in root.lexc instead of extracting all tags from the compiled lexical FST.
2. It uses a plain diff, although the only relevant difference is tags that are in use but not declared, not the other way around (declared tags that are not in use).

Fixing either of these would solve your issue, and fixing both would make the tag test much more robust.
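The second point can be sketched as a one-directional set difference: report only tags that are used but not declared, and ignore declared tags that are never used. This is a minimal sketch assuming both lists have already been extracted to files with one tag per line; undeclared_tags is a hypothetical name, not an existing script in the repository.

```sh
#!/bin/sh
# Hypothetical sketch: print tags listed in USED_FILE ($1) that are missing
# from DECLARED_FILE ($2). Declared-but-unused tags are deliberately ignored,
# since only the used-but-undeclared direction indicates an error.
undeclared_tags() {
    used_sorted=$(mktemp) || return 1
    declared_sorted=$(mktemp) || { rm -f "$used_sorted"; return 1; }
    # comm needs both inputs sorted with the same collation, so force C locale.
    LC_ALL=C sort -u "$1" > "$used_sorted"
    LC_ALL=C sort -u "$2" > "$declared_sorted"
    # -2 hides lines only in the declared list, -3 hides lines in both,
    # leaving only the lines unique to the used list.
    LC_ALL=C comm -23 "$used_sorted" "$declared_sorted"
    status=$?
    rm -f "$used_sorted" "$declared_sorted"
    return "$status"
}
```

For example, with a used list of +N, +Sg, +Use/GC and a declared list of +N, +Pl, +Sg, only +Use/GC is printed; the unused declaration +Pl is not treated as an error. The test would then fail only when this output is non-empty.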

@flammie could you have a look?

flammie · 2025-02-03T12:15:12Z

Yeah, tag_test.sh and the related extract scripts are quite hacky; I've now patched in handling for a few of the escapes.

The problem is that undeclared and mistyped multichars compile into paths with one arc per byte, and such paths cannot really be identified from the binary FST. It's a design failure in lexc that can ultimately only be fixed by rethinking alphabet handling across all tools. The best that can be done to find misspelt and undeclared tags in lexc entries is to guess that +anything is a tag by convention.
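A rough sketch of that convention-based guess, assuming a tag is a + followed by a letter and that lexc comments start with an unescaped !. It is not the actual extract script; guess_tags is a hypothetical name, and like any guess it can yield false positives (a Kleene plus in an embedded regex such as (a+b) would produce +b) and it cannot see undeclared tags that were already split into per-byte arcs in the compiled FST.

```sh
#!/bin/sh
# Hypothetical sketch of the "+anything is a tag" heuristic: reads lexc source
# on stdin and prints one candidate tag per line, sorted and de-duplicated.
# Assumes grep -o and sed -E (available in GNU and BSD userlands).
guess_tags() {
    # 1. Drop comments: an unescaped "!" starts a lexc comment.
    sed -E 's/(^|[^%])!.*$/\1/' |
    # 2. A candidate starts with "+" and a letter, and runs until whitespace,
    #    one of : ; < > ! | ( ) ] " or another "+". "%x" escapes stay inside
    #    the candidate, except "%+", which begins the next tag (as in %+N%+Sg).
    grep -oE '\+[[:alpha:]]([^]%[:space:]:;<>!+|()"]|%[^+])*' |
    # 3. Undo "%" escapes so +Use%/GC is reported as +Use/GC.
    sed 's/%\(.\)/\1/g' |
    LC_ALL=C sort -u
}
```

Run on the line from arabic_roman_digits.lexc quoted above, it reports +Use/GC rather than +Use%/GC, so the result can be compared directly with the declared tags.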

snomos · 2025-02-03T12:26:43Z

> The best that can be done to find misspelt and undeclared tags in lexc entries is to guess that +anything is a tag by convention.

Yes, and that is essentially what we already do. We can still improve the non-guessing part of the tag test, and that is what I suggest we do.
