German Special Characters #13

Open
JoHoenk opened this issue Dec 19, 2024 · 3 comments

Comments


JoHoenk commented Dec 19, 2024

Hi Guys,

first of all: awesome work here!

Currently I'm facing the following issue with the special character 'ß' in a sentence template:

brightness_numbers:
    values:
      - in: fünf
        out: 5
      - in: zehn
        out: 10
      - in: fünfzehn
        out: 15
      - in: zwanzig
        out: 20
      - in: fünfundzwanzig
        out: 25
      - in: dreißig
        out: 30
      - in: fünfunddreißig
        out: 35

When I run Vosk I get the following debug log entry, and the word recognition for 30 is also not working:
vosk | WARNING (VoskAPI:UpdateGrammarFst():recognizer.cc:308) Ignoring word missing in vocabulary: 'dreissig'

However, if I look in the dictionary on Huggingface, it looks like the word dreißig is included in the model.
So it seems like something is replacing the ß. I already tried out different quotes but I always get this warning.

Does anybody have a hint for me?

JoHoenk changed the title from "German Special Characters in" to "German Special Characters" on Dec 19, 2024

Scrath1 commented Jan 10, 2025

I experienced the same issue. I worked around it the same way the README describes for the lumos example.
I took a look at the vocabulary for the German model (see here) and used other words or syllables to sound out the word.

For your example:

      - in: drei zig
        out: 30
      - in: fünf und drei zig
        out: 35

I do however still experience the same problem with the word "schließe" for which I haven't found a proper substitute yet.
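
For anyone trying the same workaround, the vocabulary check can be sketched roughly as below. This is only an illustration: `missing_words` is a hypothetical helper, and `sample_vocabulary` is a tiny hand-written set based on the words mentioned in this thread, not the real model vocabulary file.

```python
def missing_words(phrase: str, vocabulary: set[str]) -> list[str]:
    """Return the words of `phrase` that are not in `vocabulary` (hypothetical helper)."""
    return [word for word in phrase.split() if word not in vocabulary]


# Illustrative sample only, NOT the real vocabulary of the German model.
sample_vocabulary = {"drei", "zig", "fünf", "und", "dreißig"}

# The sounded-out replacement uses only words from the sample vocabulary.
print(missing_words("fünf und drei zig", sample_vocabulary))  # []

# The spelling from the warning is not in the sample vocabulary.
print(missing_words("dreissig", sample_vocabulary))  # ['dreissig']
```

With the real vocabulary loaded into a set, the same check would show which words of a template still need a substitute.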

@synesthesiam
Contributor

I think I see what's happening. I'm using Python's casefold by default, which turns "dreißig" into "dreissig". I think I should be using lower instead here. What do you think?
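
The difference can be reproduced with plain Python strings. This is a minimal sketch of the two built-in methods only, not the project's actual normalization code:

```python
word = "dreißig"

# casefold() applies Unicode case folding, which expands 'ß' to 'ss'.
folded = word.casefold()

# lower() only lowercases and leaves 'ß' untouched.
lowered = word.lower()

print(folded)   # dreissig
print(lowered)  # dreißig
```

The folded form matches the word in the Vosk warning above, while the lowered form keeps the spelling used in the template.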

Author

JoHoenk commented Jan 11, 2025

That sounds right. This would probably solve the issue!
