Invalid unicode characters break transcribe #92
Comments
Just an idea for how to fix it: pass the raw byte array back and let Python do the decoding instead. I am not sure whether I should open a PR with this change, since I am not familiar with C and pybind11: https://github.com/absadiki/pywhispercpp/compare/main...andrewtheguy:pywhispercpp:unicode-handling?expand=1
Can you provide a sample file that raises this exception? But yes, please go ahead and open a new PR. The changes look good; just squash the commits into one, or rebase the branch to clean up the commit history, and make sure to ignore the .vscode config files before committing.
Resolved by #93.
When transcription encounters invalid Unicode characters, it breaks, with no way to work around it, such as replacing the invalid characters with U+FFFD (https://codepoints.net/U+FFFD):
When it crashes, the program hangs instead of being able to continue.
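For illustration, the workaround discussed in this issue (handing raw bytes to Python and substituting U+FFFD for anything that is not valid UTF-8) might look roughly like the sketch below. This is not the code from the linked branch or from #93; `decode_segment_text` is a hypothetical helper name, and the truncated byte string is only a stand-in for whatever invalid output actually triggered the failure.

```python
def decode_segment_text(raw: bytes) -> str:
    """Decode UTF-8 bytes, replacing invalid sequences with U+FFFD.

    Hypothetical helper: assumes the binding returns raw bytes
    instead of an already-decoded string.
    """
    return raw.decode("utf-8", errors="replace")


# Stand-in for invalid output: a multi-byte character cut off mid-sequence.
truncated = "日本".encode("utf-8")[:-1]  # drop the last byte of the second character

try:
    truncated.decode("utf-8")  # strict decoding raises on the incomplete sequence
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# Lenient decoding keeps the valid text and marks the damaged part with U+FFFD.
print(repr(decode_segment_text(truncated)))
```

With `errors="replace"` the decode call never raises, so the caller can keep going and decide later what to do with any U+FFFD markers in the text.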