Improve language detection #676

PetrosVav · 2022-12-13T15:06:41Z

The language detection module considers only the first 30-second segment of the audio, even when that segment is silent or too noisy. In such cases, the model may detect an arbitrary language. To overcome this problem, I propose additional functionality that takes one or more audio segments into account and accepts the following two parameters:

language_detection_segments: int (>= 1)
language_threshold: float ([0,1])

The first parameter specifies how many segments of the audio are taken into account (min: 1, max: the full audio). The second sets a threshold: if the maximum probability among a segment's language tokens exceeds it, that language is considered detected. If no language is detected in any of the specified segments, either because the maximum language probability in every segment is below the threshold or because the threshold is not specified (None), a majority vote over the per-segment languages decides the language.
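The decision rule described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `pick_language` and the `segment_probs` input (one dict of language probabilities per 30-second segment) are hypothetical, and the comparison is assumed to be strictly greater than the threshold.

```python
from collections import Counter
from typing import Dict, List, Optional


def pick_language(
    segment_probs: List[Dict[str, float]],
    language_detection_segments: int = 1,
    language_threshold: Optional[float] = None,
) -> str:
    """Pick a language from per-segment language probabilities (hypothetical helper)."""
    if language_detection_segments < 1:
        raise ValueError("language_detection_segments must be >= 1")
    if not segment_probs:
        raise ValueError("at least one segment is required")

    votes = []
    # Look at no more than the requested number of segments.
    for probs in segment_probs[:language_detection_segments]:
        best = max(probs, key=probs.get)
        # Early exit: this segment is confident enough on its own.
        if language_threshold is not None and probs[best] > language_threshold:
            return best
        votes.append(best)

    # No segment passed the threshold (or none was given): majority vote.
    return Counter(votes).most_common(1)[0][0]


segments = [
    {"en": 0.40, "de": 0.35, "fr": 0.25},  # e.g. a noisy first segment
    {"de": 0.90, "en": 0.10},
    {"de": 0.60, "en": 0.40},
]
print(pick_language(segments, language_detection_segments=1))
print(pick_language(segments, language_detection_segments=3, language_threshold=0.8))
```

With a single segment and no threshold, the sketch reduces to taking the most probable language of the first segment.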

The previous behavior, i.e. taking the most probable language token over the first 30 seconds, is obtained by setting the parameters as follows:

model.transcribe(audio_path, language_detection_segments=1)

feat: improve language detection

ef14efd

PetrosVav force-pushed the improve_language_detection branch from ab45db6 to ef14efd Compare March 24, 2023 14:03

ab-pandey mentioned this pull request May 30, 2023

Improve Language detection SYSTRAN/faster-whisper#265

Open

trungkienbkhn mentioned this pull request Mar 4, 2024

Improve language detection SYSTRAN/faster-whisper#732

Merged

kenho211 mentioned this pull request Jan 7, 2025

change language_detection_threshold type to float SYSTRAN/faster-whisper#1188

Open
