Improve language detection #676

Open, wants to merge 1 commit into base: main


PetrosVav

The language detection module considers only the first 30-second segment of the audio, even though that segment may be silent or too noisy. In that case, the model may detect an arbitrary language. To address this, I propose additional functionality that examines one or more audio segments and accepts the following two parameters:

  • language_detection_segments: int (>= 1)
  • language_threshold: float ([0,1])

The first parameter specifies how many segments of the audio are taken into account (min: 1, max: the full audio). The second sets a threshold: if the maximum probability among the language tokens of a segment is higher than the threshold, that language is considered detected. If no language is detected in any of the specified segments, either because the maximum probability in every segment is lower than the threshold or because the threshold is not specified (None), a majority vote over the per-segment languages decides the language.
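To make the decision rule concrete, here is a minimal sketch of the logic described above. It is not the code from this PR: the function name `pick_language` and its `segment_probs` input (one dict of language probabilities per 30-second segment) are hypothetical, and how ties in the vote are broken is an assumption, since the description does not specify it.

```python
from collections import Counter
from typing import Dict, List, Optional


def pick_language(
    segment_probs: List[Dict[str, float]],  # hypothetical input: one {language: probability} dict per segment
    language_detection_segments: int = 1,
    language_threshold: Optional[float] = None,
) -> str:
    """Illustrative decision rule only, not the PR's implementation."""
    if language_detection_segments < 1:
        raise ValueError("language_detection_segments must be >= 1")
    if language_threshold is not None and not 0.0 <= language_threshold <= 1.0:
        raise ValueError("language_threshold must be in [0, 1]")

    votes = []
    # Slicing caps the count at the number of available segments (max: full audio).
    for probs in segment_probs[:language_detection_segments]:
        best = max(probs, key=probs.get)  # most probable language in this segment
        # Early exit: the threshold is lower than the segment's maximum probability.
        if language_threshold is not None and probs[best] > language_threshold:
            return best
        votes.append(best)

    if not votes:
        raise ValueError("no segments to inspect")
    # Fallback: majority vote. Assumption: ties go to the language seen first.
    return Counter(votes).most_common(1)[0][0]


# Made-up probabilities: the first segment is noisy and leans the wrong way.
segments = [
    {"en": 0.4, "de": 0.6},
    {"en": 0.9, "de": 0.1},
    {"en": 0.7, "de": 0.3},
]
print(pick_language(segments))          # de  (first segment only)
print(pick_language(segments, 3, 0.8))  # en  (second segment exceeds the threshold)
print(pick_language(segments, 3))       # en  (majority vote: de, en, en)
```

With the defaults, a single segment and no threshold, the vote has exactly one entry, so the result is the most probable language of the first segment.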

The previous behavior, i.e. choosing the language token with the maximum probability in the first 30 seconds, is obtained by setting the parameters as follows:

model.transcribe(audio_path, language_detection_segments=1)
