Integrate automated speech recognition transcription as pre-processor option #1

Closed
chrisbrickhouse opened this issue Oct 26, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@chrisbrickhouse
Member

OpenAI has Whisper, a freely licensed, open-source speech-to-text (automatic speech recognition) model. A major part of the sociolinguistic pipeline is transcription, and creating an interface for an automated transcription service would be a useful feature. It's MIT licensed, so we can distribute it with FAVE and any GPL preprocessor.
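
To make the proposal concrete, here is a minimal sketch of what such a pre-processor could look like, not a definitive design. It uses the `whisper.load_model` and `model.transcribe` calls from the `openai-whisper` package and reads the `start`, `end`, and `text` fields of each returned segment. The function names, the placeholder speaker ID and name, and the tab-delimited column layout (speaker ID, speaker name, onset, offset, text) are assumptions made for illustration; the layout FAVE actually expects should be checked before relying on this.

```python
from typing import Iterable, List


def segments_to_rows(
    segments: Iterable[dict],
    speaker_id: str = "S1",          # placeholder: Whisper gives no speaker info
    speaker_name: str = "Speaker",   # placeholder, see above
) -> List[str]:
    """Turn Whisper-style segments into tab-delimited transcript rows.

    Assumed column order: speaker ID, speaker name, onset, offset, text.
    """
    rows = []
    for seg in segments:
        # Collapse runs of whitespace and drop empty segments.
        text = " ".join(seg["text"].split())
        if not text:
            continue
        rows.append(
            "\t".join(
                [
                    speaker_id,
                    speaker_name,
                    f"{seg['start']:.3f}",
                    f"{seg['end']:.3f}",
                    text,
                ]
            )
        )
    return rows


def transcribe_file(audio_path: str, model_name: str = "base") -> List[str]:
    """Transcribe one audio file with Whisper and return transcript rows."""
    # Imported lazily so the formatting helper works without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_rows(result["segments"])
```

Keeping the formatting step separate from the model call means the output layout can be changed or tested without loading a model.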

@chrisbrickhouse chrisbrickhouse added the enhancement New feature or request label Oct 26, 2022
@JoFrhwld
Member

JoFrhwld commented Oct 27, 2022

My brief experiment with it was very impressive, but it does run slowly on CPU. It doesn’t do speaker diarization, so combining it with something like pyannote would be necessary for the socio context. https://github.com/pyannote/pyannote-audio
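
One way the combination could work, sketched under assumptions: take the timed segments from the transcriber and the speaker turns from a diarizer, then label each segment with the speaker whose turn overlaps it most. The code below is plain Python and does not call pyannote; it assumes the diarizer output has already been reduced to `(start, end, speaker)` tuples. `assign_speakers`, `overlap`, and the `"UNKNOWN"` fallback label are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# A diarization turn: (start seconds, end seconds, speaker label).
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[dict],
    turns: List[Turn],
    unknown: str = "UNKNOWN",
) -> List[dict]:
    """Label each transcript segment with the speaker of maximum overlap."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Copy the segment so the input list is left unmodified.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Maximum overlap is a simple heuristic. Segments that span a speaker change would still need to be split, which this sketch does not attempt.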

@JoFrhwld
Member

Never mind: pyannote/pyannote-audio#1128 (comment)

@chrisbrickhouse
Member Author

Running slowly on CPU is fine, I think. The alignment also takes a while for large files, so users shouldn't be surprised if this step isn't fast. Plus, it's likely far faster than paying an RA to transcribe. Diarization is a big issue, and I think we can still use pyannote.

Unfortunate that model access is gated, but I don't think it's a blocker. The model itself is listed as MIT licensed, so we should be able to redistribute it freely under those terms if we obtain a copy. We could email them and ask for clarification on this point, as the only mention of that license is in the README metadata (this is why you always distribute the license text with the software).

@JoFrhwld JoFrhwld closed this as completed Apr 7, 2023