OpenAI has released Whisper, a freely licensed, open-source speech-to-text system. A major part of the sociolinguistic pipeline is transcription, and creating an interface to an automated transcription service would be a useful feature. It's MIT licensed, so we can distribute it with FAVE and any GPL preprocessor.
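A minimal sketch of what a FAVE-side Whisper front end could look like, assuming the `openai-whisper` package is installed. The helper names (`segments_to_tsv`, `transcribe`) and the file name `interview.wav` are placeholders, not existing FAVE code:

```python
def segments_to_tsv(segments):
    """Render Whisper's segment dicts as tab-separated start/end/text lines."""
    return "\n".join(
        f"{s['start']:.2f}\t{s['end']:.2f}\t{s['text'].strip()}" for s in segments
    )

def transcribe(path, model_name="base"):
    """Run Whisper on an audio file and return its timestamped segments."""
    import whisper  # heavyweight import kept local to the call
    model = whisper.load_model(model_name)  # smaller models are more tolerable on CPU
    return model.transcribe(path)["segments"]

if __name__ == "__main__":
    # Placeholder input file; replace with a real recording.
    print(segments_to_tsv(transcribe("interview.wav")))
```

The segment timestamps are what make this useful downstream: they give the aligner rough utterance boundaries for free.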
My brief experiment with it was very impressive, but it runs slowly on CPU. It doesn't do speaker diarization, so combining it with something like pyannote would be necessary for the socio context. https://github.com/pyannote/pyannote-audio
Running slowly on CPU is fine, I think. The alignment also takes a while for large files, so users shouldn't be surprised if this step isn't fast. Plus, it's likely far faster than paying an RA to transcribe. Diarization is the bigger issue, but I think we can use pyannote for that.
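The merge step between the two tools could be quite small. This is a sketch under assumed data shapes: Whisper-style segment dicts on one side, and diarization turns flattened to `{start, end, speaker}` dicts on the other (pyannote's own `Annotation` objects would need to be converted to this form first). The function name `assign_speakers` is hypothetical:

```python
def assign_speakers(segments, turns):
    """Label each ASR segment with the speaker whose diarization turn
    overlaps it the most; 'UNK' if no turn overlaps at all."""
    labeled = []
    for seg in segments:
        best, best_ov = "UNK", 0.0
        for t in turns:
            # Length of the time interval shared by the segment and the turn.
            ov = min(seg["end"], t["end"]) - max(seg["start"], t["start"])
            if ov > best_ov:
                best, best_ov = t["speaker"], ov
        labeled.append({**seg, "speaker": best})
    return labeled
```

Maximum-overlap assignment is deliberately forgiving of the small boundary disagreements the two models will inevitably have; anything fancier (splitting segments at speaker changes) could come later.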
Unfortunate that model access is gated, but I don't think it's a blocker. The model itself is listed as MIT licensed, so we should be able to redistribute it freely under those terms if we obtain a copy. We could email them and ask for clarification on this point, as the only mention of that license is in the README metadata (this is why you always distribute the license text with the software).