OpenAI has released Whisper, a freely licensed, open-source speech-to-text system. A major part of the sociolinguistic pipeline is transcription, and creating an interface to an automated transcription service would be a useful feature. It's MIT licensed, so we can distribute it with FAVE and any GPL preprocessor.
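A minimal sketch of what a FAVE-side Whisper front end could look like, assuming the `openai-whisper` package is installed. The helper names (`segments_to_tsv`, `transcribe`) and the file name `interview.wav` are placeholders, not existing FAVE code:

```python
def segments_to_tsv(segments):
    """Render Whisper's segment dicts as tab-separated start/end/text lines."""
    return "\n".join(
        f"{s['start']:.2f}\t{s['end']:.2f}\t{s['text'].strip()}" for s in segments
    )

def transcribe(path, model_name="base"):
    """Run Whisper on an audio file and return its timestamped segments."""
    import whisper  # heavyweight import kept local to the call
    model = whisper.load_model(model_name)  # smaller models are more tolerable on CPU
    return model.transcribe(path)["segments"]

if __name__ == "__main__":
    # Placeholder input file; replace with a real recording.
    print(segments_to_tsv(transcribe("interview.wav")))
```

The segment timestamps are what make this useful downstream: they give the aligner rough utterance boundaries for free.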
My brief experiment with it was very impressive, but it runs slowly on CPU. It doesn't do speaker diarization, so combining it with something like pyannote would be necessary for the socio context. https://github.com/pyannote/pyannote-audio
Running slowly on CPU is fine, I think. The alignment also takes a while for large files, so users shouldn't be surprised if this step isn't fast. Plus, it's likely far faster than paying an RA to transcribe. Diarization is the bigger issue, but I think we can use pyannote for that.
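The merge step between the two tools could be quite small. This is a sketch under assumed data shapes: Whisper-style segment dicts on one side, and diarization turns flattened to `{start, end, speaker}` dicts on the other (pyannote's own `Annotation` objects would need to be converted to this form first). The function name `assign_speakers` is hypothetical:

```python
def assign_speakers(segments, turns):
    """Label each ASR segment with the speaker whose diarization turn
    overlaps it the most; 'UNK' if no turn overlaps at all."""
    labeled = []
    for seg in segments:
        best, best_ov = "UNK", 0.0
        for t in turns:
            # Length of the time interval shared by the segment and the turn.
            ov = min(seg["end"], t["end"]) - max(seg["start"], t["start"])
            if ov > best_ov:
                best, best_ov = t["speaker"], ov
        labeled.append({**seg, "speaker": best})
    return labeled
```

Maximum-overlap assignment is deliberately forgiving of the small boundary disagreements the two models will inevitably have; anything fancier (splitting segments at speaker changes) could come later.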
Unfortunate that model access is gated, but I don't think it's a blocker. The model itself is listed as MIT licensed, so we should be able to redistribute it freely under those terms if we obtain a copy. We could email them and ask for clarification on this point, as the only mention of that license is in the README metadata (this is why you always distribute the license text with the software).