Currently, it seems that the intermediate files (.wav) are all generated before any transcription task starts, and are only cleared after all transcription tasks have completed.
This results in very high temporary disk usage.
In my case, transcribing 7.8 GB of MP4 audio (120 files in total) generated about 50 GB of intermediate files, which is unfriendly to GPU servers in a cloud environment.
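For a rough sense of why compressed MP4 audio expands so much, the size of an uncompressed PCM .wav file is simply sample rate × channels × bytes per sample × duration, plus a small header. The sketch below assumes 16-bit PCM and a canonical 44-byte header. The sample rate and channel count the tool actually uses are not stated in this thread, so the 16 kHz mono defaults are an assumption, and `wav_size_bytes` / `total_wav_gb` are hypothetical helpers, not part of the project.

```python
from typing import Iterable

# Assumption: plain PCM .wav with the canonical 44-byte RIFF/WAVE header.
WAV_HEADER_BYTES = 44


def wav_size_bytes(
    duration_s: float,
    sample_rate: int = 16_000,   # assumed default, not taken from the project
    channels: int = 1,           # assumed mono
    bits_per_sample: int = 16,   # assumed 16-bit PCM
) -> int:
    """Approximate size in bytes of one uncompressed PCM .wav file."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return WAV_HEADER_BYTES + int(duration_s * bytes_per_second)


def total_wav_gb(durations_s: Iterable[float], **fmt) -> float:
    """Total size in GB (10**9 bytes) if every input is decoded to .wav at once."""
    return sum(wav_size_bytes(d, **fmt) for d in durations_s) / 1e9


if __name__ == "__main__":
    print(wav_size_bytes(3600))                                  # one hour, 16 kHz mono
    print(wav_size_bytes(3600, sample_rate=44_100, channels=2))  # one hour, 44.1 kHz stereo
```

For example, one hour of 16 kHz mono audio comes to 115,200,044 bytes (about 0.115 GB), while 44.1 kHz stereo is about 5.5 times larger. Because the current flow keeps every decoded file on disk until the whole batch finishes, the peak is the sum over all inputs.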
I'm working on refactoring the intermediate file generation process, which is currently buggy and poorly structured.
During the refactor:
I will keep the lazy generation strategy. The project's time bottleneck lies in transcription and translation, so a consumer-worker pattern is used to mitigate this. All intermediate files (the preprocessing step) should be generated before transcription, since both steps use the CPU.
I will consider clearing these files as soon as transcription is completed.
I expect to finish the refactor by late July. Feel free to submit a PR if you want to help speed up the process.