loose coupling transcription and translation steps #59

Open
MaleicAcid opened this issue Nov 3, 2024 · 1 comment
Comments

@MaleicAcid
Contributor

MaleicAcid commented Nov 3, 2024

The transcription and translation steps are currently tightly coupled. The subtitles produced by transcription are handled inside the translation step, even when translation is skipped:

# file: openlrc/openlrc.py
def process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans):
    ...
    if skip_trans:
        shutil.copy(transcribed_opt_sub.filename, final_json_path)
        transcribed_opt_sub.filename = final_json_path
        return transcribed_opt_sub
    ...

The final subtitle files are then generated in the translation worker:

def translation_worker(self, transcription_queue, target_lang, skip_trans, bilingual_sub):
    ...
    # Handle translation
    final_subtitle = process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans)

    # Generate and move subtitle files
    generate_subtitle_files(final_subtitle, base_name, subtitle_format)
    ...

This seems to violate the single-responsibility principle (SRP).
In addition, even when skip_trans=True is specified, the translation thread is still started. Users pay the extra performance overhead even though they are not using translation.

I wish we could decouple the two steps of transcription and translation:
- The translation step no longer processes transcribed files.
- The translation thread is no longer started when skip_trans=True is specified.
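
A minimal sketch of what this split could look like. It is not openlrc's actual code: `Subtitle`, `transcribe`, `translate`, `generate_output`, and `run` are hypothetical stand-ins, and the real pipeline may need a different shape. The only point it illustrates is that output generation does not depend on translation, and that no worker thread is created when `skip_trans=True`.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Subtitle:
    """Hypothetical stand-in for the project's subtitle object."""
    filename: str
    lines: list = field(default_factory=list)


def transcribe(path):
    """Placeholder transcription step; it knows nothing about translation."""
    return Subtitle(filename=path + ".json", lines=["hello", "world"])


def translate(sub, target_lang):
    """Placeholder translation step; it returns a new Subtitle and writes no files."""
    return Subtitle(sub.filename, [f"[{target_lang}] {line}" for line in sub.lines])


def generate_output(sub):
    """Output step, shared by both paths and independent of translation."""
    return "\n".join(sub.lines)


def run(paths, target_lang, skip_trans):
    transcribed = [transcribe(p) for p in paths]

    if skip_trans:
        # No queue and no worker thread are created on this path.
        return [generate_output(s) for s in transcribed]

    jobs = queue.Queue()
    results = {}

    def worker():
        # Consume (index, subtitle) pairs until the None sentinel arrives.
        while True:
            item = jobs.get()
            if item is None:
                break
            index, sub = item
            results[index] = translate(sub, target_lang)

    thread = threading.Thread(target=worker, name="translation-worker")
    thread.start()
    for i, sub in enumerate(transcribed):
        jobs.put((i, sub))
    jobs.put(None)
    thread.join()

    return [generate_output(results[i]) for i in range(len(transcribed))]
```

With this layout, `run(["a.mp3"], "zh", skip_trans=True)` goes straight from transcription to output, while `skip_trans=False` is the only path that starts the worker.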

I am not familiar with NLP, but if you agree, I can try to make this improvement myself.

@zh-plus
Owner

zh-plus commented Nov 4, 2024

Thank you for this detailed analysis.

I do think the structure of openlrc.py is not good - the tight coupling between transcription and translation makes the code less maintainable and less efficient. I actually started addressing some architectural issues in commit c7db967, but there's still room for improvement.

Feel free to open a PR with your proposed changes!
